Toddy Prawiraharjo wrote:
>
> From: Rob Dixon [mailto:[EMAIL PROTECTED]
> Sent: Friday, 29 September 2006 10:04 AM
> To: [email protected]
> Subject: Re: Querying very big database
>>
>> Toddy Prawiraharjo wrote:
>>
>>> Hello all,
>>>
>>> I am not sure if i'm inside the correct mailing list, but it should be
>>> simple for you guys. Recently i started a simple perl program to catch all
>>> syslog from my netscreen firewall, and put them into a database, so later
>>> can do some analysing on it (sort of AWStats). the database itself is pretty
>>> big, with 600k entry (for 3 weeks of the running firewall) worth 80 megs in
>>> mysql. My question is, during the analysing my perl script giving very slow
>>> processing. I know the query to mysql itself is pretty quick, less than 2
>>> seconds to return about 40k to 60k result to perl to be analysed
>>>
>>> Here it goes:
>>> <snipped>
>>> $query = "SELECT src, rcvd FROM no_name WHERE start_date >=
>>> '$fromdate' AND start_date <= '$todate';";
>>> $sth = $dbh->prepare($query);
>>> $sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
>>> print "$query<br>";
>>>
>>> my %src = ();
>>> my ($totalsent, $totalrcvd, $srcchecked);
>>>
>>> while(@row = $sth->fetchrow_array){
>>> $srcchecked = 0;
>>> while($source, $total = each (%src)){
>>> if ($source eq "$row[0]"){
>>> $srcchecked = 1;
>>> $src{$source} = $src{$source} + $row[1];
>>> }
>>> }
>>> if ($srcchecked != 1){
>>> print "$row[0]<br>";
>>> $src{$row[0]} = $row[1];
>>> }
>>> $totalrcvd = $totalsent + $row[1];
>>> #$totalrcvd = $totalrcvd + $row[2];
>>> }
>>> </snipped>
>>>
>>> The while loop to do analysing on the data take more than 15 minutes, and
>>> that only to a query for 1 day long records($fromdate-$todate) So, if I
>>> want perl to give me faster result, what's the solution? Did I make
>>> fundamentally wrong approach?
>>
>>
>> You've missed the whole point of hashes, which is that they will access
>> a data
>> value directly from a key. What you've written is similar to
>>
>> my $i = 0;
>> my $value;
>> foreach my $v (@array) {
>> if ($i++ == $n) {
>> $value = $v;
>> }
>> }
>>
>> instead of
>>
>> $value = $array[$n];
>>
>> Try this while loop instead of the one you have and see if you get an
>> improvement:
>>
>> while (@row = $sth->fetchrow_array) {
>>
>> my $key = $row[0];
>>
>> print "$key<br>" unless exists $src{$key};
>> $src{$key} += $row[1];
>>
>> $totalsent += $row[1];
>> #$totalrcvd += $row[2];
>> }
>>
>> HTH,
>>
>> Rob
>>
>
> THX! I always knew it's my n00b scripting that caused the problem. It
> now down to 4 lines inside while loop, and I'm flying! The processing
> down from ~15 minutes to 4 secs! But, for longer date range (2 weeks
> time limit) it clocked at 25 wallclock secs (with processing ~350k mysql
> return entries). Any more way to process this beast any faster? Any good
> reading or reference about this? Just afraid, if this rolled to
> production level, the report for months activities will take minutes to
> make!
>
> <snipped>
> $query = "SELECT src, sent, rcvd FROM no_name WHERE start_date
> >= '$fromdate' AND start_date <= '$todate';";
>
> $sth = $dbh->prepare($query);
> $sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
> #print "$query<br>";
>
> my %src = ();
> my ($totalsent, $totalrcvd);
>
> while(@row = $sth->fetchrow_array){
> my $key = $row[0];
> $src{$key} += $row[2];
> $totalsent = $totalsent + $row[1];
> $totalrcvd = $totalrcvd + $row[2];
> }
> </snipped>
Hi Toddy
(Please bottom-post your replies. One day everybody in the world will know to do
this and perl.beginners will be a happy, shiny place!)
Let the database engine do it for you!
my $query = qq(
SELECT src, SUM(sent) AS totalsent, SUM(rcvd) AS totalrcvd
FROM no_name
WHERE start_date >= '$fromdate' AND start_date <= '$todate'
GROUP BY src
);
my $sth = $dbh->prepare($query);
$sth->execute() || errorhere("SQL Error: $DBI::errstr");
my $src = $sth->fetchall_hashref('src');
foreach my $srcval(keys %$src) {
$totalsent += $src->{$srcval}{totalsent};
$totalrcvd += $src->{$srcval}{totalrcvd};
}
(I couldn't easily test this, although it is syntax checked, so beware)
Now, $src is a reference to a hash similar to your original %src, but has
another hash reference as its values, so you can extract
my $totalsent = $src{src1}{totalsent};
and so on.
HTH,
Rob
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>