Re: How to promote the efficiency
On Dec 8, 2005, at 9:37, Jennifer Garner wrote:

> hi, lists,
>
> I have a very large file that looks like this:
>
> 61.156.49.18:28360
> 61.183.148.130:27433
> 222.90.207.251:25700
> 202.117.64.161:25054
> 218.58.59.73:24866
> 221.233.24.9:22507
> 222.187.124.4:21016
> ...
>
> and more than 45,000,000 lines.

Is the time spent processing the file or printing to the console?

If it is the processing, would it be possible to put the file somewhere for download so we can benchmark alternatives against real data?

If it is the printing, try redirecting the output to a file, for instance.

-- fxn

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response
Re: How to promote the efficiency
On Dec 8, 2005, at 11:44, Xavier Noria wrote:

> If it is the printing, try redirecting the output to a file, for instance.

There's a RESULT filehandle in the print call that I missed. Forget that remark about printing, then.

-- fxn
Re: How to promote the efficiency
Sorry, the file is more than 900 MB, too large to put up for download.

I have been running the script for a day and it still has not produced any output. Crying...

I think a better algorithm might help, and I'm thinking it over now.
Re: How to promote the efficiency
Jennifer Garner [JG], on Thursday, December 8, 2005 at 19:21 (+0800) typed the following:

JG> Sorry, the file is more than 900M, too large to download.
JG> I have run it for one day, and still have nothing to output. Crying...
JG> I think maybe some arithmetic is useful for me, and now I'm thinking over it.

OK, why do you cache the results? Try printing them to a _file_ as soon as the conditions pass. After that you can sort them, uniq them and so on. Have you checked how much memory your script eats?

Also, you don't use $num there, so I would do the splitting another way, something like this:

    my ($ip) = $_ =~ /^([^:]+)/o;

--
How do you protect mail on web? I use http://www.2pu.net
[I love cartoons. - Yakko Warner]
RE: How to promote the efficiency
Jennifer Garner mailto:[EMAIL PROTECTED] wrote:

: open (FILE,$file) or die $!;
: while(<FILE>)
: {
:     next if /unknown/o;
:     next if /^192\.168\./o;
:     chomp;
:     my ($ip,$num) = split/:/,$_;
:     if ($ip =~ /^(\d+\.\d+\.\d+\.)(\d+)/o){
:         my ($net,$bit) = ($1,$2);
:         $total{$net}{low}{$bit} = 1 if $bit < 128;
:         $total{$net}{high}{$bit} = 1 if $bit >= 128 and
:                                          $bit < 255;
:         $total{$net}{total}{$bit} = 1;
:     }
: }

On the file reading side:

We could get rid of those /o modifiers on the regexes. IIRC, they only matter when a regex interpolates variables. You can also eliminate the chomp(), since we are throwing away that part of the line anyway. We can remove the lexical variables, which slow things down each time they are declared. And finally, we can eliminate the split().

    while ( <FILE> ) {
        next if /unknown/;
        next if /^192\.168\./;
        next unless /^(\d+\.\d+\.\d+\.)(\d+):/;

        $total{$1}{low}{$2}   = 1 if $2 < 128;
        $total{$1}{high}{$2}  = 1 if $2 >= 128 and $2 < 255;
        $total{$1}{total}{$2} = 1;
    }

: close FILE;

HTH,

Charles K. Clarkson
--
Mobile Homes Specialist
Re: How to promote the efficiency
Jennifer Garner wrote:

> I write this code:
>
> open (FILE,$file) or die $!;
> while(<FILE>)
> {
>     next if /unknown/o;
>     next if /^192\.168\./o;
>     chomp;
>     my ($ip,$num) = split/:/,$_;
>     if ($ip =~ /^(\d+\.\d+\.\d+\.)(\d+)/o){
>         my ($net,$bit) = ($1,$2);
>         $total{$net}{low}{$bit} = 1 if $bit < 128;
>         $total{$net}{high}{$bit} = 1 if $bit >= 128 and $bit < 255;
>         $total{$net}{total}{$bit} = 1;

        # Accumulate the totals directly
        if( $bit < 128 ){
            $total{$net}{low}++;
        }else{
            $total{$net}{high}++;
        }
        $total{$net}{total}++;

>     }
> }
> close FILE;
>
> foreach (sort { scalar keys %{$total{$b}{total}} <=> scalar keys %{$total{$a}{total}} } keys %total)
> {
>     print RESULT $_,"\t",scalar keys %{$total{$_}{low}},"\t",
>         scalar keys %{$total{$_}{high}},"\t",scalar keys %{$total{$_}{total}},"\n";
> }

OK, we're going to use a technique called the Schwartzian transformation, named after our good friend Randal L. Schwartz, who invented it.

    foreach ( map  { $_->[0] }
              sort { $b->[1] <=> $a->[1] }
              map  { [ $_, $total{$_}{total} ] }
              keys %total ){
        print RESULT "$_\t$total{$_}{low}\t$total{$_}{high}\t$total{$_}{total}\n";
    }

--
Just my 0.0002 million dollars worth,
   --- Shawn

"Probability is now one. Any problems that are left are your own."
   SS Heart of Gold, _The Hitchhiker's Guide to the Galaxy_
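The transform pays off mainly when the sort key is costly to compute, because the key is then calculated once per element instead of once per comparison. Here is a minimal sketch of that case, assuming a hash of hashes keyed by /24 prefix and last octet, like the one in the original script. The sub name nets_by_unique_hosts, the alphabetical tie-break and the sample data are all made up for illustration.

```perl
use strict;
use warnings;

# Return the /24 prefixes ordered by number of distinct hosts, largest first.
# $seen is a hash of hashes: { '218.58.59.' => { 73 => 1, ... }, ... }
sub nets_by_unique_hosts {
    my ($seen) = @_;
    return map  { $_->[0] }                                     # 3. undecorate
           sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }  # 2. sort on the cached key
           map  { [ $_, scalar keys %{ $seen->{$_} } ] }        # 1. decorate: count hosts once
           keys %$seen;
}

# Hypothetical sample data, not taken from the real file.
my %seen = (
    '218.58.59.'  => { 73  => 1, 74 => 1, 200 => 1 },
    '61.156.49.'  => { 18  => 1 },
    '222.90.207.' => { 251 => 1, 12 => 1 },
);

print "$_\t", scalar( keys %{ $seen{$_} } ), "\n"
    for nets_by_unique_hosts( \%seen );
```

When the key is a plain hash lookup, as in $total{$_}{total}, a direct sort { $total{$b}{total} <=> $total{$a}{total} } is likely just as fast.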
Re: How to promote the efficiency
Jennifer Garner wrote:

> hi, lists,

Hello,

> I have a very large file that looks like this:
>
> 61.156.49.18:28360
> 61.183.148.130:27433
> 222.90.207.251:25700
> 202.117.64.161:25054
> 218.58.59.73:24866
> 221.233.24.9:22507
> 222.187.124.4:21016
> ...
>
> and more than 45,000,000 lines.
>
> The part after the ':' is of no use to me; I only need the IP.
>
> For each IP, such as '218.58.59.73', I want to get this result:
>
> 218.58.59.  xxx  yyy  xxx+yyy
>
> I want to know how many IPs are in the range '218.58.59.1' to
> '218.58.59.127'; this is 'xxx'. And how many IPs are in the range
> '218.58.59.128' to '218.58.59.254'; this is 'yyy'.
>
> I write this code:
>
> open (FILE,$file) or die $!;
> while(<FILE>)
> {
>     next if /unknown/o;
>     next if /^192\.168\./o;
>     chomp;
>     my ($ip,$num) = split/:/,$_;
>     if ($ip =~ /^(\d+\.\d+\.\d+\.)(\d+)/o){
>         my ($net,$bit) = ($1,$2);
>         $total{$net}{low}{$bit} = 1 if $bit < 128;
>         $total{$net}{high}{$bit} = 1 if $bit >= 128 and $bit < 255;
>         $total{$net}{total}{$bit} = 1;
>     }
> }
> close FILE;
>
> foreach (sort { scalar keys %{$total{$b}{total}} <=> scalar keys %{$total{$a}{total}} } keys %total)
> {
>     print RESULT $_,"\t",scalar keys %{$total{$_}{low}},"\t",
>         scalar keys %{$total{$_}{high}},"\t",scalar keys %{$total{$_}{total}},"\n";
> }
>
> but it's too slow for me to wait for the result. How can I make it
> more efficient and run in less time? thanks.

This is quite a bit faster than your version:

    open FILE, '<', $file or die "Cannot open '$file' $!";

    my ( %low, %high, %total );

    while ( <FILE> ) {
        next if /unknown/;
        next if /^192\.168\./;
        next unless /^(\d+\.\d+\.\d+\.)(\d+)/;
        if ( $2 < 128 ) {
            $low{ $1 }++;
        }
        else {
            $high{ $1 }++;
        }
        $total{ $1 }++;
    }

    close FILE;

    for ( sort { $total{ $b } <=> $total{ $a } } keys %total ) {
        print RESULT "$_\t", $low{ $_ } || 0, "\t", $high{ $_ } || 0, "\t", $total{ $_ }, "\n";
    }

John
--
use Perl;
program
fulfillment
Re: How to promote the efficiency
Thank you, John. I think your method will be much faster than mine.

I'm now going to rewrite this program in C, but I'll also test all the approaches suggested by everyone here. Thanks.
Re: How to promote the efficiency
Jennifer Garner wrote:

> I have a file which is so large, which looks like this:
>
> 61.156.49.18:28360
> 222.187.124.4:21016
>
> and more than 45,000,000 lines.
>
> The part after the ':' is of no use to me, I only need the IP.

You could first convert the file to integers, going from about 20 bytes per line down to 4 bytes per line.

Or use the first three octets as a 24-bit number, 61.156.49 = 61*65536 + 156*256 + 49 = 4037681, times 2, as a direct index into an array of 2**24 * 2 = 33554432 elements.

Are you trying to find crowded ranges among these numbers? Like 222.187.0.0/16 etc.

--
Affijn, Ruud

"Ordinary is a tiger."
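A direct array index only works if the mapping from network to slot is collision-free, which means weighting the first three octets by 65536, 256 and 1. The following is a rough sketch of such a mapping. It assumes the factor of 2 selects the low (last octet below 128) or high half of each /24, and the names slot_for and net_for are invented for this example. Octets above 255 are not validated.

```perl
use strict;
use warnings;

# Map a dotted quad to an array slot: (24-bit network number) * 2 + half,
# where half is 0 for a last octet below 128 and 1 otherwise.
# Returns nothing if the string does not start with a dotted quad.
sub slot_for {
    my ($ip) = @_;
    my @octet = $ip =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)/ or return;
    my $net = ( $octet[0] << 16 ) | ( $octet[1] << 8 ) | $octet[2];
    return $net * 2 + ( $octet[3] < 128 ? 0 : 1 );
}

# Inverse mapping, for printing results: slot back to "a.b.c".
sub net_for {
    my ($slot) = @_;
    my $net = $slot >> 1;
    return join '.', ( $net >> 16 ) & 0xFF, ( $net >> 8 ) & 0xFF, $net & 0xFF;
}

# Two lines in the format of the input file (the second one is made up).
printf "%s -> slot %d (net %s)\n", $_, slot_for($_), net_for( slot_for($_) )
    for qw(61.156.49.18:28360 61.156.49.200:1);
```

Incrementing a counter at that slot counts lines, not distinct addresses, so duplicates would still have to be handled separately.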
Re: How to promote the efficiency
John W. Krahn wrote:

>     if ( $2 < 128 ) {
>         $low{ $1 }++;
>     }
>     else {
>         $high{ $1 }++;
>     }
>     $total{ $1 }++;

Why track all three? You could just track (say) low and total, and derive high as (total - low) at print time.
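That suggestion could look roughly like the sketch below, which keeps only two counters per /24 and computes the high count when a row is reported. The sub names tally and report_row are invented for the example, and the sample lines are made up apart from the first.

```perl
use strict;
use warnings;

my ( %low, %total );

# Count one input line; returns false if the line has no leading dotted quad.
sub tally {
    my ($line) = @_;
    return unless $line =~ /^(\d+\.\d+\.\d+\.)(\d+)/;
    $low{$1}++ if $2 < 128;
    $total{$1}++;
    return 1;
}

# Build one output row: prefix, low, high (derived as total - low), total.
sub report_row {
    my ($net) = @_;
    my $low = $low{$net} || 0;
    return ( $net, $low, $total{$net} - $low, $total{$net} );
}

tally($_) for qw(218.58.59.73:24866 218.58.59.200:1 218.58.59.5:2);
print join( "\t", report_row($_) ), "\n" for sort keys %total;
```

This saves one hash update per line and one hash of memory. Like the code it is based on, it counts lines rather than distinct addresses.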
Re: How to promote the efficiency
Hi John,

I think you have misunderstood what I meant. The result of $low{ $1 }++ is of no use to me. I only want to count how many different IPs appear.

For example, suppose these IPs appear in '22.33.44.0':

22.33.44.11
22.33.44.22
22.33.44.22
22.33.44.33
22.33.44.33
22.33.44.44
22.33.44.55

I only want the number of unique IPs that appeared, which here is 5.
Re: How to promote the efficiency
On Fri, 9 Dec 2005, Jennifer Garner wrote:

> I think you have misunderstood what I meant. The result of
> $low{ $1 }++ is of no use to me. I only want to count how many
> different IPs appear.
>
> For example, suppose these IPs appear in '22.33.44.0':
>
> 22.33.44.11
> 22.33.44.22
> 22.33.44.22
> 22.33.44.33
> 22.33.44.33
> 22.33.44.44
> 22.33.44.55
>
> I only want the number of unique IPs that appeared, which here is 5.

So, some kind of structure like

    foreach (@ip) {
        $seen_ip{ $_ }++;
    }

And then work on the keys in the %seen_ip hash.

--
Chris Devers
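Applied to the seven sample addresses from the question, the idea can be sketched like this. The sub name count_unique is made up for the example.

```perl
use strict;
use warnings;

# Count distinct values by using them as hash keys; duplicates collapse.
sub count_unique {
    my (@ips) = @_;
    my %seen_ip;
    $seen_ip{$_}++ for @ips;
    return scalar keys %seen_ip;
}

# The sample addresses from the question.
my @ip = qw(
    22.33.44.11 22.33.44.22 22.33.44.22 22.33.44.33
    22.33.44.33 22.33.44.44 22.33.44.55
);

print count_unique(@ip), " distinct addresses\n";
```

The values in %seen_ip still hold how often each address occurred, should that be needed later.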
Re: How to promote the efficiency
Jennifer Garner wrote:

> Hi, John

Hello,

> I think you have misunderstood what I meant. The result of
> $low{ $1 }++ is of no use to me. I only want to count how many
> different IPs appear.
>
> For example, suppose these IPs appear in '22.33.44.0':
>
> 22.33.44.11
> 22.33.44.22
> 22.33.44.22
> 22.33.44.33
> 22.33.44.33
> 22.33.44.44
> 22.33.44.55
>
> I only want the number of unique IPs that appeared, which here is 5.

Do you mean something like this:

    use Socket;

    my ( %seen, %total );

    while ( <FILE> ) {
        next if /unknown/;
        next if /^192\.168\./;
        next unless /^(\d+\.\d+\.\d+\.\d+)/;
        my $ip = inet_aton $1;
        # Count each address only the first time it is seen,
        # under its /24 network (last octet masked to 0).
        $total{ $ip & "\xFF\xFF\xFF\0" }++ unless $seen{ $ip }++;
    }

    close FILE;

    for ( sort { $total{ $b } <=> $total{ $a } } keys %total ) {
        print RESULT inet_ntoa( $_ ), "\t", $total{ $_ }, "\n";
    }

John
--
use Perl;
program
fulfillment
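The masking step relies on two things: inet_aton returns the address as a 4-byte packed string, and Perl's & operator works byte by byte when both operands are strings. A small sketch of just that step follows. The sub name network_of is invented here, and the code assumes the "bitwise" feature is not enabled (with it, the string form of the operator is written "&.").

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Return the /24 network of a dotted-quad address, e.g. "22.33.44.0".
# Returns nothing if inet_aton cannot parse the address.
sub network_of {
    my ($dotted) = @_;
    my $packed = inet_aton($dotted);
    return unless defined $packed;
    # String bitwise AND: keep the first three bytes, zero the last one.
    return inet_ntoa( $packed & "\xFF\xFF\xFF\0" );
}

print "$_ is in ", network_of($_), "\n" for qw(22.33.44.11 22.33.44.55);
```

Using the 4-byte packed form as the hash key also keeps the %seen hash smaller than it would be with dotted-quad strings of up to 15 characters.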
Re: How to promote the efficiency
I have now solved this problem, still using Perl. It took only a small modification to that code, shown below:

    foreach my $file (@files) {
        open (FILE,$file) or die $!;
        while(<FILE>)
        {
            next if /unknown/o;
            next if /^192\.168\./o;
            next unless /^(\d+\.\d+\.\d+\.)(\d+)/o;

            if ( $2 < 128 ) {
                $low{$1}{$2}=1;
            }
            $total{$1}{$2}=1;
        }
        close FILE;
    }

    open (RESULT,">","allIP.txt") or die $!;
    for ( sort { scalar keys %{$total{$b}} <=> scalar keys %{$total{$a}} } keys %total ) {
        print RESULT "$_\t", scalar keys %{$low{$_}}, "\t",
            (scalar keys %{$total{$_}}) - (scalar keys %{$low{$_}}),
            "\t", scalar keys %{$total{$_}}, "\n";
    }
    close RESULT;

Now it runs very fast and I get the results in 20 minutes. Thanks to all.