Re: How to promote the efficiency

2005-12-08 Thread Xavier Noria

On Dec 8, 2005, at 9:37, Jennifer Garner wrote:


Hi lists,

I have a very large file, which looks like this:

61.156.49.18:28360
61.183.148.130:27433
222.90.207.251:25700
202.117.64.161:25054
218.58.59.73:24866
221.233.24.9:22507
222.187.124.4:21016
...

and more than 45,000,000 lines.


Is the time spent on processing the file or on printing to the console?
If on processing the file, would it be possible to put the file
somewhere for download so we can benchmark alternatives against real
data? If on printing, try redirecting to a file with >, for instance.


-- fxn


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response




Re: How to promote the efficiency

2005-12-08 Thread Xavier Noria

On Dec 8, 2005, at 11:44, Xavier Noria wrote:


On Dec 8, 2005, at 9:37, Jennifer Garner wrote:


Hi lists,

I have a very large file, which looks like this:

61.156.49.18:28360
61.183.148.130:27433
222.90.207.251:25700
202.117.64.161:25054
218.58.59.73:24866
221.233.24.9:22507
222.187.124.4:21016
...

and more than 45,000,000 lines.


Is the time spent on processing the file or on printing to the
console? If on processing the file, would it be possible to put the
file somewhere for download so we can benchmark alternatives
against real data? If on printing, try redirecting to a file with
>, for instance.


There's a RESULT filehandle in the print call I missed. Forget that  
remark on printing then.


-- fxn







Re: How to promote the efficiency

2005-12-08 Thread Jennifer Garner
Sorry, the file is more than 900M, too large to download.
I have run it for one day, and still have no output. Crying...
I think maybe some arithmetic would be useful for me, and now I'm thinking it over.







Re: How to promote the efficiency

2005-12-08 Thread Ing. Branislav Gerzo
Jennifer Garner [JG], on Thursday, December 8, 2005 at 19:21 (+0800)
typed the following:

JG Sorry, the file is more than 900M, too large to download.
JG I have run it for one day, and still have no output. Crying...
JG I think maybe some arithmetic would be useful for me, and now I'm thinking it over.

OK, why do you cache the results? Try printing them to a _file_ when
the conditions pass.
After that, you can sort them, uniq them, and so on.
Did you check how much memory your script eats?
Also, you don't use $num there, so I would do the splitting another way,
something like this:
my ($ip) = $_ =~ /^([^:]+)/o;
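
A minimal sketch of that idea, not taken from the original script: each IP is captured with one regex (no split, no $num) and printed as soon as the line passes the filters, so nothing accumulates in memory. The sample lines and the in-memory filehandle are assumptions for the demo; a real run would read the big file and write to a file on disk, which could then be piped through sort and uniq.

```perl
use strict;
use warnings;

# Hypothetical sample input in the thread's "ip:port" format.
my @lines = (
    "61.156.49.18:28360\n",
    "unknown:1\n",
    "192.168.0.5:80\n",
    "218.58.59.73:24866\n",
);

# An in-memory filehandle stands in for a real output file (demo assumption).
open my $out, '>', \my $buffer or die "Cannot open in-memory file: $!";

for (@lines) {
    next if /unknown/;
    next if /^192\.168\./;
    my ($ip) = /^([^:]+)/ or next;   # everything before the first colon
    print {$out} "$ip\n";            # write immediately, keep nothing cached
}
close $out;

print $buffer;
```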

-- 

How do you protect mail on web? I use http://www.2pu.net

[I love cartoons. - Yakko Warner]







RE: How to promote the efficiency

2005-12-08 Thread Charles K. Clarkson
Jennifer Garner mailto:[EMAIL PROTECTED] wrote:
 
: open (FILE,$file) or die $!;
: while(<FILE>)
: {
: next if /unknown/o;
: next if /^192\.168\./o;
: chomp;
: my ($ip,$num) = split/:/,$_;
: if ($ip =~ /^(\d+\.\d+\.\d+\.)(\d+)/o){
: my ($net,$bit) = ($1,$2);
: $total{$net}{low}{$bit} = 1 if $bit < 128;
: $total{$net}{high}{$bit} = 1 if $bit >=128 and
: $bit < 255;
: $total{$net}{total}{$bit} = 1;
: }
: }

On the file reading side:

We could get rid of those /o modifiers on the regexes. IIRC,
they're for variables in regexes. You can also eliminate the
chomp() since we are throwing away that part of the line. We can
remove the lexical variables which are slowing things down each
time they are declared. And finally, we can eliminate the split().

while( <FILE> ) {
    next if /unknown/;
    next if /^192\.168\./;
    next unless /^(\d+\.\d+\.\d+\.)(\d+):/;

    $total{$1}{low}{$2}   = 1 if $2 < 128;
    $total{$1}{high}{$2}  = 1 if $2 >= 128 and $2 < 255;
    $total{$1}{total}{$2} = 1;

}

: close FILE;
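
One way to check suggestions like these is to time both styles with the core Benchmark module. The sketch below is not from the thread: the sample data and the sub names with_split and regex_only are made up for illustration, and the numbers will vary with the machine and the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample lines in the thread's "ip:port" format.
my @lines = map { "218.58.59.$_:24866\n" } 1 .. 254;

# Original style: chomp, split, then lexical copies of the captures.
sub with_split {
    my %total;
    for (@lines) {
        my $line = $_;
        chomp $line;
        my ($ip, $num) = split /:/, $line;
        if ($ip =~ /^(\d+\.\d+\.\d+\.)(\d+)/) {
            my ($net, $bit) = ($1, $2);
            $total{$net}{total}{$bit} = 1;
        }
    }
    return \%total;
}

# Suggested style: one anchored regex, captures used directly.
sub regex_only {
    my %total;
    for (@lines) {
        next unless /^(\d+\.\d+\.\d+\.)(\d+):/;
        $total{$1}{total}{$2} = 1;
    }
    return \%total;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { with_split => \&with_split, regex_only => \&regex_only });
```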



HTH,

Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328

 . . . With Liberty and Justice for all (heterosexuals).






Re: How to promote the efficiency

2005-12-08 Thread Shawn Corey

Jennifer Garner wrote:

Hi lists,

I have a very large file, which looks like this:

61.156.49.18:28360
61.183.148.130:27433
222.90.207.251:25700
202.117.64.161:25054
218.58.59.73:24866
221.233.24.9:22507
222.187.124.4:21016
...

and more than 45,000,000 lines.

The part after ':' is of no use to me; I only need the IP.

For each IP, such as '218.58.59.73', I want to get this result:

218.58.59.  xxx yyy xxx+yyy

I want to know how many IPs are in the range '218.58.59.1' to
'218.58.59.127'; this is 'xxx',
and how many IPs are in the range '218.58.59.128' to '218.58.59.254'; this
is 'yyy'.

I wrote this code:
open (FILE,$file) or die $!;
while(<FILE>)
{
next if /unknown/o;
next if /^192\.168\./o;
chomp;
my ($ip,$num) = split/:/,$_;
if ($ip =~ /^(\d+\.\d+\.\d+\.)(\d+)/o){
my ($net,$bit) = ($1,$2);
$total{$net}{low}{$bit} = 1 if $bit < 128;
$total{$net}{high}{$bit} = 1 if $bit >=128 and $bit < 255;
$total{$net}{total}{$bit} = 1;


# Accumulate the totals directly
if( $bit < 128 ){
  $total{$net}{low} ++;
}else{
  $total{$net}{high} ++;
}
$total{$net}{total} ++;


}
}
close FILE;

foreach (sort { scalar keys %{$total{$b}{total}} <=> scalar keys
%{$total{$a}{total}} } keys %total)
{
print RESULT $_,"\t",scalar keys %{$total{$_}{low}},"\t",
  scalar keys %{$total{$_}{high}},"\t",scalar keys
%{$total{$_}{total}},"\n";
}



OK, we're going to use a technique called the Schwartzian
transform, named after our good friend Randal L. Schwartz, who
invented it.


foreach (
  map { $_->[0] }
  sort { $b->[1] <=> $a->[1] }
  map { [ $_, $total{$_}{total} ] }
  keys %total
){
  print RESULT "$_\t$total{$_}{low}\t$total{$_}{high}\t$total{$_}{total}\n";

}
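
The three stages of the transform can be seen in isolation in a small stand-alone sketch. The %total data here is made up for illustration and holds plain counts per network. With a plain hash lookup as the sort key the transform saves little; it mainly pays off when computing the key is expensive.

```perl
use strict;
use warnings;

# Hypothetical per-network totals; a real script builds these while reading the file.
my %total = (
    '218.58.59.'  => 3,
    '61.156.49.'  => 7,
    '222.90.207.' => 5,
);

my @sorted =
    map  { $_->[0] }                 # 3. unwrap: keep only the key
    sort { $b->[1] <=> $a->[1] }     # 2. sort the pairs by the cached count, descending
    map  { [ $_, $total{$_} ] }      # 1. wrap: pair each key with its sort value
    keys %total;

print "$_\t$total{$_}\n" for @sorted;
```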


--

Just my 0.0002 million dollars worth,
   --- Shawn

Probability is now one. Any problems that are left are your own.
   SS Heart of Gold, _The Hitchhiker's Guide to the Galaxy_





Re: How to promote the efficiency

2005-12-08 Thread John W. Krahn
Jennifer Garner wrote:
 hi,lists,

Hello,

 I have a very large file, which looks like this:
 
 61.156.49.18:28360
 61.183.148.130:27433
 222.90.207.251:25700
 202.117.64.161:25054
 218.58.59.73:24866
 221.233.24.9:22507
 222.187.124.4:21016
 ...
 
 and more than 45,000,000 lines.
 
 The part after ':' is of no use to me; I only need the IP.
 
 For each IP, such as '218.58.59.73', I want to get this result:
 
 218.58.59.  xxx yyy xxx+yyy
 
 I want to know how many IPs are in the range '218.58.59.1' to
 '218.58.59.127'; this is 'xxx',
 and how many IPs are in the range '218.58.59.128' to '218.58.59.254'; this
 is 'yyy'.
 
 I wrote this code:
 open (FILE,$file) or die $!;
 while(<FILE>)
 {
 next if /unknown/o;
 next if /^192\.168\./o;
 chomp;
 my ($ip,$num) = split/:/,$_;
 if ($ip =~ /^(\d+\.\d+\.\d+\.)(\d+)/o){
 my ($net,$bit) = ($1,$2);
 $total{$net}{low}{$bit} = 1 if $bit < 128;
 $total{$net}{high}{$bit} = 1 if $bit >=128 and $bit < 255;
 $total{$net}{total}{$bit} = 1;
 }
 }
 close FILE;
 
 foreach (sort { scalar keys %{$total{$b}{total}} <=> scalar keys
 %{$total{$a}{total}} } keys %total)
 {
 print RESULT $_,"\t",scalar keys %{$total{$_}{low}},"\t",
   scalar keys %{$total{$_}{high}},"\t",scalar keys
 %{$total{$_}{total}},"\n";
 }
 
 
 But it's too slow for me to wait for the result. How can I make it more
 efficient and run in less time? Thanks.

This is quite a bit faster than your version:

open FILE, '<', $file or die "Cannot open '$file' $!";

my ( %low, %high, %total );
while ( <FILE> ) {
    next if /unknown/;
    next if /^192\.168\./;

    next unless /^(\d+\.\d+\.\d+\.)(\d+)/;

    if ( $2 < 128 ) {
        $low{ $1 }++;
    }
    else {
        $high{ $1 }++;
    }

    $total{ $1 }++;
}

close FILE;


for ( sort { $total{ $b } <=> $total{ $a } } keys %total ) {
    print RESULT "$_\t", $low{ $_ } || 0, "\t", $high{ $_ } || 0, "\t",
        $total{ $_ }, "\n";
}


John
-- 
use Perl;
program
fulfillment





Re: How to promote the efficiency

2005-12-08 Thread Jennifer Garner
Thank you, John. I think your method would be much faster than mine.
Now I'm going to rewrite this program in C, but I'll test it using
all the ways given by everyone here. Thanks.







Re: How to promote the efficiency

2005-12-08 Thread Dr.Ruud
Jennifer Garner wrote:

 I have a very large file, which looks like this:

 61.156.49.18:28360
 222.187.124.4:21016
 and more than 45,000,000 lines.

 The part after ':' is of no use to me; I only need the IP.

You could first convert the file to integers, so from 20 bytes per line
down to 4 bytes per line.

Or use 61.156.49 = 61*65536+156*256+49 = 4037681 *2 as a direct index into an
array ((255*65536+255*256+255+1)*2=33554432).

Are you trying to find crowded ranges from these numbers? Like
222.187.0.0/16 etc.
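
A rough sketch of the integer conversion, assuming every line holds a syntactically valid IPv4 address and that each octet gets its own 8 bits. The helper name net_index is made up for illustration. It uses inet_aton from the core Socket module to get the 4 packed bytes, then unpack 'N' to read them as one 32-bit integer.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Map a dotted-quad string to (index of its /24 network, last octet).
# Assumes a syntactically valid IPv4 address.
sub net_index {
    my ($ip) = @_;
    my $n = unpack 'N', inet_aton($ip);   # 4 packed bytes as a 32-bit big-endian integer
    return ( $n >> 8, $n & 0xFF );        # upper 24 bits, lower 8 bits
}

my ( $index, $host ) = net_index('61.156.49.18');
print "$index $host\n";   # prints "4037681 18"
```

The first value can serve as an array index for the network, and the second decides whether the address falls in the low or the high half.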

-- 
Affijn, Ruud

Gewoon is een tijger.







Re: How to promote the efficiency

2005-12-08 Thread Bob Showalter

John W. Krahn wrote:

if ( $2 < 128 ) {
    $low{ $1 }++;
}
else {
    $high{ $1 }++;
}

$total{ $1 }++;


Why track all three? You could just track (say) low and total, and 
derive high as (total - low) at print time.
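
A small sketch of that variation, with made-up sample addresses: only %low and %total are updated inside the loop, and the high count is computed when printing.

```perl
use strict;
use warnings;

# Hypothetical sample addresses.
my @ips = qw(218.58.59.73 218.58.59.130 218.58.59.200 61.156.49.18);

my ( %low, %total );
for (@ips) {
    next unless /^(\d+\.\d+\.\d+\.)(\d+)/;
    $low{$1}++ if $2 < 128;   # track only the low half
    $total{$1}++;
}

for my $net ( sort keys %total ) {
    my $low  = $low{$net} || 0;
    my $high = $total{$net} - $low;   # derived at print time, never stored
    print "$net\t$low\t$high\t$total{$net}\n";
}
```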






Re: How to promote the efficiency

2005-12-08 Thread Jennifer Garner
Hi, John

I think you have misunderstood my meaning.
The result of $low{ $1 }++ is of no use to me. I just want to count how many
distinct IPs exist.
For example, if these IPs exist in '22.33.44.0':

22.33.44.11
22.33.44.22
22.33.44.22
22.33.44.33
22.33.44.33
22.33.44.44
22.33.44.55

Now I only want the number of unique IPs that appeared, which is 5.






Re: How to promote the efficiency

2005-12-08 Thread Chris Devers
On Fri, 9 Dec 2005, Jennifer Garner wrote:

 I think you have misunderstood my meaning.
 The result of $low{ $1 }++ is of no use to me. I just want to count how many
 distinct IPs exist.
 For example, if these IPs exist in '22.33.44.0':
 
 22.33.44.11
 22.33.44.22
 22.33.44.22
 22.33.44.33
 22.33.44.33
 22.33.44.44
 22.33.44.55
 
 Now I only want the number of unique IPs that appeared, which is 5.

So, some kind of structure like

  foreach (@ip) {
    $seen_ip{ $_ }++;
  }

And then work on the keys in the %seen_ip hash.
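
Applied to the example list above, a minimal sketch could look like this. The @ip array is filled by hand here; in the real script it would come from the file.

```perl
use strict;
use warnings;

# The seven example addresses from the question.
my @ip = qw(
    22.33.44.11 22.33.44.22 22.33.44.22 22.33.44.33
    22.33.44.33 22.33.44.44 22.33.44.55
);

my %seen_ip;
$seen_ip{$_}++ for @ip;       # duplicates collapse onto the same key

my $unique = keys %seen_ip;   # keys in scalar context gives the number of keys
print "$unique\n";            # prints 5
```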



-- 
Chris Devers



Re: How to promote the efficiency

2005-12-08 Thread John W. Krahn
Jennifer Garner wrote:
 Hi,John

Hello,

 I think you have misunderstood my meaning.
 The result of $low{ $1 }++ is of no use to me. I just want to count how many
 distinct IPs exist.
 For example, if these IPs exist in '22.33.44.0':
 
 22.33.44.11
 22.33.44.22
 22.33.44.22
 22.33.44.33
 22.33.44.33
 22.33.44.44
 22.33.44.55
 
 Now I only want the number of unique IPs that appeared, which is 5.

Do you mean something like this:

use Socket;

my ( %seen, %total );
while ( <FILE> ) {
    next if /unknown/;
    next if /^192\.168\./;

    next unless /^(\d+\.\d+\.\d+\.\d+)/;

    my $ip = inet_aton $1;

    $total{ $ip & "\xFF\xFF\xFF\0" }++ unless $seen{ $ip }++;
}

close FILE;

for ( sort { $total{ $b } <=> $total{ $a } } keys %total ) {
    print RESULT inet_ntoa( $_ ), "\t", $total{ $_ }, "\n";
}
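
The masking step can be tried on its own. In this small sketch (the sample address is taken from the example list), inet_aton returns a 4-byte packed string, and a bitwise AND of two strings works byte by byte, so the mask clears the last octet and leaves the network address.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

my $ip  = inet_aton('22.33.44.55');   # 4-byte packed string
my $net = $ip & "\xFF\xFF\xFF\0";     # string AND zeroes the last octet

print inet_ntoa($net), "\n";          # prints 22.33.44.0
```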




John
-- 
use Perl;
program
fulfillment





Re: How to promote the efficiency

2005-12-08 Thread Jennifer Garner
Now I have resolved this problem, still using Perl.
It took just a little modification to that code, shown below:


foreach my $file (@files)
{
    open (FILE,$file) or die $!;
    while(<FILE>)
    {
        next if /unknown/o;
        next if /^192\.168\./o;
        next unless /^(\d+\.\d+\.\d+\.)(\d+)/o;
        if ( $2 < 128 ) {
            $low{$1}{$2}=1;
        }

        $total{$1}{$2}=1;

    }
    close FILE;
}


open (RESULT,">","allIP.txt") or die $!;

for ( sort { scalar keys %{$total{$b}} <=> scalar keys %{$total{$a}} } keys
%total ) {
    print RESULT "$_\t", scalar keys %{$low{$_}}, "\t",
    (scalar keys %{$total{$_}}) - (scalar keys %{$low{$_}}), "\t",scalar
keys %{$total{$_}}, "\n";
}

close RESULT;


Now it runs very fast, getting the results in 20 minutes. Thanks to all.

