RE: complex data file parsing

2004-01-23 Thread Hughes, Andrew
Thanks so much.  I've been tinkering around with this all afternoon.  I
think that it is there.  I'm going to mess around with it more over the
weekend.

I'll let you know how it goes.

Thanks so much, Wolf!

Andrew

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>




Re: complex data file parsing

2004-01-23 Thread wolf blaum
Hi, 

> As far as your follow up question on the B lines, "only line with a B in
> the beginning in set?," I'm not sure if I understand.  If you mean that
> there will only be 1 line per order (set of lines A-T) with a B in the
> first position, you are correct.

Yes, that's what I meant.
Sorry about my laziness. Additionally, I get to correct all my embarrassing 
typos...

> Also, as far as your assumption, "The way I do it assumes that the first
> and only the first line of each set begins with an A (and falsely puts that A
> at the end of the previous record, but
> that doesn't matter for the aim here, does it?),"  I'm not sure what you mean by
> this either.  However, it sounds like you have it correct.  Lines that
> indicate the beginning of an order block will only ever start with an A in
> the first position.

Well, what that $/="\nA" does is change the amount of data that while (<FH>) 
reads into $_.
Usually that is a line. In your case, the change of $/ gets it to read a 
whole order into $_: from the "A," line to the end of the "T," line. That's what you 
need. However, I cheat: it actually reads from "A,..." through "T,...\nA" into $_, 
so even though that trailing A belongs to the next record, it ends up in the previous one 
(and the next record then starts without its A). 
That's kind of wrong, given your record structure, but it does not matter for the 
purpose you described. See the print $_ in the code below.
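
To see the effect of $/ in isolation, here is a minimal sketch. The sample data and variable names are made up for illustration, and it reads from an in-memory string instead of order.csv:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample data: two tiny orders, each starting with an A line.
my $data = "A,W1,date\nB,W1,x\nT,W1,1\nA,W2,date\nB,W2,y\nT,W2,2\n";

my @records;
{
    local $/ = "\nA";                    # record separator: newline followed by A
    open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
    while (my $record = <$fh>) {
        push @records, $record;          # keep the raw record, separator included
    }
    close $fh;
}

for my $i (0 .. $#records) {
    (my $visible = $records[$i]) =~ s/\n/\\n/g;   # make newlines visible
    print "record $i: $visible\n";
}
```

The first record should end in the A that really belongs to the second order, and the second record should start with a comma. Using local inside a block keeps the changed $/ from leaking into the rest of a larger script.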

> Finally, the final assumption, that "The push assumes that there are always
> exactly 5 fields from B up to and including the email and that this is the only line with a
> B in record (and comes before the lines
> with ADV_)".  I think that this is correct.  

well good:)

> I tested the script, and I was able to output e-mail addresses.  However,
> using the data that I posted, it does not quite output exactly what I need.
> Based on this sample of order.csv and the script that you sent me (I added
> the line "print @email" to view the output):
>
>   for (my $i=0; $i<=$#fields; $i++){
>  if ($fields[$i] eq "B") {$b_index=$i; next;}
>  elsif ($fields[$i] =~ /^ADV_.*/) {push @email, $fields[$b_index+4];
> last;}
1> print @email;
>  ):

>
> What is going wrong?  Am I trying to view the output incorrectly?

The line marked 1 is still inside the for loop, so you print all the emails collected so far 
once for nearly every field the split gave you (every iteration that does not hit next or last).
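
A small sketch of the difference, using one made-up record that is already split into fields (the field values and the counter name are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up record: fields of one order, already split.
my @fields = ("", "W2", "date", "B", "W2", "ann", "a", 'ann@example.com',
              "I", "W2", "ADV_1601", "1.000");

my @email;
my $b_index;
my $prints_inside = 0;      # how often a print placed inside the loop would run

for my $i (0 .. $#fields) {
    if ($fields[$i] eq "B") { $b_index = $i; next; }
    elsif ($fields[$i] =~ /^ADV_/) { push @email, $fields[$b_index + 4]; last; }
    $prints_inside++;       # a print here runs for every other field
}

print "a print inside the loop would have run $prints_inside times\n";
print "after the loop: @email\n";   # runs once, with the final list
```

With the print after the closing brace of the for loop (or after the while loop), the list is shown once instead of once per field.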

Code with more debug in the right place:

---

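#! /usr/bin/perl
use strict;
use warnings;

my @email;
# The middle of this line was lost in the archive; file name and $/ are
# taken from the discussion above (order.csv, "\nA").
open (FH, "<", "order.csv") or die "Cannot open order.csv: $!";
$/ = "\nA";     # one record = everything up to the next "\nA"
while (<FH>){   # read the next record
  print "This record holds:\n$_ \n"; 

  my @fields = split /,|\n/, $_; # split at , or \n
  my $b_index; # undef for every new record, set once the B field is seen
  for (my $i=0; $i<=$#fields; $i++){
    if ($fields[$i] eq "B") {$b_index=$i; next;}
    elsif ($fields[$i] =~ /^ADV_/) {push @email, $fields[$b_index+4]; last;}
  } # end for

  print "End of record.\n\n";
} # end while
close (FH);

print "@email";  #last line in script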

-

On my box that prints the 2 emails you wanted.
I hope I didn't get something totally screwed up.

Let me know if that does it or not. Thx, 
Wolf








RE: complex data file parsing

2004-01-23 Thread Hughes, Andrew
Thanks for the information.  That was much more than I expected.

You're right about the T line.  That was a typo.  The T is in the first
position of the last line of each order block.

As far as your follow up question on the B lines, "only line with a B in the
beginning in set?," I'm not sure if I understand.  If you mean that there
will only be 1 line per order (set of lines A-T) with a B in the first
position, you are correct.

Also, as far as your assumption, "The way I do it assumes that the first and
only the first line of each set begins with an A (and falsely puts that A at the
end of the previous record, but 
that doesn't matter for the aim here, does it?),"  I'm not sure what you mean by
this either.  However, it sounds like you have it correct.  Lines that
indicate the beginning of an order block will only ever start with an A in
the first position.

Finally, the final assumption, that "The push assumes that there are always
exactly 5 fields from B up to and including the email and that this is the only line with a
B in record (and comes before the lines 
with ADV_)".  I think that this is correct.  An example line is
"B,W29116,test,test,[EMAIL PROTECTED],"  The positions are 0,1,2,3,4, so that
equals 5, and it will ALWAYS be five.  Finally, the B line will ALWAYS come
before the ADV_ lines.  This appears to be correct judging that the output
of the script is e-mail addresses.
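
As a quick check of those positions, here is a minimal sketch that splits one B line and prints fields 0 through 4. The address is a placeholder, since the real ones are masked in this archive:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up B line in the same layout as the sample (address is a placeholder).
my $line = 'B,W29116,test,test,someone@example.com,test,,test,GA,~US,1';

my @fields = split /,/, $line;

# Positions 0..4: the B marker, the order id, two name-like fields, the e-mail.
for my $i (0 .. 4) {
    print "$i: $fields[$i]\n";
}

my $email = $fields[4];   # B sits at index 0, so the e-mail is at 0 + 4
print "e-mail: $email\n";
```

This is why the script uses $b_index+4: the e-mail is the fifth field counting from the B itself.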

I tested the script, and I was able to output e-mail addresses.  However,
using the data that I posted, it does not quite output exactly what I need.
Based on this sample of order.csv and the script that you sent me (I added
the line "print @email" to view the output):

  for (my $i=0; $i<=$#fields; $i++){
 if ($fields[$i] eq "B") {$b_index=$i; next;}
 elsif ($fields[$i] =~ /^ADV_.*/) {push @email, $fields[$b_index+4]; 
last;}
print @email;
 ):

A,W29073,Thu Apr 05 15:25:08 2001
B,W29073,Scott,S,[EMAIL PROTECTED],249 Tah Ave,,Sth San Francisco,CA,~US,5-
P,W29073,
X,W29073,Company Name,A,Department Name,San Francisco 00),Purchase Order Number,254
S,W29073,UPS Next Day Air,Scott S,2 Tah Ave,,Sth San Francisco,CA,~US,5-
I,W29073,AVHQ_101090lfbl,6.000,$28.50,$171.001.00,,2,0
I,W29073,AVHQ_101090xlfbl,4.000,$28.50,$114.001.00,,3,0
T,W29073,$285.00$53.09,$338.09,,10.00,
A,W29101,Wed Apr 11 07:43:33 2001
B,W29101,harold,m,[EMAIL PROTECTED],10 wind ridge parkway,,Atlanta,GA,~US,5
P,W29101,
X,W29101,Company Name,,Department Name,,Purchase Order Number,10252
S,W29101,UPS Regular Ground,harold m,10 wind ridge parkway,,Atlanta,GA,~US,5
I,W29101,ADV_Carb-Natxxl,1.000,$16.50,$16.501.50,,4
T,W29101,$17.50,,7.000,$1.23,$9.28,$28.01,,1.50,
A,W29116,Thu Apr 12 11:42:21 2001
B,W29116,test,test,[EMAIL PROTECTED],test,,test,GA,~US,1
P,W29116,Credit,Offline,Visa,,04/04
X,W29116,Company Name,,Department Name,,Purchase Order Number,
S,W29116,UPS Regular Ground,test test,test,,test,GA,~US,1
I,W29116,ADV_1601,1.000,$14.00,$14.001.50,,3
T,W29116,$14.00,,7.000,$0.98,$9.94,$24.92,,1.50,

I would expect to see:

[EMAIL PROTECTED]@test.com

However, I see:

[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@masnc.n
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@masnc
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@mas
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@m
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]
@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@masnc.ne
[EMAIL PROTECTED]@masnc.net

What is going wrong?  Am I trying to view the output incorrectly?

Thanks for any additional direction.

Andrew




Re: complex data file parsing

2004-01-22 Thread wolf blaum
hi, 
> I know that each block always starts with an A in the first position of
> the first line and ends with a T in the last position of the last line.

Isn't it a T in the first position of the last row of the set?

> I know that the second line starts with a B, and the data in the 5th space
> on this line is the e-mail address, which is what I ultimately want.
> However,...

only line with a B in the beginning in set?

> I am trying to get a list of email addresses for people who have ordered
> products that begin with ADV.  These can appear in multiple I lines.
> Therefore you can never predict how many lines make up 1 order block.

What about:

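#! /usr/bin/perl
use strict;
use warnings;
my @email;

# The middle of this line was lost in the archive; file name and $/ are
# taken from the discussion in this thread (order.csv, "\nA").
open (FH, "<", "order.csv") or die "Cannot open order.csv: $!";
$/ = "\nA";     # one record = everything up to the next "\nA"
while (<FH>){   # read the next record
  my @fields = split /,|\n/, $_;   # split at , or \n
  my $b_index;   # undef for every new record, set once the B field is seen
  for (my $i=0; $i<=$#fields; $i++){
    if ($fields[$i] eq "B") {$b_index=$i; next;}
    elsif ($fields[$i] =~ /^ADV_/) {push @email, $fields[$b_index+4]; last;}
  }
}
close (FH);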

Works on the sample you provided.

$/ (see perlvar) is the input record separator, usually \n.

If T really were the last char in the last row of the set, you could use "T\n" 
as $/.
The way I do it assumes that the first and only the first line of each set begins 
with an A (and falsely puts that A at the end of the previous record, but 
that doesn't matter for the aim here, does it?)


The push assumes that there are always exactly 5 fields from B up to and including the email 
and that this is the only line with a B in record (and comes before the lines 
with ADV_).

Lots of assumptions.

I'm sure there are better ways to do that. It might be a start, though.
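
One possible alternative with fewer assumptions, only as a sketch: leave $/ alone, read line by line, remember the e-mail from the current B line, and store it when an I line carries an ADV_ product. The sample data, the addresses and the names $current_email and $already_listed are made up for illustration, and the sketch assumes the product code is the first field after the order id on an I line, as in the posted sample:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample in the A/B/I/T layout from this thread.
my $data = <<'END';
A,W1,Thu Apr 05 15:25:08 2001
B,W1,ann,a,ann@example.com,1 Main St
I,W1,AVHQ_101090,6.000
T,W1,$285.00
A,W2,Wed Apr 11 07:43:33 2001
B,W2,bob,b,bob@example.com,2 Main St
I,W2,ADV_Carb,1.000
I,W2,ADV_1601,1.000
T,W2,$17.50
END

my @email;
my ($current_email, $already_listed);

open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($type, $order, @rest) = split /,/, $line;
    next unless defined $type;            # skip empty lines

    if ($type eq 'A') {                   # new order: forget the previous one
        ($current_email, $already_listed) = (undef, 0);
    }
    elsif ($type eq 'B') {                # e-mail is the 5th field of the B line
        $current_email = $rest[2];
    }
    elsif ($type eq 'I' && @rest && $rest[0] =~ /^ADV_/) {
        # first ADV_ item of this order: record the address once
        if (defined $current_email && !$already_listed) {
            push @email, $current_email;
            $already_listed = 1;
        }
    }
}
close $fh;

print "$_\n" for @email;
```

Because each line is checked by its own first field, a stray "B" value somewhere else in a record cannot be mistaken for the B line, and an order with several ADV_ items still yields its address only once.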

> "Online ordering is now available. Visit http://insidersadvantage.com for
> details."

Uh, judging from your question, I'd better not, eh?

Good luck, Wolf

