Re: trying to understand how regex works

2002-08-13 Thread csaba . raduly


On 13/08/2002 06:26:59 perl-win32-users-admin wrote:

> Hi all,
> I guess it must be a simple problem, but it's a
> mystery to me.
> [snip question involving regex]
>
> Anybody cares to explain this to me?

Try running your script with

perl -Mre=debug scriptname.pl 2>re_debug

Make sure you redirect stderr to a file, as there's plenty of it.
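
The same tracing can also be switched on from inside a script with the re pragma. What follows is only a minimal sketch: the sample line, the shortened two-field pattern and the sub name first_two_fields are made up for illustration and are not the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first two pipe-separated fields,
# or an empty list if the line has fewer than two pipes.
sub first_two_fields {
    my ($line) = @_;
    # Turns on regex debugging for the pattern below; the compile and
    # match trace goes to STDERR, so redirect it when running this.
    use re 'debug';
    if ($line =~ /^(.*?)\|(.*?)\|/) {
        return ($1, $2);
    }
    return;
}

# Made-up sample data; the real input has 30 fields per line.
my @got = first_two_fields("alpha|beta|gamma");
print "field 1 = $got[0], field 2 = $got[1]\n" if @got;
```

Running it as perl sketch.pl 2>re_debug keeps the trace out of the console, as with the command line above.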

--
Csaba Ráduly, Software Engineer   Sophos Anti-Virus
email: [EMAIL PROTECTED]   http://www.sophos.com
US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933

___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs



RE: trying to understand how regex works

2002-08-13 Thread Joseph Youngquist

I'd add the check for the garbage before the split. I'm not sure it would
change the running time much, but I think it would reduce the amount of
checking needed after the split.

next if (/value_garbage/);  # assuming value_garbage is the exact string.

or you can use:

while (<FILE>) {
    $p = "N";
    my @f;
    @f = split /\s*\|\s*/, $_ unless (m/value_garbage/);  # <-- added check
    if (@f != 30) {
        print "Field count is ", scalar @f, " should be 30\n";
        # error processing ...
    }
    if ($f[1] =~ /   ...
    ...

This again assumes that value_garbage is a plain string. If not, then well,
if, elsif away :)
But I would absolutely use the split function.

Joe Youngquist




Re: trying to understand how regex works

2002-08-13 Thread Thomas R Wyant_III


Ron Grabowski [EMAIL PROTECTED] wrote:

> my $regex = join '|', 'value_garbage1',
>   'value_garbage2',
>   'value_garbage3';
>
> next if /$regex/;

You might want to say next if /$regex/o to keep Perl from recompiling the
pattern on every pass. If you're on Perl 5.6, you could even make use of the
sexy new qr{} operator, which returns a reference to a compiled regular
expression:

my $regex = join '|', ...
my $re = qr{$regex};
next if /$re/;
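
A slightly fuller sketch of the same idea follows. The field-boundary anchoring, the quotemeta call and the sub name is_garbage_line are assumptions added for illustration; the values are the placeholders from the thread.

```perl
use strict;
use warnings;

# Placeholder values from the thread; real ones might contain regex
# metacharacters, hence quotemeta.
my @garbage = ('value_garbage1', 'value_garbage2', 'value_garbage3');
my $alternation = join '|', map { quotemeta } @garbage;

# Compiled once. The pattern requires a whole field to match: it must
# start at the line start or after a pipe, and end before a pipe or the
# end of the string.
my $garbage_re = qr/(?:^|\|)(?:$alternation)(?=\||\z)/;

# Hypothetical helper: true if any field is exactly one of the values.
sub is_garbage_line {
    my ($line) = @_;
    chomp $line;
    return $line =~ $garbage_re ? 1 : 0;
}

for my $line ("a|value_garbage2|c\n", "a|value_garbage2x|c\n") {
    my $verdict = is_garbage_line($line) ? "skip" : "keep";
    print "$verdict: $line";
}
```

Without the boundary checks, a plain /$regex/ would also reject lines where the value only appears as part of a longer field.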

Tom Wyant




trying to understand how regex works

2002-08-12 Thread Dan Jablonsky

Hi all,
I guess it must be a simple problem, but it's a
mystery to me.
I got 30 fields all separated by pipes in some files
with many many lines. Some of the fields need to be
changed, but mostly I have to drop any line that has
certain values in certain fields.
So I start by skipping any field that has garbage in
it:
open FOUT, ">/some/path/outputfile.txt";
open FILE, "</some/path/inputfile.txt";
while(<FILE>){
$p="N";
next if (/.*?\|value_garbage1\|.*?/ ||
/.*?\|value_garbage2\|.*?/ ||
/.*?\|value_garbage3\|.*?/);
#and then I continue with an if
if(/(.*?)\|(.*?)\|30 times/){
$p="Y";
do something to $1; #change field 1
do something to $3; #change field 3
$fld1=$newfld1;
$fld2=$2;
$fld3=$newfld3;
$fld4=$4; # and so on
}
print FOUT "$fld1|$fld2|...|$fld30|\n" if ($p="Y");
#print the whole thing to the new output
}

Well, it happens that some of the lines are completely
out of whack and the regex simply stops there - it
doesn't exit, no errors but goes into an infinite loop
even though I don't know how exactly is this possible.
My second if states clearly (or not so clearly) that
if the line does not have 30 fields it should skip the
block, it should NOT print anything at the handle and
should get the next line.
For whatever reason, the first time it encounters a
line with fewer than 30 fields, it just loops without
end.
I tried to solve this by replacing the .*? in the
references by the actual format of each field and
suddenly it started working but now the regex is a
hundred times slower and the only thing that speeds it
up is to go back to the .*? that really goes fast as
long as the regex is true. I mean if I have 30
fields all the time, the regex works OK and it goes
very fast.

Anybody cares to explain this to me?
Thanks,
Dan 





Re: trying to understand how regex works

2002-08-12 Thread $Bill Luebkert

Dan Jablonsky wrote:
> Hi all,
> I guess it must be a simple problem, but it's a
> mystery to me.
> I got 30 fields all separated by pipes in some files
> with many many lines. Some of the fields need to be
> changed, but mostly I have to drop any line that has
> certain values in certain fields.
> So I start by skipping any field that has garbage in
> it:
> open FOUT, ">/some/path/outputfile.txt";
> open FILE, "</some/path/inputfile.txt";
> while(<FILE>){
> $p="N";
> next if (/.*?\|value_garbage1\|.*?/ ||
> /.*?\|value_garbage2\|.*?/ ||
> /.*?\|value_garbage3\|.*?/);
>   #and then I continue with an if
>   if(/(.*?)\|(.*?)\|30 times/){
>   $p="Y";
>   do something to $1; #change field 1
>   do something to $3; #change field 3
>   $fld1=$newfld1;
>   $fld2=$2;
>   $fld3=$newfld3;
>   $fld4=$4; # and so on
>   }
>   print FOUT "$fld1|$fld2|...|$fld30|\n" if ($p="Y");
> #print the whole thing to the new output
> }
>
> Well, it happens that some of the lines are completely
> out of whack and the regex simply stops there - it
> doesn't exit, no errors but goes into an infinite loop
> even though I don't know how exactly is this possible.
> My second if states clearly (or not so clearly) that
> if the line does not have 30 fields it should skip the
> block, it should NOT print anything at the handle and
> should get the next line.
> For whatever reason, the first time it encounters a
> line with fewer than 30 fields, it just loops without
> end.
> I tried to solve this by replacing the .*? in the
> references by the actual format of each field and
> suddenly it started working but now the regex is a
> hundred times slower and the only thing that speeds it
> up is to go back to the .*? that really goes fast as
> long as the regex is true. I mean if I have 30
> fields all the time, the regex works OK and it goes
> very fast.
>
> Anybody cares to explain this to me?

No, but I'll offer an alternative.

while (<FILE>) {
    $p = "N";
    my @f = split /\s*\|\s*/, $_;
    if (@f != 30) {
        print "Field count is ", scalar @f, " should be 30\n";
        # error processing ...
    }
    if ($f[1] =~ /   ...
    ...
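
For reference, here is one way the split approach might be fleshed out into something runnable. It is only a sketch under stated assumptions: the field count is cut to 4 to keep the sample short (the thread's data has 30), the garbage check is applied to every field, uc stands in for the unspecified "do something", and process_line is a made-up name.

```perl
use strict;
use warnings;

my $EXPECTED = 4;   # assumption for the demo; the real files have 30 fields

# Hypothetical helper: returns the rewritten line, or undef if the line
# should be dropped (wrong field count or a garbage value in any field).
sub process_line {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields, which split would
    # otherwise discard and so miscount lines ending in a pipe.
    my @f = split /\s*\|\s*/, $line, -1;
    return if @f != $EXPECTED;
    return if grep { $_ eq 'value_garbage1' } @f;
    $f[0] = uc $f[0];   # stand-in for "do something to field 1"
    $f[2] = uc $f[2];   # stand-in for "do something to field 3"
    return join '|', @f;
}

# Made-up sample lines: one good, one short, one with garbage.
my @sample = ("a|b|c|d\n", "a|b\n", "a|value_garbage1|c|d\n");
for my $line (@sample) {
    my $out = process_line($line);
    print defined $out ? "$out\n" : "dropped\n";
}
```

Because split never backtracks across the whole line the way thirty chained (.*?)\| groups can, a short or malformed line is rejected by the count check at once.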

-- 
   ,-/-  __  _  _ $Bill Luebkert   ICQ=162126130
  (_/   /  )// //   DBE Collectibles   Mailto:[EMAIL PROTECTED]
   / ) /--  o // //  http://dbecoll.tripod.com/ (Free site for Perl)
-/-' /___/__/_/_ Castle of Medieval Myth & Magic http://www.todbe.com/




Re: trying to understand how regex works

2002-08-12 Thread Ron Grabowski

> open FOUT, ">/some/path/outputfile.txt";
> open FILE, "</some/path/inputfile.txt";

open(FOUT, ">/some/path/outputfile.txt") or
 die("Error: $!");

open(FILE, "</some/path/inputfile.txt") or
 die("Error: $!");

> while(<FILE>){
> $p="N";
> next if (/.*?\|value_garbage1\|.*?/ ||
> /.*?\|value_garbage2\|.*?/ ||
> /.*?\|value_garbage3\|.*?/);

 my $regex = join '|', 'value_garbage1',
   'value_garbage2',
   'value_garbage3';

 next if /$regex/;

> if(/(.*?)\|(.*?)\|30 times/){
> $p="Y";
> do something to $1; #change field 1
> do something to $3; #change field 3
> $fld1=$newfld1;
> $fld2=$2;
> $fld3=$newfld3;
> $fld4=$4; # and so on
> }
> print FOUT "$fld1|$fld2|...|$fld30|\n" if ($p="Y");

If you put the print inside of the if(), you don't need $p. Look into
the join() function:

 print FOUT join '|', $fld1, $fld2, $fld3;
 print FOUT join '|', @array;
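
A small self-contained sketch of the join approach with a checked open. The helper name print_record and the three sample fields are made up, and an in-memory filehandle stands in for /some/path/outputfile.txt so the example can run anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: writes one pipe-joined record to the given
# filehandle and returns the line that was written.
sub print_record {
    my ($fh, @fields) = @_;
    my $line = join('|', @fields) . "\n";
    print {$fh} $line or die "Error: $!";
    return $line;
}

# In-memory filehandle used in place of the real output file.
my $buffer = '';
open(my $out, '>', \$buffer) or die "Error: $!";
print_record($out, 'f1', 'f2', 'f3');
close($out) or die "Error: $!";
print $buffer;
```

Note that join puts the separator only between fields, so if the trailing pipe in the original print statement is wanted, it has to be appended explicitly.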