Re: trying to understand how regex works

2002-08-13 Thread Thomas R Wyant_III


Ron Grabowski <[EMAIL PROTECTED]> wrote:

> my $regex = join '|', 'value_garbage1',
>   'value_garbage2',
>   'value_garbage3';

> next if /$regex/;

You might want to say "next if /$regex/o" to keep Perl from recompiling the
pattern on every pass through the loop. If you're on Perl 5.005 or later, you
could even make use of the sexy new qr{} operator, which returns a reference
to a compiled regular expression:

my $regex = join '|', ...
my $re = qr{$regex};
next if /$re/;
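
A fuller sketch of the same idea, for anyone who wants something runnable. The sample lines and the @garbage list are made up for illustration, and quotemeta is an addition of mine, there only in case a garbage value ever contains a regex metacharacter:

```perl
use strict;
use warnings;

# Hypothetical garbage values; quotemeta escapes any regex metacharacters.
my @garbage = ('value_garbage1', 'value_garbage2', 'value_garbage3');
my $alternation = join '|', map { quotemeta($_) } @garbage;

# Compile once; each alternative must sit between two pipe delimiters,
# as in the original question's patterns.
my $re = qr{\|(?:$alternation)\|};

# Hypothetical sample input.
my @lines = (
    "a|value_garbage2|c",
    "a|good_value|c",
);

# Keep only the lines that do not contain a garbage field.
my @kept = grep { !/$re/ } @lines;
print "$_\n" for @kept;
```

Because $re is compiled once by qr{}, the match inside grep does not rebuild the pattern for each line.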

Tom Wyant

___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs



RE: trying to understand how regex works

2002-08-13 Thread Joseph Youngquist

I'd check for the garbage before the split. I'm not sure it would save any
real time when the program runs, but I think it would reduce the amount of
checking needed after the split.

next if (/value_garbage/);  # assuming value_garbage is the exact string.

or you can use:

while (<FILE>) {
$p = "N";
next if /value_garbage/;  # skip garbage lines before splitting
my @f = split /\s*\|\s*/, $_;
if (@f != 30) {
print "Field count is ", scalar @f, " should be 30\n";
# error processing ...
}
if ($f[1] =~ /   ...
...

This again assumes that value_garbage is a literal string. If it isn't, then,
well, "if, elsif" away :)
Either way, I would absolutely use the split function.
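
As a rough illustration of checking before splitting when the garbage value really is a fixed string: the sketch below uses index() instead of a regex, which is my substitution rather than anything proposed in the thread, and the sample records are invented.

```perl
use strict;
use warnings;

# Hypothetical sample records; real input would come from the file handle.
my @lines = (
    "id1|ok|x",
    "id2|value_garbage|y",
    "id3|ok|z",
);

my @records;
for my $line (@lines) {
    # Literal substring test: the regex engine is not involved at all.
    next if index($line, 'value_garbage') >= 0;

    # Only lines that survive the check pay for the split.
    push @records, [ split /\s*\|\s*/, $line ];
}

printf "%d of %d lines were split\n", scalar @records, scalar @lines;
```

Note that a substring test will also reject a line where value_garbage appears inside a longer field, so it only fits if that cannot happen in the data.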

Joe Youngquist

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]On Behalf Of
$Bill Luebkert
Sent: Tuesday, August 13, 2002 12:39 AM
To: Dan Jablonsky
Cc: [EMAIL PROTECTED]
Subject: Re: trying to understand how regex works


Dan Jablonsky wrote:
> Hi all,
> I guess it must be a simple problem, but it's a
> mystery to me.
> I got 30 "fields" all separated by pipes in some files
> with many many lines. Some of the fields need to be
> changed, but mostly I have to drop any line that has
> certain values in certain fields.
> So I start by skipping any field that has garbage in
> it:
> open FOUT, ">>/some/path/outputfile.txt";
> open FILE "<...";
> while(<FILE>){
> $p="N";
> next if (/.*?\|value_garbage1\|.*?/ ||
> /.*?\|value_garbage2\|.*?/ ||
> /.*?\|value_garbage3\|.*?/);
>   #and then I continue with an if
>   if(/(.*?)\|(.*?)\|30 times/){
>   $p="Y";
>   do something to $1; #change field 1
>   do something to $3; #change field 3
>   $fld1=$newfld1;
>   $fld2=$2;
>   $fld3=$newfld3;
>   $fld4=$4;and so on
>   }
>   print FOUT "$fld1|$fld2|...|$fld30|\n" if ($p="Y");
> #print the whole thing to the new output
> }
>
> Well, it happens that some of the lines are completely
> out of whack and the regex simply stops there - it
> doesn't exit, no errors but goes into an infinite loop
> even though I don't know how exactly is this possible.
> My second if states clearly (or not so clearly) that
> if the line does not have 30 fields it should skip the
> block, it should NOT print anything at the handle and
> should get the next line.
> For whatever reason, the first time it encounters a
> line with less than 30 fields, it just loops without
> end.
> I tried to solve this by replacing the .*? in the
> references by the actual format of each field and
> suddenly it started working but now the regex is a
> hundred times slower and the only thing that speeds it
> up is to go back to the .*? that really goes fast as
> long as the regex "is true". I mean if I have 30
> fields all the time, the regex works OK and it goes
> very fast.
>
> Anybody cares to explain this to me?

No, but I'll offer an alternative.

while (<FILE>) {
$p = "N";
my @f = split /\s*\|\s*/, $_;
if (@f != 30) {
print "Field count is ", scalar @f, " should be 30\n";
# error processing ...
}
if ($f[1] =~ /   ...
...

--
   ,-/-  __  _  _ $Bill Luebkert   ICQ=162126130
  (_/   /  )// //   DBE Collectibles   Mailto:[EMAIL PROTECTED]
   / ) /--<  o // //  http://dbecoll.tripod.com/ (Free site for Perl)
-/-' /___/_<_http://www.todbe.com/




Re: trying to understand how regex works

2002-08-13 Thread csaba . raduly


On 13/08/2002 06:26:59 perl-win32-users-admin wrote:

>Hi all,
>I guess it must be a simple problem, but it's a
>mystery to me.
[snip question involving regex]
>
>Anybody cares to explain this to me?

Try running your script with

perl -Mre=debug scriptname.pl 2>re_debug

Make sure you redirect stderr to a file, as there's plenty of it.
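
The same tracing can also be switched on from inside the script with the re pragma, which keeps the output down to the one pattern you care about. A minimal sketch, with a made-up three-field line standing in for the real data:

```perl
use strict;
use warnings;

my $line = "a|b|c";    # hypothetical short record
my $matched;
{
    # Lexically scoped: only regexes compiled inside this block are traced.
    use re 'debug';
    $matched = $line =~ /(.*?)\|(.*?)\|/ ? 1 : 0;
}
print "matched: $matched\n";
```

The trace goes to stderr, so the same 2>re_debug redirection applies.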

--
Csaba Ráduly, Software Engineer   Sophos Anti-Virus
email: [EMAIL PROTECTED]http://www.sophos.com
US Support: +1 888 SOPHOS 9 UK Support: +44 1235 559933




Re: trying to understand how regex works

2002-08-12 Thread Ron Grabowski

> open FOUT, ">>/some/path/outputfile.txt";
> open FILE "<...";

 open(FOUT, ">>/some/path/outputfile.txt") or
 die("Error: $!");

 open(FILE, "<...") or
 die("Error: $!");

> while(<FILE>){
> $p="N";
> next if (/.*?\|value_garbage1\|.*?/ ||
> /.*?\|value_garbage2\|.*?/ ||
> /.*?\|value_garbage3\|.*?/);

 my $regex = join '|', 'value_garbage1',
   'value_garbage2',
   'value_garbage3';

 next if /$regex/;

> if(/(.*?)\|(.*?)\|30 times/){
> $p="Y";
> do something to $1; #change field 1
> do something to $3; #change field 3
> $fld1=$newfld1;
> $fld2=$2;
> $fld3=$newfld3;
> $fld4=$4;and so on
> }
> print FOUT "$fld1|$fld2|...|$fld30|\n" if ($p="Y");

If you put the print inside of the if(), you don't need $p. Look into
the join() function:

 print FOUT join '|', $fld1, $fld2, $fld3;
 print FOUT join '|', @array;
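
One thing to watch: join only puts the separator between elements, so the trailing pipe and newline from the original print have to be added back by hand. A small sketch with placeholder field values (printing to STDOUT rather than FOUT so it runs on its own):

```perl
use strict;
use warnings;

my @fields = ('alpha', 'beta', 'gamma');    # hypothetical field values

# join places '|' between elements only, so the trailing pipe and the
# newline from the original print statement are appended explicitly.
my $record = join('|', @fields) . "|\n";
print $record;
```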



Re: trying to understand how regex works

2002-08-12 Thread $Bill Luebkert

Dan Jablonsky wrote:
> Hi all,
> I guess it must be a simple problem, but it's a
> mystery to me.
> I got 30 "fields" all separated by pipes in some files
> with many many lines. Some of the fields need to be
> changed, but mostly I have to drop any line that has
> certain values in certain fields.
> So I start by skipping any field that has garbage in
> it:
> open FOUT, ">>/some/path/outputfile.txt";
> open FILE "<...";
> while(<FILE>){
> $p="N";
> next if (/.*?\|value_garbage1\|.*?/ ||
> /.*?\|value_garbage2\|.*?/ ||
> /.*?\|value_garbage3\|.*?/);
>   #and then I continue with an if
>   if(/(.*?)\|(.*?)\|30 times/){
>   $p="Y";
>   do something to $1; #change field 1
>   do something to $3; #change field 3
>   $fld1=$newfld1;
>   $fld2=$2;
>   $fld3=$newfld3;
>   $fld4=$4;and so on
>   }
>   print FOUT "$fld1|$fld2|...|$fld30|\n" if ($p="Y");
> #print the whole thing to the new output 
> }
> 
> Well, it happens that some of the lines are completely
> out of whack and the regex simply stops there - it
> doesn't exit, no errors but goes into an infinite loop
> even though I don't know how exactly is this possible.
> My second if states clearly (or not so clearly) that
> if the line does not have 30 fields it should skip the
> block, it should NOT print anything at the handle and
> should get the next line.
> For whatever reason, the first time it encounters a
> line with less that 30 fields, it just loops without
> end.
> I tried to solve this by replacing the .*? in the
> references by the actual format of each field and
> suddenly it started working but now the regex is a
> hundred times slower and the only thing that speeds it
> up is to go back to the .*? that really goes fast as
> long as the regex "is true". I mean if I have 30
> fields all the time, the regex works OK and it goes
> very fast.
> 
> Anybody cares to explain this to me?

No, but I'll offer an alternative.

while (<FILE>) {
$p = "N";
my @f = split /\s*\|\s*/, $_;
if (@f != 30) {
print "Field count is ", scalar @f, " should be 30\n";
# error processing ...
}
if ($f[1] =~ /   ...
...
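
Filled out into something runnable, the split approach might look like the sketch below. The input lines are generated in memory, the choice of the second field for the garbage test and the uc() call are stand-ins for the real field logic, and skipping short lines with next is my assumption about what the error processing would do.

```perl
use strict;
use warnings;

my $EXPECTED = 30;    # field count from the original question

# Hypothetical input: one good record, one short record, one garbage record.
my @input = (
    join('|', map { "f$_" } 1 .. 30),
    join('|', map { "f$_" } 1 .. 12),
    join('|', 'f1', 'value_garbage1', map { "f$_" } 3 .. 30),
);

my @output;
for my $line (@input) {
    chomp $line;

    # The -1 limit keeps trailing empty fields, so the count stays honest.
    my @f = split /\s*\|\s*/, $line, -1;
    if (@f != $EXPECTED) {
        warn "Field count is ", scalar @f, ", should be $EXPECTED\n";
        next;    # assumed error handling: skip the malformed line
    }

    # Drop records whose second field holds a garbage value (illustrative).
    next if $f[1] =~ /^value_garbage[123]$/;

    $f[0] = uc $f[0];    # stand-in for "do something to field 1"
    push @output, join('|', @f);
}

print "$_\n" for @output;
```

Since split just walks the line once, a record with too few fields fails the count test right away and the loop moves on to the next line.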

-- 
   ,-/-  __  _  _ $Bill Luebkert   ICQ=162126130
  (_/   /  )// //   DBE Collectibles   Mailto:[EMAIL PROTECTED]
   / ) /--<  o // //  http://dbecoll.tripod.com/ (Free site for Perl)
-/-' /___/_<_http://www.todbe.com/
