Re: trying to understand how regex works
On 13/08/2002 06:26:59 perl-win32-users-admin wrote:

> Hi all,
> I guess it must be a simple problem, but it's a mystery to me.
> [snip question involving regex]
> Anybody cares to explain this to me?

Try running your script with

    perl -Mre=debug scriptname.pl 2>re_debug

Make sure you redirect stderr to a file, as there's plenty of it.

--
Csaba Ráduly, Software Engineer
Sophos Anti-Virus
email: [EMAIL PROTECTED]  http://www.sophos.com
US Support: +1 888 SOPHOS 9   UK Support: +44 1235 559933

_______________________________________________
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
RE: trying to understand how regex works
I'd add the check for the garbage before the split. I'm not sure it would save any real run time, but I think it would reduce the amount of checking needed after the split function.

    next if /value_garbage/;   # assuming value_garbage is the exact string

or you can use:

    while (<FILE>) {
        $p = "N";
        my @f;
        @f = split /\s*\|\s*/, $_ unless m/value_garbage/;   # <-- added
        if (@f != 30) {
            print "Field count is ", scalar @f, " should be 30\n";
            # error processing ...
        }
        if ($f[1] =~ / ...
        ...

This is again assuming that value_garbage is a string. If not, then, well, if/elsif away :)

But I would absolutely use the split function.

Joe Youngquist

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of $Bill Luebkert
Sent: Tuesday, August 13, 2002 12:39 AM
To: Dan Jablonsky
Cc: [EMAIL PROTECTED]
Subject: Re: trying to understand how regex works

Dan Jablonsky wrote:

> Hi all,
> I guess it must be a simple problem, but it's a mystery to me.
> [snip question involving regex]
> Anybody cares to explain this to me?

No, but I'll offer an alternative.

    while (<FILE>) {
        $p = "N";
        my @f = split /\s*\|\s*/, $_;
        if (@f != 30) {
            print "Field count is ", scalar @f, " should be 30\n";
            # error processing ...
        }
        if ($f[1] =~ / ...
        ...
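The split-then-check approach suggested above can be sketched as a small helper. This is only an illustration under assumptions: parse_line and %garbage are hypothetical names, the garbage values are the placeholders used in the thread, and the line is assumed to end with a trailing pipe as in the original print statement. The sample call uses 3 fields instead of 30 to keep it short.

```perl
use strict;
use warnings;

# Placeholder garbage values; the thread only calls them value_garbage1..3.
my %garbage = map { $_ => 1 } qw(value_garbage1 value_garbage2 value_garbage3);

# Hypothetical helper: returns the fields of a good line, or an empty list
# if the line should be dropped. $expected is 30 in the thread.
sub parse_line {
    my ($line, $expected) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields: "a|b|" gives ("a", "b", "").
    my @f = split /\|/, $line, -1;
    pop @f if @f && $f[-1] eq '';        # discard the piece after the final pipe
    return if @f != $expected;           # wrong field count: drop the line
    return if grep { $garbage{$_} } @f;  # some field is a garbage value: drop it
    return @f;
}

my @ok = parse_line("a|b|c|\n", 3);
print join('|', @ok), "|\n";
```

Comparing whole fields against a hash avoids a regex for the garbage test altogether, and it only rejects a value when it fills an entire field.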
Re: trying to understand how regex works
Ron Grabowski [EMAIL PROTECTED] wrote:

> my $regex = join '|', 'value_garbage1', 'value_garbage2', 'value_garbage3';
> next if /$regex/;

You might want to say

    next if /$regex/o;

to prevent Perl from recompiling the pattern every time. If you're on Perl 5.6, you could even make use of the sexy new qr{} operator, which returns a reference to a compiled regular expression:

    my $regex = join '|', ...
    my $re = qr{$regex};
    next if /$re/;

Tom Wyant
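A slightly fuller sketch of the qr{} idea, under a couple of assumptions that are not in the thread: the garbage values are treated as literal strings (hence quotemeta), and a value should only count when it fills a whole pipe-delimited field. The value names are the thread's placeholders and the sample lines are made up.

```perl
use strict;
use warnings;

my @garbage = qw(value_garbage1 value_garbage2 value_garbage3);   # placeholders

# Escape each value so it is matched literally, then join into one alternation.
my $alt = join '|', map { quotemeta } @garbage;

# Compile once. The value must start at the beginning of the line or after a
# pipe, and must be followed by a pipe or the end of the string.
my $re = qr/(?:\A|\|)(?:$alt)(?=\||\z)/;

for my $line ('a|value_garbage2|c|', 'a|not_value_garbage2_really|c|') {
    my $verdict = $line =~ $re ? 'skip' : 'keep';
    print "$verdict: $line\n";
}
```

The first sample line is skipped and the second is kept, because in the second the garbage string is only part of a longer field.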
trying to understand how regex works
Hi all,

I guess it must be a simple problem, but it's a mystery to me. I have 30 fields, all separated by pipes, in some files with many, many lines. Some of the fields need to be changed, but mostly I have to drop any line that has certain values in certain fields. So I start by skipping any line that has garbage in it:

    open FOUT, ">/some/path/outputfile.txt";
    open FILE, "/some/path/inputfile.txt";
    while (<FILE>) {
        $p = "N";
        next if (/.*?\|value_garbage1\|.*?/ ||
                 /.*?\|value_garbage2\|.*?/ ||
                 /.*?\|value_garbage3\|.*?/);
        # and then I continue with an if
        if (/(.*?)\|(.*?)\|30 times/) {
            $p = "Y";
            do something to $1;   # change field 1
            do something to $3;   # change field 3
            $fld1 = $newfld1;
            $fld2 = $2;
            $fld3 = $newfld3;
            $fld4 = $4;           # and so on
        }
        print FOUT "$fld1|$fld2|...|$fld30|\n" if ($p = "Y");   # print the whole thing to the new output
    }

Well, it happens that some of the lines are completely out of whack and the regex simply stops there. It doesn't exit and gives no errors, but goes into an infinite loop, even though I don't know how exactly this is possible. My second if states clearly (or not so clearly) that if the line does not have 30 fields it should skip the block, should NOT print anything to the handle, and should get the next line. For whatever reason, the first time it encounters a line with fewer than 30 fields, it just loops without end.

I tried to solve this by replacing the .*? in the captures with the actual format of each field, and suddenly it started working, but now the regex is a hundred times slower. The only thing that speeds it up is going back to the .*?, which really goes fast as long as the regex matches. I mean, if I have 30 fields all the time, the regex works OK and it goes very fast.

Anybody cares to explain this to me?

Thanks,
Dan
Re: trying to understand how regex works
Dan Jablonsky wrote:

> Hi all,
> I guess it must be a simple problem, but it's a mystery to me.
> [snip question involving regex]
> Anybody cares to explain this to me?

No, but I'll offer an alternative.

    while (<FILE>) {
        $p = "N";
        my @f = split /\s*\|\s*/, $_;
        if (@f != 30) {
            print "Field count is ", scalar @f, " should be 30\n";
            # error processing ...
        }
        if ($f[1] =~ / ...
        ...

--
$Bill Luebkert   ICQ=162126130
DBE Collectibles   Mailto:[EMAIL PROTECTED]
http://dbecoll.tripod.com/ (Free site for Perl)
Castle of Medieval Myth & Magic   http://www.todbe.com/
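A likely explanation for the apparent hang, offered as a sketch rather than a diagnosis of the original data: .*? is allowed to match a pipe, so when a line has too few pipes the engine backtracks through a very large number of ways of spreading the pipes it does have across the 30 groups before it can report failure. That is not an infinite loop, but it can look like one. A character class that cannot cross a pipe leaves only one way to divide the line, so a short line fails almost at once. The field count of 30 and the trailing pipe come from the thread; the sample lines below are made up.

```perl
use strict;
use warnings;

my $n = 30;   # field count from the thread

# Each field is "anything except a pipe", followed by its closing pipe.
my $pattern = '([^|]*)\|' x $n;

# Anchor both ends so the whole line must consist of exactly $n fields.
my $re = qr/\A$pattern\z/;

my $good  = join('|', 1 .. 30) . '|';   # 30 fields
my $short = join('|', 1 .. 29) . '|';   # one field missing

for my $line ($good, $short) {
    if (my @f = $line =~ $re) {
        print scalar(@f), " fields\n";
    }
    else {
        print "no match\n";
    }
}
```

The loop reports 30 fields for the first line and no match for the second, and the failure on the short line is immediate because no field can absorb a pipe.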
Re: trying to understand how regex works
> open FOUT, ">/some/path/outputfile.txt";
> open FILE, "/some/path/inputfile.txt";

    open(FOUT, ">/some/path/outputfile.txt") or die("Error: $!");
    open(FILE, "/some/path/inputfile.txt") or die("Error: $!");

> while (<FILE>) {
>     $p = "N";
>     next if (/.*?\|value_garbage1\|.*?/ ||
>              /.*?\|value_garbage2\|.*?/ ||
>              /.*?\|value_garbage3\|.*?/);

    my $regex = join '|', 'value_garbage1', 'value_garbage2', 'value_garbage3';
    next if /$regex/;

>     if (/(.*?)\|(.*?)\|30 times/) {
>         $p = "Y";
>         do something to $1;   # change field 1
>         do something to $3;   # change field 3
>         $fld1 = $newfld1;
>         $fld2 = $2;
>         $fld3 = $newfld3;
>         $fld4 = $4;           # and so on
>     }
>     print FOUT "$fld1|$fld2|...|$fld30|\n" if ($p = "Y");

If you put the print inside of the if(), you don't need $p. Look into the join() function:

    print FOUT join '|', $fld1, $fld2, $fld3;
    print FOUT join '|', @array;
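Putting these suggestions together, a minimal sketch might look like the following. It is not the poster's script: the in-memory filehandles stand in for the real input and output files, the field count is 3 rather than 30 to keep the sample data short, and uc/lc are placeholders for the unspecified "do something" edits. Note also that the condition ($p = "Y") as it appears in the archived text is an assignment, so it is always true; printing inside the if sidesteps the flag entirely.

```perl
use strict;
use warnings;

my $input  = "a|b|C|\nshort|line|\n";   # made-up sample data
my $output = '';

# Three-argument open with an error check. Opening a scalar reference gives
# an in-memory file, which keeps the sketch self-contained.
open(my $in,  '<', \$input)  or die "Error: $!";
open(my $out, '>', \$output) or die "Error: $!";

while (my $line = <$in>) {
    chomp $line;
    my @f = split /\|/, $line, -1;      # -1 keeps trailing empty fields
    pop @f if @f && $f[-1] eq '';       # drop the piece after the final pipe
    if (@f == 3) {                      # 30 in the thread
        $f[0] = uc $f[0];               # placeholder for "change field 1"
        $f[2] = lc $f[2];               # placeholder for "change field 3"
        print {$out} join('|', @f), "|\n";   # print inside the if: no $p needed
    }
}
close $out or die "Error: $!";

print $output;
```

Only the first sample line has the expected number of fields, so it is the only one written to the output.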