RE: regexp question
Hello all,

Thank you for the time. I guess that this foray into lookaround assertions was a bust. ;0)

I am using Parse::RecDescent, and I realized that I could use the order of Rules to ensure that if I *wasn't* looking at either a \\n or a \", then I would have a bare \ that a simple Production could deal with.

Thanks (Mr $) to all!

chahn

-----Original Message-----
From: $Bill Luebkert [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 7:37 PM
To: Christopher Hahn
Cc: [EMAIL PROTECTED]
Subject: Re: regexp question

[snip]

___
Perl-Win32-Users mailing list [EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
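The rule-ordering idea can be sketched in plain Perl for readers without Parse::RecDescent at hand. This is only an illustration under assumptions, not chahn's actual grammar: the `tokenize` function and the token names (`BS_BS_N`, `BS_QUOTE`, `BARE_BS`, `TEXT`) are hypothetical. The alternatives are tried in order at the current position, so the bare-backslash case is reached only after the two special sequences have been ruled out.

```perl
use strict;
use warnings;

# Hypothetical tokenizer illustrating "first matching rule wins".
# \G anchors each attempt at the current position; /c keeps that
# position when an alternative fails, so the next one can be tried.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        if    ($text =~ /\G(\\\\n)/gc)  { push @tokens, [ BS_BS_N  => $1 ] }  # backslash backslash n
        elsif ($text =~ /\G(\\")/gc)    { push @tokens, [ BS_QUOTE => $1 ] }  # backslash quote
        elsif ($text =~ /\G(\\)/gc)     { push @tokens, [ BARE_BS  => $1 ] }  # any other backslash
        elsif ($text =~ /\G([^\\]+)/gc) { push @tokens, [ TEXT     => $1 ] }  # run of non-backslashes
    }
    return @tokens;
}

# Single-quoted literal: the data is  a \n b \\n c \" d
for my $token (tokenize('a \\n b \\\\n c \\" d')) {
    printf "%-8s [%s]\n", @$token;
}
```

Because every character is either a backslash or not, one of the last two alternatives always succeeds, so the loop always advances.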
Re: regexp question
> I seem to be missing a piece of the puzzle... I want to define a
> character class ([]) whose atoms to (not) match involve negative
> look-ahead assertions... but no joy.
>
> I am reading a stream of text that may contain the two segments
> \\n and \"
>
> I want to define a regexp that will match up to the first of either of these.

I would suggest a slightly different approach, especially since you apparently might not have either sequence. Try

    if ($str =~ /\\\\n|\\"/) {
        $mytext = substr($str, 0, $-[0]);
        # do whatever with $mytext
    }

The arrays @- and @+ contain the starting and ending offsets within a string of the portion successfully matched by a regular expression; the 0th elements refer to the whole regex, and other elements to the portions captured by parentheses. See perlvar in the docs and the description of substr under perlfunc.

--
Eric Amick
Columbia, MD
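A minimal, self-contained sketch of the @- approach follows. The function name `text_before_delimiter` is hypothetical, and returning the whole string when neither sequence is present is an assumption about the desired behaviour, not something stated in the suggestion above.

```perl
use strict;
use warnings;

# Return everything before the first \\n (backslash, backslash, n)
# or \" (backslash, quote). $-[0] is the offset where the whole
# match started, so substr() up to it yields the leading text.
sub text_before_delimiter {
    my ($str) = @_;
    if ($str =~ /\\\\n|\\"/) {
        return substr($str, 0, $-[0]);
    }
    return $str;    # assumption: no delimiter means "take everything"
}

# Single-quoted literal: the data is  abc \n def \\n ghi
print text_before_delimiter('abc \\n def \\\\n ghi'), "\n";
```

A lone \n (backslash, n) is not a delimiter here, so it stays in the returned text.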
regexp question
Hey,

I seem to be missing a piece of the puzzle... I want to define a character class ([]) whose atoms to (not) match involve negative look-ahead assertions... but no joy.

I am reading a stream of text that may contain the two segments \\n and \"

I want to define a regexp that will match up to the first of either of these.

...i.e. something like ([^]*) where the character class is just the two sequences above.

...but they are not characters at all, but strings, and so I wonder how to approach this.

Question: how best to do something to set $1 == every character in the string up to and not including the first of either a \\n or a \"

That is all.

(something like $strval =~ m/ (.* (?! \\ (?= \" | \\ (?= n) ) ) )/x;)

I am going to use this regexp in a Parse::RecDescent Production, and have other Rules to deal with the \\n and \" strings.

I am banging on this and will report when something good comes out of it, but please do chime in with any best practices that suggest themselves to you.

TIA!

Christopher

--
Realisant mon espoir, je me lance vers la gloire
Christopher Kenneth Hahn -- [EMAIL PROTECTED]
Re: regexp question
Christopher Hahn wrote:

> Question: how best to do something to set $1 == every character in
> the string up to and not including the first of either a \\n or a \"

[snip]

I assume that \\n is actually 3 characters and not a newline. It should be as simple as:

    use strict;

    foreach (
        '1: asd asdf adf asd \\n asd adf \" ad dasf dsaf ',
        '2: asd asdf adf asd \n asd adf \" ad dasf dsaf ',
        '3: asd asdf adf asd asd adf \" ad dasf \\n dsaf ',
        '4: asd asdf adf asd asd adf ad dasf \n dsaf ',
    ) {
        if (/^(.*)(?:\\\\n|\\")/) {
            print "$1\n";
        }
    }

    __END__

--
$Bill Luebkert  Mailto:[EMAIL PROTECTED]
DBE Collectibles  Mailto:[EMAIL PROTECTED]
Castle of Medieval Myth Magic  http://www.todbe.com/
http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
RE: regexp question
$Bill,

The (?: ) construct may be non-capturing, but it does eat text from the buffer (sic)

...and, besides, when I ran it I saw this:

=====
1: asd asdf adf asd \n asd adf
2: asd asdf adf asd \n asd adf
3: asd asdf adf asd asd adf
=====

where I need to see something like this:

=====
1: asd asdf adf asd
2: asd asdf adf asd \n asd adf
3: asd asdf adf asd asd adf
4: asd asdf adf asd asd adf ad dasf \n dsaf
=====

i.e. \n should pass through, where \\n or \" should not.

What about trying something like:

    $strval =~ m/^(( [^\\] | \\ (?! \\ (?= n)) | \\ (?! \") )*)/x;

which (or so I think ;0) collects as many characters from the beginning of $str as meet these conditions (bs = backslash):

=====
1) not a bs
or
2) a bs that is *not* followed by a ( bs that *is* followed by a n )
or
3) a bs that is *not* followed by a "
=====

Am I making any sense? ;0)

Thank you for taking the time in any case!

Christopher

-----Original Message-----
From: $Bill Luebkert [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 1:41 PM
To: Christopher Hahn
Cc: [EMAIL PROTECTED]
Subject: Re: regexp question

[snip]
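One observation on the three conditions above: because they are joined with "or", a backslash followed by a quote still satisfies condition 2 (it is not followed by backslash-n), so the pattern as written would run past \". A possible variant, offered only as a sketch, folds both exclusions into a single negative look-ahead. The function name `leading_text` is hypothetical.

```perl
use strict;
use warnings;

# Collect characters from the start of the string while each one is
#   1) not a backslash, or
#   2) a backslash NOT followed by (backslash n) and NOT followed by a quote.
# Both exclusions sit inside one (?! ... ), so they are effectively ANDed.
sub leading_text {
    my ($strval) = @_;
    my ($lead) = $strval =~ m/^( (?: [^\\] | \\ (?! \\ n | " ) )* )/x;
    return $lead;
}

# Single-quoted literals: the data lines contain \n, \\n and \" as 2-3 characters.
print leading_text('1: asd \\\\n asd adf \\" ad dasf'), "\n";
print leading_text('4: asd adf ad dasf \\n dsaf'), "\n";
```

Since the group may match zero times, the pattern always succeeds, and a string with neither sequence comes back whole.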
Re: regexp question
Christopher Hahn wrote:

> $Bill,
>
> The (?: ) construct may be non-capturing, but it does eat text from the buffer (sic)
>
> ...and, besides, when I ran it I saw this:
>
> 1: asd asdf adf asd \n asd adf
> 2: asd asdf adf asd \n asd adf
> 3: asd asdf adf asd asd adf
>
> where I need to see something like this:
>
> 1: asd asdf adf asd
> 2: asd asdf adf asd \n asd adf
> 3: asd asdf adf asd asd adf
> 4: asd asdf adf asd asd adf ad dasf \n dsaf
>
> i.e. \n should pass through, where \\n or \" should not.

Does a non-greedy match help (also - I was missing an escape or two on the \\n):

    foreach (
        '1: asd asdf adf asd \\\\n asd adf \" ad dasf dsaf ',
        '2: asd asdf adf asd \n asd adf \" ad dasf dsaf ',
        '3: asd asdf adf asd asd adf \" ad dasf \\\\n dsaf ',
        '4: asd asdf adf asd asd adf ad dasf \n dsaf ',
    ) {
        if (/^(.*?)(?:\\\\n|\\")/) {
            print "$1\n";
        }
    }

> What about trying something like:
>
> $strval =~ m/^(( [^\\] | \\ (?! \\ (?= n)) | \\ (?! \") )*)/x;

[snip]

--
$Bill Luebkert  Mailto:[EMAIL PROTECTED]
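The difference between the greedy and non-greedy versions can be seen on a single line that contains both sequences. This is a small illustrative sketch; the sample string and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

# Single-quoted literal: the data is  one \\n two \" three
my $line = 'one \\\\n two \\" three';

# Greedy: .* grabs as much as possible, then backs off only as far as
# the LAST place where a delimiter can still match.
my ($greedy) = $line =~ /^(.*)(?:\\\\n|\\")/;

# Non-greedy: .*? grows one character at a time and stops at the
# FIRST place where a delimiter matches.
my ($lazy) = $line =~ /^(.*?)(?:\\\\n|\\")/;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

Note that both forms fail to match at all when the line contains neither sequence, so $1 is not set for a line like number 4 above.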
Re: RegExp Question
John Giordano wrote:

> $grep_deferred = system ('findstr DeferredStatus response1');
> print "$grep_deferred\n\n";

[snip]

> $grep_deferred has this in it:
>
> <a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>

Are you sure? Try printing out "bloop\nblip\n$grep_deferred\nblep\n" and see whether this a href line comes between "blip" and "blep".

From `perldoc -f system`:

    The return value is the exit status of the program as returned by
    the `wait' call. To get the actual exit value divide by 256.

So if $grep_deferred is the return value from system, it's probably a number. The line you're seeing on the screen is presumably the output of findstr, which went to STDOUT; since you didn't redirect STDOUT, it went to the same place your print went.

You probably want `` (backticks).

ObTMTOWTDI: if you're looking for a fixed substring such as '<a ', then consider using index() rather than regular expressions. See useless benchmark at the end.
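The difference between system and backticks can be checked with a small sketch. As an assumption made purely for portability, the running perl itself ($^X) stands in for findstr here, so the example does not depend on any external program being installed.

```perl
use strict;
use warnings;

my $perl = $^X;    # path of the perl running this script (stand-in for findstr)

# system() returns the wait status, not the program's output.
# The child's exit value is in the high byte, hence the shift by 8
# (the same as dividing by 256).
my $status    = system($perl, '-e', 'exit 3');
my $exit_code = $status >> 8;

# Backticks run the command and return whatever it wrote to STDOUT.
my $output = `"$perl" -e "print 42"`;

print "system returned $status (exit code $exit_code)\n";
print "backticks captured: $output\n";
```

With findstr, the same pattern would put the matching lines into the variable instead of letting them go straight to the screen.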
Cheers,
Philip

    #!perl -w
    use strict;
    use Benchmark 'cmpthese';

    cmpthese(5_000_000, {
        'index.start.found' => sub {
            $a = index '<a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>', '<a ';
        },
        'index.start.notfound' => sub {
            $a = index '[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>', '<a ';
        },
        'regex.start.found' => sub {
            $a = '<a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>' =~ /<a /;
        },
        'regex.start.notfound' => sub {
            $a = '[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>' =~ /<a /;
        },
        'index.end.found' => sub {
            $a = index 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif<a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>', '<a ';
        },
        'index.end.notfound' => sub {
            $a = index 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>', '<a ';
        },
        'regex.end.found' => sub {
            $a = 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif<a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>' =~ /<a /;
        },
        'regex.end.notfound' => sub {
            $a = 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>' =~ /<a /;
        },
        'assign' => sub {
            $a = 1;
        },
    });

    __END__

Output:

    Benchmark: timing 5000000 iterations of assign, index.end.found, index.end.notfound, index.start.found, index.start.notfound, regex.end.found, regex.end.notfound, regex.start.found, regex.start.notfound...
                  assign:  0 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 4897159.65/s (n=5000000)
         index.end.found:  9 wallclock secs ( 9.79 usr +  0.00 sys =  9.79 CPU) @ 510516.64/s (n=5000000)
      index.end.notfound: 12 wallclock secs (12.38 usr +  0.00 sys = 12.38 CPU) @ 404007.76/s (n=5000000)
       index.start.found:  3 wallclock secs ( 2.49 usr +  0.00 sys =  2.49 CPU) @ 2004811.55/s (n=5000000)
    index.start.notfound:  5 wallclock secs ( 5.42 usr +  0.00 sys =  5.42 CPU) @ 922849.76/s (n=5000000)
         regex.end.found: 12 wallclock secs (11.42 usr +  0.00 sys = 11.42 CPU) @ 437943.42/s (n=5000000)
      regex.end.notfound: 13 wallclock secs (13.34 usr +  0.00 sys = 13.34 CPU) @ 374840.69/s (n=5000000)
       regex.start.found:  5 wallclock secs ( 3.91 usr +  0.00 sys =  3.91 CPU) @ 1280081.93/s (n=5000000)
    regex.start.notfound:  7 wallclock secs ( 6.38 usr +  0.00 sys =  6.38 CPU) @ 783821.92/s (n=5000000)

    Rate regex.end.notfound index.end.notfound regex.end.found index.end.found regex.start.notfound index.start.notfound regex.start.found index.start.found assign
    regex.end.notfound 374841/s