Re: regexp question
> I seem to be missing a piece of the puzzle... I want to define a character
> class ([])
> with atoms to (not) match involve negative look-ahead assertions... but no
> joy.
>
> I am reading a stream of text that may contain the two segments \\n and \"
>
> I want to define a regexp that will match up to the first of either of
> these.

I would suggest a slightly different approach, especially since you apparently might not have either sequence. Try

    if ($str =~ /\\\\n|\\"/) {
        $mytext = substr($str, 0, $-[0]);
        # do whatever with $mytext
    }

The arrays @- and @+ contain the starting and ending offsets within a string of the portion successfully matched by a regular expression; the 0th elements refer to the whole regex, and the other elements to the portions captured by parentheses. See perlvar in the docs and the description of substr under perlfunc.

--
Eric Amick
Columbia, MD

_______________________________________________
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
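A minimal sketch of the @- / substr approach described above, for anyone who wants to try it in isolation. The sub name text_before_marker and the sample strings are made up for illustration, and returning the whole string when neither sequence occurs is an assumption, not something the thread specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text before the first \\n (backslash,
# backslash, n) or \" (backslash, double quote). Returning the whole string
# when neither sequence is present is an assumption made for this sketch.
sub text_before_marker {
    my ($str) = @_;
    if ($str =~ /\\\\n|\\"/) {
        # $-[0] is the offset at which the whole match starts.
        return substr($str, 0, $-[0]);
    }
    return $str;
}

# In single-quoted literals \\ yields one backslash and \" stays two characters.
my $sample = 'abc \\\\n def \" ghi';
print '[', text_before_marker($sample), "]\n";

# @- and @+ also describe capture groups: element 1 belongs to $1.
if ('key=value' =~ /=(\w+)/) {
    printf "whole match %d..%d, group 1 %d..%d\n", $-[0], $+[0], $-[1], $+[1];
}
```

Because the offsets come from the match itself, no capture group is needed just to get the leading text.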
RE: regexp question
Hello all,

Thank you for the time. I guess that this foray into lookaround assertions was a bust. ;0)

I am using Parse::RecDescent and I realized that I could use the order of Rules to ensure that if I *wasn't* looking at either a \\n or a \" then I would have a bare \ that a simple Production could deal with.

Thanks (Mr $) to all!

chahn

-----Original Message-----
From: $Bill Luebkert [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 7:37 PM
To: Christopher Hahn
Cc: [EMAIL PROTECTED]
Subject: Re: regexp question

Christopher Hahn wrote:
> $Bill,
>
> The (?: ) construct may be non-capturing, but it does eat text from
> the buffer (sic)
>
> ...and, besides, when I ran it I saw this:
> =====
> 1: asd asdf adf asd \n asd adf
> 2: asd asdf adf asd \n asd adf
> 3: asd asdf adf asd asd adf
> =====
>
> where I need to see something like this:
> =====
> 1: asd asdf adf asd
> 2: asd asdf adf asd \n asd adf
> 3: asd asdf adf asd asd adf
> 4: asd asdf adf asd asd adf " ad dasf \n dsaf
> =====
>
> i.e. \n should pass through, where \\n or \" should not.

Does a non-greedy match help (also - I was missing an escape or two on the \\n):

    foreach (
        '1: asd asdf adf asd \\\\n asd adf \" ad dasf dsaf ',
        '2: asd asdf adf asd \n asd adf \" ad dasf dsaf ',
        '3: asd asdf adf asd asd adf \" ad dasf \\\\n dsaf ',
        '4: asd asdf adf asd asd adf " ad dasf \n dsaf ',
    ) {
        if (/^(.*?)(?:\\\\n|\\")/) {
            print "$1\n";
        }
    }

> What about trying something like:
>
> $strval =~ m/^(( [^\\] | \\ (?! \\ (?= n)) | \\ (?! \") )*)/x;
>
> which (or so I think ;0) collects as many characters from the
> beginning of $str that meet these conditions (bs = backslash):
> =====
>    1) not a bs
> or 2) a bs that is *not* followed by a ( bs that *is* followed by a n )
> or 3) a bs that is *not* followed by a "
> =====

--
  ,-/-  __      _  _         $Bill Luebkert    Mailto:[EMAIL PROTECTED]
 (_/   /  )    // //       DBE Collectibles    Mailto:[EMAIL PROTECTED]
  / ) /--<  o // //      Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
Re: regexp question
Christopher Hahn wrote:
> $Bill,
>
> The (?: ) construct may be non-capturing, but it does eat text from the
> buffer (sic)
>
> ...and, besides, when I ran it I saw this:
> =====
> 1: asd asdf adf asd \n asd adf
> 2: asd asdf adf asd \n asd adf
> 3: asd asdf adf asd asd adf
> =====
>
> where I need to see something like this:
> =====
> 1: asd asdf adf asd
> 2: asd asdf adf asd \n asd adf
> 3: asd asdf adf asd asd adf
> 4: asd asdf adf asd asd adf " ad dasf \n dsaf
> =====
>
> i.e. \n should pass through, where \\n or \" should not.

Does a non-greedy match help (also - I was missing an escape or two on the \\n):

    foreach (
        '1: asd asdf adf asd \\\\n asd adf \" ad dasf dsaf ',
        '2: asd asdf adf asd \n asd adf \" ad dasf dsaf ',
        '3: asd asdf adf asd asd adf \" ad dasf \\\\n dsaf ',
        '4: asd asdf adf asd asd adf " ad dasf \n dsaf ',
    ) {
        if (/^(.*?)(?:\\\\n|\\")/) {
            print "$1\n";
        }
    }

> What about trying something like:
>
> $strval =~ m/^(( [^\\] | \\ (?! \\ (?= n)) | \\ (?! \") )*)/x;
>
> which (or so I think ;0) collects as many characters from the beginning of
> $str that
> meet these conditions (bs = backslash):
> =====
>    1) not a bs
> or 2) a bs that is *not* followed by a ( bs that *is* followed by a n )
> or 3) a bs that is *not* followed by a "
> =====

--
  ,-/-  __      _  _         $Bill Luebkert    Mailto:[EMAIL PROTECTED]
 (_/   /  )    // //       DBE Collectibles    Mailto:[EMAIL PROTECTED]
  / ) /--<  o // //      Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
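To make the greedy versus non-greedy difference concrete, here is a small sketch. The sample line is invented for illustration (it is not one of the four test strings from the thread); the only assumption is that the markers are the literal sequences backslash-backslash-n and backslash-quote.

```perl
use strict;
use warnings;

# Invented sample: contains \" twice and \\n once (single-quoted, so
# \\\\ yields two backslashes and \" stays as backslash + quote).
my $line = 'one \" two \\\\n three \" four';

# Greedy: .* takes as much as it can, so $1 runs up to the LAST marker.
my ($greedy) = $line =~ /^(.*)(?:\\\\n|\\")/;

# Non-greedy: .*? takes as little as it can, so $1 stops at the FIRST marker.
my ($lazy) = $line =~ /^(.*?)(?:\\\\n|\\")/;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

Note that neither pattern matches at all when the line contains no marker, so a caller still has to decide what to do in that case.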
RE: regexp question
$Bill,

The (?: ) construct may be non-capturing, but it does eat text from the buffer (sic)

...and, besides, when I ran it I saw this:
=====
1: asd asdf adf asd \n asd adf
2: asd asdf adf asd \n asd adf
3: asd asdf adf asd asd adf
=====

where I need to see something like this:
=====
1: asd asdf adf asd
2: asd asdf adf asd \n asd adf
3: asd asdf adf asd asd adf
4: asd asdf adf asd asd adf " ad dasf \n dsaf
=====

i.e. \n should pass through, where \\n or \" should not.

What about trying something like:

    $strval =~ m/^(( [^\\] | \\ (?! \\ (?= n)) | \\ (?! \") )*)/x;

which (or so I think ;0) collects as many characters from the beginning of $str that meet these conditions (bs = backslash):
=====
   1) not a bs
or 2) a bs that is *not* followed by a ( bs that *is* followed by a n )
or 3) a bs that is *not* followed by a "
=====

Am I making any sense? ;0)

Thank you for taking the time in any case!

Christopher

-----Original Message-----
From: $Bill Luebkert [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 1:41 PM
To: Christopher Hahn
Cc: [EMAIL PROTECTED]
Subject: Re: regexp question

Christopher Hahn wrote:
> Hey,
>
> I seem to be missing a piece of the puzzle... I want to define a
> character class ([]) with atoms to (not) match involve negative
> look-ahead assertions... but no joy.
>
> I am reading a stream of text that may contain the two segments \\n and \"
>
> I want to define a regexp that will match up to the first of either of
> these.
>
> ...ie. something like ([^]*) where the character class is just the two
> sequences above.
>
> ...but they are not characters at all, but strings, and so I wonder
> how to approach this.
>
> Question: how best to do something to set
>
>    $1 == every character in the string up to and not including the
>    first of either a \\n or a \"
>
> That is all. (something like
> $strval =~ m/ (.* (?! \\ (?= \" | \\ (?= n) ) ) )/x;)
>
> I am going to use this regexp in a Parse::RecDescent Production, and
> have other Rules to deal with the \\n and \" strings.
>
> I am banging on this and will report when something good comes out of
> it, but please do chime in with any "best practices" that suggest
> themselves to you.

I assume that \\n is actually 3 characters and not a newline. It should be as simple as:

    use strict;

    foreach (
        '1: asd asdf adf asd \\n asd adf \" ad dasf dsaf ',
        '2: asd asdf adf asd \n asd adf \" ad dasf dsaf ',
        '3: asd asdf adf asd asd adf \" ad dasf \\n dsaf ',
        '4: asd asdf adf asd asd adf " ad dasf \n dsaf ',
    ) {
        if (/^(.*)(?:\\\\n|\\")/) {
            print "$1\n";
        }
    }

    __END__

--
  ,-/-  __      _  _         $Bill Luebkert    Mailto:[EMAIL PROTECTED]
 (_/   /  )    // //       DBE Collectibles    Mailto:[EMAIL PROTECTED]
  / ) /--<  o // //      Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
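The three conditions listed above are joined with "or", so a backslash that starts \" still satisfies condition 2, and the first backslash of \\n still satisfies condition 3. One possible repair, offered only as a sketch, is to fold both restrictions into a single lookahead so that a backslash is accepted only when it starts neither sequence. The sub name leading_text is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect characters from the start of $str up to (not
# including) the first \\n (backslash, backslash, n) or \" (backslash, quote).
sub leading_text {
    my ($str) = @_;
    my ($prefix) = $str =~ m/^( (?: [^\\]             # 1) not a backslash
                                  | \\ (?! \\n | " )  # 2) a backslash that starts neither sequence
                                )* )/x;
    return $prefix;
}

# Single-quoted literals: \\\\ is two backslashes, \n is backslash + n.
for my $line ('asd \\\\n tail', 'asd \n mid \" tail', 'no markers \n here') {
    print '[', leading_text($line), "]\n";
}
```

Because the group is starred and anchored at the start, the pattern always matches, and $1 is simply empty when the string begins with one of the two sequences.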
Re: regexp question
Christopher Hahn wrote:
> Hey,
>
> I seem to be missing a piece of the puzzle... I want to define a character
> class ([])
> with atoms to (not) match involve negative look-ahead assertions... but no
> joy.
>
> I am reading a stream of text that may contain the two segments \\n and \"
>
> I want to define a regexp that will match up to the first of either of
> these.
>
> ...ie. something like ([^]*) where the character class is just the two
> sequences above.
>
> ...but they are not characters at all, but strings, and so I wonder how to
> approach this.
>
> Question: how best to do something to set
>
>    $1 == every character in the string up to and not including the first of
>    either a \\n or a \"
>
> That is all. (something like
> $strval =~ m/ (.* (?! \\ (?= \" | \\ (?= n) ) ) )/x;)
>
> I am going to use this regexp in a Parse::RecDescent Production, and have
> other Rules to
> deal with the \\n and \" strings.
>
> I am banging on this and will report when something good comes out of it,
> but please do
> chime in with any "best practices" that suggest themselves to you.

I assume that \\n is actually 3 characters and not a newline. It should be as simple as:

    use strict;

    foreach (
        '1: asd asdf adf asd \\n asd adf \" ad dasf dsaf ',
        '2: asd asdf adf asd \n asd adf \" ad dasf dsaf ',
        '3: asd asdf adf asd asd adf \" ad dasf \\n dsaf ',
        '4: asd asdf adf asd asd adf " ad dasf \n dsaf ',
    ) {
        if (/^(.*)(?:\\\\n|\\")/) {
            print "$1\n";
        }
    }

    __END__

--
  ,-/-  __      _  _         $Bill Luebkert    Mailto:[EMAIL PROTECTED]
 (_/   /  )    // //       DBE Collectibles    Mailto:[EMAIL PROTECTED]
  / ) /--<  o // //      Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
regexp question
Hey,

I seem to be missing a piece of the puzzle... I want to define a character class ([]) with atoms to (not) match involve negative look-ahead assertions... but no joy.

I am reading a stream of text that may contain the two segments \\n and \"

I want to define a regexp that will match up to the first of either of these.

...ie. something like ([^]*) where the character class is just the two sequences above.

...but they are not characters at all, but strings, and so I wonder how to approach this.

Question: how best to do something to set

    $1 == every character in the string up to and not including the first of either a \\n or a \"

That is all. (something like

    $strval =~ m/ (.* (?! \\ (?= \" | \\ (?= n) ) ) )/x;

)

I am going to use this regexp in a Parse::RecDescent Production, and have other Rules to deal with the \\n and \" strings.

I am banging on this and will report when something good comes out of it, but please do chime in with any "best practices" that suggest themselves to you.

TIA!

Christopher

--
Realisant mon espoir, je me lance vers la gloire
Christopher Kenneth Hahn -- [EMAIL PROTECTED]
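One common way to get the effect of a "negated character class of strings" is to test a negative look-ahead before every single character. The following is only a sketch of that idea, under the assumption that the two sequences are literally backslash-backslash-n and backslash-quote; the sub name text_up_to_marker and the shortened sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: (?!\\\\n|\\") is checked at every position, and the
# dot consumes one character only if neither sequence starts right there.
sub text_up_to_marker {
    my ($str) = @_;
    # /s lets the dot cross real newlines in the input stream.
    my ($prefix) = $str =~ /^((?:(?!\\\\n|\\").)*)/s;
    return $prefix;
}

for my $line (
    '1: asd \\\\n asd \" dsaf',    # stops before \\n
    '2: asd \n asd \" dsaf',       # \n passes through, stops before \"
    '3: asd asd " ad \n dsaf',     # no marker: the whole line is returned
) {
    print '[', text_up_to_marker($line), "]\n";
}
```

The pattern always matches, so $1 is the entire string when neither sequence is present, which may or may not be what a Parse::RecDescent production wants.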
Re: RegExp Question
John Giordano wrote:
> $grep_deferred = system ('findstr DeferredStatus response1');
>
> print "$grep_deferred\n\n";
[snip]
> $grep_deferred has this in it:
>
> src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail
> Status">

Are you sure? Try printing out "bloop\nblip\n$grep_deferred\nblep\n" and see whether this line comes between "blip" and "blep".

From `perldoc -f system`:

    The return value is the exit status of the program as returned by the `wait' call. To get the actual exit value divide by 256.

So if $grep_deferred is the return value from system, it's probably a number. The line you're seeing on the screen is presumably the output of findstr, which went to STDOUT; since you didn't redirect STDOUT, it went the same place your print went.

You probably want `` (backticks).

ObTMTOWTDI: if you're looking for a fixed substring such as '
    sub { $a = index '', '
    sub { $a = index '[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus">', '
    sub { $a = '' =~ /
    sub { $a = '[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus">' =~ /
    sub { $a = index 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif', '
    sub { $a = index 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus">', '
    sub { $a = 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif' =~ /
    sub { $a = 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus">' =~ /
    sub { $a = 1; },
    });
    __END__

Output:

    Benchmark: timing 500 iterations of assign, index.end.found, index.end.notfound, index.start.found, index.start.notfound, regex.end.found, regex.end.notfound, regex.start.found, regex.start.notfound...
                  assign:  0 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 4897159.65/s (n=500)
         index.end.found:  9 wallclock secs ( 9.79 usr +  0.00 sys =  9.79 CPU) @ 510516.64/s (n=500)
      index.end.notfound: 12 wallclock secs (12.38 usr +  0.00 sys = 12.38 CPU) @ 404007.76/s (n=500)
       index.start.found:  3 wallclock secs ( 2.49 usr +  0.00 sys =  2.49 CPU) @ 2004811.55/s (n=500)
    index.start.notfound:  5 wallclock secs ( 5.42 usr +  0.00 sys =  5.42 CPU) @ 922849.76/s (n=500)
         regex.end.found: 12 wallclock secs (11.42 usr +  0.00 sys = 11.42 CPU) @ 437943.42/s (n=500)
      regex.end.notfound: 13 wallclock secs (13.34 usr +  0.00 sys = 13.34 CPU) @ 374840.69/s (n=500)
       regex.start.found:  5 wallclock secs ( 3.91 usr +  0.00 sys =  3.91 CPU) @ 1280081.93/s (n=500)
    regex.start.notfound:  7 wallclock secs ( 6.38 usr +  0.00 sys =  6.38 CPU) @ 783821.92/s (n=500)

                              Rate regex.end.notfound index.end.notfound regex.end.found index.end.found regex.start.notfound index.start.notfound regex.start.found index.start.found assign
    regex.end.notfound    374841/s                 --                -7%            -14%            -27%                 -52%                 -59%              -71%              -81%   -92%
    index.end.notfound    404008/s                 8%                 --             -8%            -21%                 -48%                 -56%              -68%              -80%   -92%
    regex.end.found       437943/s                17%                 8%              --            -14%                 -44%                 -53%              -66%              -78%   -91%
    index.end.found       510517/s                36%                26%             17%              --                 -35%                 -45%              -60%              -75%   -90%
    regex.start.notfound  783822/s               109%                94%             79%             54%                   --                 -15%              -39%              -61%   -84%
    index.start.notfound  922850/s               146%               128%            111%             81%                  18%                   --              -28%              -54%   -81%
    regex.start.found    1280082/s               242%               217%            192%            151%                  63%                  39%                --              -36%   -74%
    index.start.found    2004812/s               435%               396%            358%            293%                 156%                 117%               57%                --   -59%
    assign               4897160/s              1206%              1112%           1018%            859%                 525%                 431%              283%              144%     --
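A minimal sketch of the two points made above: system() returns a status while backticks capture output, and index() is enough for a fixed substring. It assumes a shell in which `echo` is available and uses it as a stand-in for the findstr command from the thread, so that the example does not depend on a response1 file.

```perl
use strict;
use warnings;

# system() runs the command and returns its wait status (0 on success).
# The command's own output goes straight to STDOUT, not into $status.
my $status = system('echo DeferredStatus');

# Backticks capture the command's standard output into a string instead.
my $output = `echo DeferredStatus`;

# The exit value is the wait status divided by 256, i.e. shifted right by 8.
printf "system returned %d (exit value %d)\n", $status, $status >> 8;

# For a fixed substring no regex is needed: index() returns -1 if not found.
if (index($output, 'DeferredStatus') >= 0) {
    print "output contains DeferredStatus\n";
}
```

Whether index() or a regex is faster for a given input is what the benchmark figures above address; the sketch only shows that both the captured output and the substring test behave as described.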
Re: RegExp Question
John Giordano wrote:
>
> Hello,
>
> Could someone please tell me why this:
>
> ###
> $grep_deferred = system ('findstr DeferredStatus response1');
>
> print "$grep_deferred\n\n";
>
> if ($grep_deferred =~ /
> print "It contains } else {
>
> print "It doesn't contain }
> -
> doesn't find this:
>
>
> The above code may need some further explanation. I am trying to extract
> the
> $grep_deferred has this in it:
>
> src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail
> Status">
>
> < doesn't need a backslash in front of it right?

Correct, it doesn't. This test snippet works fine for me (I modified your RE, but it works your way too):

    use strict;

    my $grep_deferred = q{} . q{};
    print "$grep_deferred\n\n";
    if ($grep_deferred =~ /http://www.todbe.com/

  / ) /--<  o // //      Mailto:[EMAIL PROTECTED] http://dbecoll.webjump.com/
-/-' /___/_<_http://www.freeyellow.com/members/dbecoll/

_______________________________________________
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
http://listserv.ActiveState.com/mailman/listinfo/perl-win32-users