RE: regexp question

2004-12-15 Thread Christopher Hahn
 
Hello all,

Thank you for the time.  I guess that this foray into lookaround assertions
was a bust.  ;0) I am using Parse::RecDescent and I realized that I could use
the order of Rules to ensure that if I *wasn't* looking at either a \\n or
a \" then I would have a bare \ that a simple Production could deal with.
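
The ordering idea can be sketched without Parse::RecDescent. The following is
only an illustration in plain Perl, not the grammar actually used: the sub
name tokenize and the token labels are hypothetical. Alternatives are tried
in a fixed order at each position, so the two special sequences are claimed
before the fallback for a bare backslash ever gets a chance.

```perl
use strict;
use warnings;

# Hypothetical tokenizer (not Parse::RecDescent): the branches are tried in
# order, so \\n and \" are recognised before the bare-backslash fallback.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    for ($text) {
        while (1) {
            if    (/\G\\\\n/gc)    { push @tokens, [ESC_NEWLINE => '\\\\n'] }
            elsif (/\G\\"/gc)      { push @tokens, [ESC_QUOTE   => '\\"'] }
            elsif (/\G\\/gc)       { push @tokens, [BARE_BS     => '\\'] }
            elsif (/\G([^\\]+)/gc) { push @tokens, [TEXT        => $1] }
            else                   { last }   # nothing left to consume
        }
    }
    return @tokens;
}

# Single-quoted source: the sample holds  asd \n asd \\n asd \" x
my @tokens = tokenize('asd \n asd \\\\n asd \" x');
my @kinds  = map { $_->[0] } @tokens;
print join(' ', @kinds), "\n";
```

Because the bare-backslash branch comes third, it only fires when neither
special sequence starts at the current position, which is the effect of rule
ordering described above.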

Thanks (Mr $) to all!

chahn

___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


Re: regexp question

2004-12-15 Thread eric-amick


> I seem to be missing a piece of the puzzle... I want to define a character
> class ([]) with atoms to (not) match that involve negative look-ahead
> assertions... but no joy. I am reading a stream of text that may contain
> the two segments \\n and \". I want to define a regexp that will match up
> to the first of either of these.

I would suggest a slightly different approach, especially since you apparently might not have either sequence. Try

if ($str =~ /\\\\n|\\"/)
{
    $mytext = substr($str, 0, $-[0]);
    # do whatever with $mytext
}

The arrays @- and @+ contain the starting and ending offsets within a string of the portion successfully matched by a regular expression; the 0th elements refer to the whole regex, and other elements to the portions captured by parentheses.

See perlvar in the docs and the description of substr under perlfunc.
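
As a small illustration of those offsets, here is a sketch under the
assumption that the input is the made-up sample line below. The variable
names $before and $delim are hypothetical. If neither sequence occurs, the
whole string is kept.

```perl
use strict;
use warnings;

# Single-quoted source: the string holds  asd asdf \n adf \\n asd \" dsaf
my $str = 'asd asdf \n adf \\\\n asd \" dsaf';

my ($before, $delim) = ($str, '');   # defaults when nothing matches
if ($str =~ /\\\\n|\\"/) {
    # $-[0] is where the whole match starts, $+[0] is where it ends
    $before = substr($str, 0, $-[0]);
    $delim  = substr($str, $-[0], $+[0] - $-[0]);
}
print "before:    [$before]\n";
print "delimiter: [$delim]\n";
```

The plain \n near the start is skipped because the pattern requires two
backslashes before the n.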

--Eric Amick
Columbia, MD


regexp question

2004-12-14 Thread Christopher Hahn

Hey,

I seem to be missing a piece of the puzzle... I want to define a character
class ([]) with atoms to (not) match that involve negative look-ahead
assertions... but no joy.

I am reading a stream of text that may contain the two segments \\n and \"

I want to define a regexp that will match up to the first of either of
these.

...i.e. something like ([^]*) where the character class is just the two
sequences above.

...but they are not characters at all, but strings, and so I wonder how to
approach this.

Question: how best to do something to set

   $1 == every character in the string up to and not including the first of
         either a \\n or a \"

That is all. (something like
$strval =~ m/ (.* (?! \\ (?= \" | \\ (?= n) ) ) )/x;)

I am going to use this regexp in a Parse::RecDescent Production, and have
other Rules to deal with the \\n and \" strings.

I am banging on this and will report when something good comes out of it,
but please do chime in with any best practices that suggest themselves to
you.

TIA!

Christopher

--
Realisant mon espoir, je me lance vers la gloire
Christopher Kenneth Hahn -- [EMAIL PROTECTED]


Re: regexp question

2004-12-14 Thread $Bill Luebkert
Christopher Hahn wrote:

> Hey,
>
> I seem to be missing a piece of the puzzle... I want to define a character
> class ([]) with atoms to (not) match that involve negative look-ahead
> assertions... but no joy.
>
> I am reading a stream of text that may contain the two segments \\n and \"
>
> I want to define a regexp that will match up to the first of either of
> these.
>
> ...i.e. something like ([^]*) where the character class is just the two
> sequences above.
>
> ...but they are not characters at all, but strings, and so I wonder how to
> approach this.
>
> Question: how best to do something to set
>
>    $1 == every character in the string up to and not including the first of
>          either a \\n or a \"
>
> That is all. (something like
> $strval =~ m/ (.* (?! \\ (?= \" | \\ (?= n) ) ) )/x;)
>
> I am going to use this regexp in a Parse::RecDescent Production, and have
> other Rules to deal with the \\n and \" strings.
>
> I am banging on this and will report when something good comes out of it,
> but please do chime in with any best practices that suggest themselves to
> you.

I assume that \\n is actually 3 characters and not a newline.
It should be as simple as :

use strict;

foreach (
  '1: asd asdf adf asd \\n asd adf \" ad dasf dsaf ',
  '2: asd asdf adf asd \n asd adf \" ad dasf dsaf ',
  '3: asd asdf adf asd  asd adf \" ad dasf \\n dsaf ',
  '4: asd asdf adf asd  asd adf  ad dasf \n dsaf ',
  ) {
	# $1 = everything before a \\n or \" sequence
	if (/^(.*)(?:\\\\n|\\")/) {
		print "$1\n";
	}
}

__END__


-- 
  ,-/-  __  _  _ $Bill LuebkertMailto:[EMAIL PROTECTED]
 (_/   /  )// //   DBE CollectiblesMailto:[EMAIL PROTECTED]
  / ) /--  o // //  Castle of Medieval Myth  Magic http://www.todbe.com/
-/-' /___/__/_/_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)


RE: regexp question

2004-12-14 Thread Christopher Hahn

$Bill,

The (?: ) construct may be non-capturing, but it does eat text from the
buffer (sic)

...and, besides, when I ran it I saw this:
=
1: asd asdf adf asd \n asd adf 
2: asd asdf adf asd \n asd adf 
3: asd asdf adf asd  asd adf 
=

where I need to see something like this:
=
1: asd asdf adf asd 
2: asd asdf adf asd \n asd adf 
3: asd asdf adf asd  asd adf 
4: asd asdf adf asd  asd adf  ad dasf \n dsaf 
=

i.e. \n should pass through, where \\n or \" should not.


What about trying something like:

  $strval =~ m/^(( [^\\] | \\ (?! \\ (?= n)) | \\ (?! \") )*)/x;

which (or so I think ;0) collects as many characters from the beginning of
$strval that meet these conditions (bs = backslash):
=
   1) not a bs
or 2) a bs that is *not* followed by a ( bs that *is* followed by a n )
or 3) a bs that is *not* followed by a "
=
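
One caveat with conditions 2 and 3 as written: they are joined by "or", so a
bs followed by a " still satisfies condition 2 and gets collected. A possible
variant, offered only as a sketch and not as what was finally used in the
grammar, folds both exclusions into a single negative look-ahead. The names
$prefix_re and @samples are hypothetical.

```perl
use strict;
use warnings;

# Consume either a non-backslash character, or a backslash that does not
# begin one of the two terminators (\\n or \").
my $prefix_re = qr/^( (?: [^\\] | \\ (?! \\n | " ) )* )/x;

# Single-quoted source: \\\\n is stored as \\n, while \n and \" stay as typed.
my @samples = (
    '1: asd \\\\n asd \" dsaf',
    '2: asd \n asd \" dsaf',
    '3: asd asd \" ad \\\\n dsaf',
    '4: asd asd ad \n dsaf',
);

my @prefixes = map { /$prefix_re/ ? $1 : () } @samples;
print "[$_]\n" for @prefixes;
```

Sample 4 contains neither terminator, so the whole line is collected, and the
plain \n in samples 2 and 4 passes through.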

Am I making any sense?  ;0)

Thank you for taking the time in any case!

Christopher



Re: regexp question

2004-12-14 Thread $Bill Luebkert
Christopher Hahn wrote:

> $Bill,
>
> The (?: ) construct may be non-capturing, but it does eat text from the
> buffer (sic)
>
> ...and, besides, when I ran it I saw this:
> =
> 1: asd asdf adf asd \n asd adf
> 2: asd asdf adf asd \n asd adf
> 3: asd asdf adf asd  asd adf
> =
>
> where I need to see something like this:
> =
> 1: asd asdf adf asd
> 2: asd asdf adf asd \n asd adf
> 3: asd asdf adf asd  asd adf
> 4: asd asdf adf asd  asd adf  ad dasf \n dsaf
> =
>
> i.e. \n should pass through, where \\n or \" should not.

Does a non-greedy match help (also - I was missing an escape
or two on the \\n):

foreach (
  '1: asd asdf adf asd \\\\n asd adf \" ad dasf dsaf ',
  '2: asd asdf adf asd \n asd adf \" ad dasf dsaf ',
  '3: asd asdf adf asd  asd adf \" ad dasf \\\\n dsaf ',
  '4: asd asdf adf asd  asd adf  ad dasf \n dsaf ',
  ) {
	# non-greedy: $1 stops at the first \\n or \" sequence
	if (/^(.*?)(?:\\\\n|\\")/) {
		print "$1\n";
	}
}
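
To make the greedy versus non-greedy difference concrete, here is a minimal
sketch on a made-up line that contains the \" terminator twice. The variable
names are hypothetical.

```perl
use strict;
use warnings;

# Single-quoted source: the line holds  asd \n asd \" ad \" dsaf
my $line = 'asd \n asd \" ad \" dsaf';

my ($greedy) = $line =~ /^(.*)\\"/;    # backtracks from the end: last \"
my ($lazy)   = $line =~ /^(.*?)\\"/;   # grows from the start: first \"

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

The greedy form keeps the first \" inside $1, which is why the earlier
attempt printed too much text.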

 
> What about trying something like:
>
>   $strval =~ m/^(( [^\\] | \\ (?! \\ (?= n)) | \\ (?! \") )*)/x;
>
> which (or so I think ;0) collects as many characters from the beginning of
> $strval that meet these conditions (bs = backslash):
> =
>    1) not a bs
> or 2) a bs that is *not* followed by a ( bs that *is* followed by a n )
> or 3) a bs that is *not* followed by a "
> =



Re: RegExp Question

2001-02-05 Thread Philip Newton

John Giordano wrote:
> $grep_deferred = system ('findstr DeferredStatus response1');
>
> print "$grep_deferred\n\n";
[snip]
> $grep_deferred has this in it:
>
> <a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img
> src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail
> Status"></a>

Are you sure? Try printing out "bloop\nblip\n$grep_deferred\nblep\n" and see
whether this <a href> line comes between "blip" and "blep".

From `perldoc -f system`:

 The return value is the exit status of the program as returned
 by the `wait' call. To get the actual exit value divide by 256.

So if $grep_deferred is the return value from system, it's probably a
number. The line you're seeing on the screen is presumably the output of
findstr, which went to STDOUT; since you didn't redirect STDOUT, it went the
same place your print went.

You probably want `` (backticks).
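
A minimal sketch of the difference, using the running perl ($^X) as a
stand-in for findstr. The child commands are illustrative only, and the
sketch assumes the path in $^X contains no spaces.

```perl
use strict;
use warnings;

# system() returns the wait status, not the command's output.
my $status = system(qq{$^X -e "exit 3"});
my $exit   = $status >> 8;               # same as dividing by 256

# Backticks return whatever the command wrote to STDOUT.
my $output = `$^X -e "print 42"`;

print "system returned $status (exit value $exit)\n";
print "backticks captured '$output'\n";
```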

ObTMTOWTDI: if you're looking for a fixed substring such as '<a ', then
consider using index() rather than regular expressions. See useless
benchmark at the end.
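
For reference, a small sketch of the index() approach on a shortened,
made-up HTML line (the variable names are hypothetical). Note that index()
returns 0 for a hit at the very start, so the result has to be compared
against -1 rather than tested for truth.

```perl
use strict;
use warnings;

# Shortened, made-up stand-in for the line that findstr prints.
my $html = '<a href="/status"><img src="/images/btn.gif" alt="Mail Status"></a>';

my $found   = index($html, '<a ');    # offset of the first occurrence
my $missing = index($html, '<b ');    # -1 when the substring is absent

print "found at offset $found\n"     if $found != -1;
print "'<b ' is not in the string\n" if $missing == -1;
```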

Cheers,
Philip

#!perl -w

use strict;
use Benchmark 'cmpthese';

# In the *.notfound cases '[a ' replaces '<a ' so that the search fails.
cmpthese(5_000_000, {
  'index.start.found'    => sub { $a = index '<a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>', '<a '; },
  'index.start.notfound' => sub { $a = index '[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>', '<a '; },
  'regex.start.found'    => sub { $a = '<a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>' =~ /<a /; },
  'regex.start.notfound' => sub { $a = '[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>' =~ /<a /; },
  'index.end.found'      => sub { $a = index 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif<a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>', '<a '; },
  'index.end.notfound'   => sub { $a = index 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>', '<a '; },
  'regex.end.found'      => sub { $a = 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif<a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>' =~ /<a /; },
  'regex.end.notfound'   => sub { $a = 'fhuhge8a goija fgoja w04gua roigj oöijf g9a8u onigö lfsdkj gölija osij ga0 jügojar öoigj öosijd ü08g9u gaji fäöigjah+w r0g9j aäüjäfpoiajs öoijfd +ajsd fäüpioaj öwoeijf öoasuidhf p9iawh üefhaj woeijg awopihg apiuehrg üaiowj eoöfigjaw ejif[a href="/c9410ee04de9845704db8951dfde015b/DeferredStatus"><img src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail Status"></a>' =~ /<a /; },
  'assign'               => sub { $a = 1; },
});

__END__

Output:
Benchmark: timing 5000000 iterations of assign, index.end.found, index.end.notfound, index.start.found, index.start.notfound, regex.end.found, regex.end.notfound, regex.start.found, regex.start.notfound...
assign:  0 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 4897159.65/s (n=5000000)
index.end.found:  9 wallclock secs ( 9.79 usr +  0.00 sys =  9.79 CPU) @ 510516.64/s (n=5000000)
index.end.notfound: 12 wallclock secs (12.38 usr +  0.00 sys = 12.38 CPU) @ 404007.76/s (n=5000000)
index.start.found:  3 wallclock secs ( 2.49 usr +  0.00 sys =  2.49 CPU) @ 2004811.55/s (n=5000000)
index.start.notfound:  5 wallclock secs ( 5.42 usr +  0.00 sys =  5.42 CPU) @ 922849.76/s (n=5000000)
regex.end.found: 12 wallclock secs (11.42 usr +  0.00 sys = 11.42 CPU) @ 437943.42/s (n=5000000)
regex.end.notfound: 13 wallclock secs (13.34 usr +  0.00 sys = 13.34 CPU) @ 374840.69/s (n=5000000)
regex.start.found:  5 wallclock secs ( 3.91 usr +  0.00 sys =  3.91 CPU) @ 1280081.93/s (n=5000000)
regex.start.notfound:  7 wallclock secs ( 6.38 usr +  0.00 sys =  6.38 CPU) @ 783821.92/s (n=5000000)
  Rate regex.end.notfound index.end.notfound
regex.end.found index.end.found regex.start.notfound index.start.notfound
regex.start.found index.start.found assign
regex.end.notfound374841/s