Re: regexp question

2004-12-15 Thread eric-amick


> I seem to be missing a piece of the puzzle... I want to define a character
> class ([]) with atoms to (not) match involve negative look-ahead
> assertions... but no joy.
>
> I am reading a stream of text that may contain the two segments \\n and \"
>
> I want to define a regexp that will match up to the first of either of
> these.
I would suggest a slightly different approach, especially since you apparently might not have either sequence. Try
 
if ($str =~ /\\\\n|\\"/)
{
$mytext = substr($str, 0, $-[0]);
# do whatever with $mytext
}
 
The arrays @- and @+ contain the starting and ending offsets within a string of the portion successfully matched by a regular expression; element 0 of each refers to the whole match, and the other elements to the portions captured by parentheses.
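A minimal sketch of those offsets, assuming the delimiters are the literal 3-character sequence \\n (backslash, backslash, n) and the 2-character sequence \" (backslash, quote). The sample strings and the sub name match_offsets are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($-[0], $+[0], $-[1], $+[1]) for the first
# delimiter in $s; group 1 is everything after the leading backslash.
sub match_offsets {
    my ($s) = @_;
    return unless $s =~ /\\(\\n|")/;
    return ($-[0], $+[0], $-[1], $+[1]);
}

my $str = 'abc \\\\n def \" ghi';   # single quotes: \\\\n gives \\n, \" stays \"
my @off = match_offsets($str);
print "whole match: $off[0]..$off[1], group 1: $off[2]..$off[3]\n";
print "text before: '", substr($str, 0, $off[0]), "'\n";
```

Here the whole match starts at offset 4 (the first backslash) while group 1 starts one character later, and substr($str, 0, $-[0]) is exactly the text before the delimiter.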
 
See perlvar in the docs and the description of substr under perlfunc.
 
-- 
Eric Amick
Columbia, MD
___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


RE: regexp question

2004-12-15 Thread Christopher Hahn
 
Hello all,

Thank you for the time.  I guess that this foray into lookaround assertions
was a bust.  ;0) I am using Parse::RecDescent, and I realized that I could
use the order of Rules to ensure that if I *wasn't* looking at either a \\n
or a \", then I would have a bare \ that a simple Production could deal with.
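The order-of-Rules idea can be sketched without Parse::RecDescent. This is a swapped-in technique, not RecDescent's API: a \G-anchored lexer using m//gc that tries the two escape sequences before the bare-backslash case. The token names are hypothetical:

```perl
use strict;
use warnings;

# Alternatives are tried in order, so \\n and \" are claimed before the
# catch-all bare-backslash rule can see them.
sub tokenize {
    my ($s) = @_;
    my @tokens;
    pos($s) = 0;
    while (pos($s) < length $s) {
        if    ($s =~ /\G\\\\n/gc)    { push @tokens, 'DBL_BS_N' }
        elsif ($s =~ /\G\\"/gc)      { push @tokens, 'BS_QUOTE' }
        elsif ($s =~ /\G\\/gc)       { push @tokens, 'BARE_BS' }
        elsif ($s =~ /\G([^\\]+)/gc) { push @tokens, "TEXT($1)" }
    }
    return @tokens;
}

print join(' ', tokenize('a\\\\nb\"c\d')), "\n";
```

Every character is either a backslash (at worst BARE_BS) or part of a TEXT run, so the loop always advances; /gc keeps pos() in place when an alternative fails.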

Thanks (Mr $) to all!

chahn

-Original Message-
From: $Bill Luebkert [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 14, 2004 7:37 PM
To: Christopher Hahn
Cc: [EMAIL PROTECTED]
Subject: Re: regexp question



Re: regexp question

2004-12-14 Thread $Bill Luebkert
Christopher Hahn wrote:

> $Bill,
> 
> The (?: ) construct may be non-capturing, but it does eat text from the
> buffer (sic)
> 
> ...and, besides, when I ran it I saw this:
> =
> 1: asd asdf adf asd \n asd adf 
> 2: asd asdf adf asd \n asd adf 
> 3: asd asdf adf asd  asd adf 
> =
> 
> where I need to see something like this:
> =
> 1: asd asdf adf asd 
> 2: asd asdf adf asd \n asd adf 
> 3: asd asdf adf asd  asd adf 
> 4: asd asdf adf asd  asd adf " ad dasf \n dsaf 
> =
> 
> i.e. \n should pass through, where \\n or \" should not.

Does a non-greedy match help (also - I was missing an escape
or two on the \\n):

foreach (
  '1: asd asdf adf asd \\\\n asd adf \" ad dasf dsaf ',
  '2: asd asdf adf asd \n asd adf \" ad dasf dsaf ',
  '3: asd asdf adf asd  asd adf \" ad dasf \\\\n dsaf ',
  '4: asd asdf adf asd  asd adf " ad dasf \n dsaf ',
  ) {
  if (/^(.*?)(?:\\\\n|\\")/) {
    print "$1\n";
  }
}
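To see what the non-greedy quantifier changes, here is a small comparison. The sample string and the sub name prefixes are made up for illustration:

```perl
use strict;
use warnings;

# Returns the greedy and the non-greedy prefix before a delimiter
# (undef for both when neither \\n nor \" occurs).
sub prefixes {
    my ($s) = @_;
    my ($greedy) = $s =~ /^(.*)(?:\\\\n|\\")/;    # backtracks from the end: LAST delimiter
    my ($lazy)   = $s =~ /^(.*?)(?:\\\\n|\\")/;   # grows from the start: FIRST delimiter
    return ($greedy, $lazy);
}

my ($g, $l) = prefixes('one \" two \" three');
print "greedy: [$g]\nlazy:   [$l]\n";
```

Note that neither version prints anything for a line with no delimiter at all, since the match itself fails.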

> 
> What about trying something like:
> 
>   $strval =~ m/^(( [^\\] | \\ (?! \\ (?= n)) | \\ (?! \") )*)/x;
> 
> which (or so I think ;0) collects as many characters from the beginning of
> $str that meet these conditions (bs = backslash):
> =
>1) not a bs
> or 2) a bs that is *not* followed by a ( bs that *is* followed by a n )
> or 3) a bs that is *not* followed by a "
> =

-- 
  ,-/-  __  _  _ $Bill LuebkertMailto:[EMAIL PROTECTED]
 (_/   /  )// //   DBE CollectiblesMailto:[EMAIL PROTECTED]
  / ) /--<  o // //  Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)


RE: regexp question

2004-12-14 Thread Christopher Hahn

$Bill,

The (?: ) construct may be non-capturing, but it does eat text from the
buffer (sic)
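That point about (?: ) can be checked with pos(): a non-capturing group still consumes the text it matches, whereas a lookahead (?= ) is zero-width. A small sketch; end_pos is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: where does a //g match leave pos()?
sub end_pos {
    my ($s, $re) = @_;
    return $s =~ /$re/g ? pos($s) : undef;
}

my $s = 'abc \" def';
print end_pos($s, qr/^(.*?)(?:\\")/), "\n";   # 6: the \" was consumed
print end_pos($s, qr/^(.*?)(?=\\")/), "\n";   # 4: stops just before the \"
```

In both cases $1 is 'abc '; only the position where matching would resume differs.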

...and, besides, when I ran it I saw this:
=
1: asd asdf adf asd \n asd adf 
2: asd asdf adf asd \n asd adf 
3: asd asdf adf asd  asd adf 
=

where I need to see something like this:
=
1: asd asdf adf asd 
2: asd asdf adf asd \n asd adf 
3: asd asdf adf asd  asd adf 
4: asd asdf adf asd  asd adf " ad dasf \n dsaf 
=

i.e. \n should pass through, where \\n or \" should not.


What about trying something like:

  $strval =~ m/^(( [^\\] | \\ (?! \\ (?= n)) | \\ (?! \") )*)/x;

which (or so I think ;0) collects as many characters from the beginning of
$str that meet these conditions (bs = backslash):
=
   1) not a bs
or 2) a bs that is *not* followed by a ( bs that *is* followed by a n )
or 3) a bs that is *not* followed by a "
=
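As posted, that pattern accepts every backslash: the backslash that starts \\n fails alternative 2 but passes alternative 3, and the one that starts \" passes alternative 2. A common fix is a single negative lookahead that rules out both sequences before each character is consumed (often called a tempered token). The sketch below compares the two on sample lines modeled on the ones in this thread, with the 3-character \\n written as '\\\\n' inside single quotes; the sub names are hypothetical:

```perl
use strict;
use warnings;

# The pattern as posted: each alternative excludes only one delimiter,
# so another alternative lets that backslash through anyway.
sub original_attempt {
    my ($strval) = @_;
    $strval =~ m/^(( [^\\] | \\ (?! \\ (?= n)) | \\ (?! \") )*)/x;
    return $1;
}

# Tempered token: assert that neither \\n nor \" starts here, then take
# one character. The match cannot fail (it may be empty), and /s lets .
# also match a real newline.
sub leading_text {
    my ($strval) = @_;
    return $strval =~ /^((?:(?!\\\\n|\\").)*)/s ? $1 : '';
}

for my $line (
    '1: asd asdf adf asd \\\\n asd adf \" ad dasf dsaf ',
    '4: asd asdf adf asd  asd adf " ad dasf \n dsaf ',
) {
    print 'original: [', original_attempt($line), "]\n";
    print 'tempered: [', leading_text($line), "]\n";
}
```

Because the tempered version always matches, a line with neither sequence comes back whole, which covers the fourth case in the desired output.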

Am I making any sense?  ;0)

Thank you for taking the time in any case!

Christopher

-Original Message-
From: $Bill Luebkert [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 14, 2004 1:41 PM
To: Christopher Hahn
Cc: [EMAIL PROTECTED]
Subject: Re: regexp question



Re: regexp question

2004-12-14 Thread $Bill Luebkert
Christopher Hahn wrote:

> Hey,
> 
> I seem to be missing a piece of the puzzle... I want to define a character
> class ([]) with atoms to (not) match involve negative look-ahead
> assertions... but no joy.
> 
> I am reading a stream of text that may contain the two segments \\n and \"
> 
> I want to define a regexp that will match up to the first of either of
> these.
> 
> ...i.e. something like ([^]*) where the character class is just the two
> sequences above.
> 
> ...but they are not characters at all, but strings, and so I wonder how to
> approach this.
> 
> Question: how best to do something to set
> 
>    $1 == every character in the string up to and not including the first of
>    either a \\n or a \"
> 
> That is all. (something like $strval =~ m/ (.* (?! \\ (?= \" | \\ (?= n) ) ) )/x;)
> 
> I am going to use this regexp in a Parse::RecDescent Production, and have
> other Rules to deal with the \\n and \" strings.
> 
> I am banging on this and will report when something good comes out of it,
> but please do chime in with any "best practices" that suggest themselves to you.

I assume that \\n is actually 3 characters and not a newline.
It should be as simple as:

use strict;

foreach (
  '1: asd asdf adf asd \\n asd adf \" ad dasf dsaf ',
  '2: asd asdf adf asd \n asd adf \" ad dasf dsaf ',
  '3: asd asdf adf asd  asd adf \" ad dasf \\n dsaf ',
  '4: asd asdf adf asd  asd adf " ad dasf \n dsaf ',
  ) {
if (/^(.*)(?:\\\\n|\\")/) {
print "$1\n";
}
}

__END__
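The quoting detail behind that assumption: inside single quotes Perl only treats \\ and \' specially, so '\\n' is the two characters backslash and n, and the three-character sequence has to be written '\\\\n'. A quick check:

```perl
use strict;
use warnings;

my $two   = '\\n';     # backslash, n
my $same  = '\n';      # also backslash, n: \n is not special in single quotes
my $three = '\\\\n';   # backslash, backslash, n

my @lengths = map { length($_) } ($two, $same, $three);
print "@lengths\n";                              # prints: 2 2 3
print "equal\n"        if $two eq $same;         # prints: equal
print "3-char match\n" if $three =~ /\\\\n/;     # prints: 3-char match
print "2-char match\n" if $two   =~ /\\\\n/;     # prints nothing
```

So test data written as '\\n' never contains the sequence a /\\\\n/ pattern looks for.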




regexp question

2004-12-14 Thread Christopher Hahn

Hey,

I seem to be missing a piece of the puzzle... I want to define a character
class ([]) with atoms to (not) match involve negative look-ahead
assertions... but no joy.

I am reading a stream of text that may contain the two segments \\n and \"

I want to define a regexp that will match up to the first of either of
these.

...i.e. something like ([^]*) where the character class is just the two
sequences above.

...but they are not characters at all, but strings, and so I wonder how to
approach this.

Question: how best to do something to set

   $1 == every character in the string up to and not including the first of
   either a \\n or a \"

That is all. (something like $strval =~ m/ (.* (?! \\ (?= \" | \\ (?= n) ) ) )/x;)

I am going to use this regexp in a Parse::RecDescent Production, and have
other Rules to deal with the \\n and \" strings.

I am banging on this and will report when something good comes out of it,
but please do chime in with any "best practices" that suggest themselves to you.
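Another way to get everything up to the first of either sequence, sketched here as an alternative to lookahead assertions, is split with a limit of 2: the first field is the text before the leftmost delimiter, or the whole string when neither occurs. The sub name is hypothetical:

```perl
use strict;
use warnings;

sub before_first_delim {
    my ($strval) = @_;
    # Limit 2: split at most once, at the leftmost \\n or \" in the string.
    my ($head) = split /\\\\n|\\"/, $strval, 2;
    return defined $head ? $head : '';   # split on '' returns an empty list
}

for my $s ('x \" y \\\\n z', 'no delimiters here') {
    my $head = before_first_delim($s);
    print "[$head]\n";
}
```

A leading delimiter yields an empty first field, so the result is '' rather than undef in that case too.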

TIA!

Christopher

--
Fulfilling my hope, I launch myself toward glory
Christopher Kenneth Hahn -- [EMAIL PROTECTED]


Re: RegExp Question

2001-02-05 Thread Philip Newton

John Giordano wrote:
> $grep_deferred = system ('findstr DeferredStatus response1');
> 
> print "$grep_deferred\n\n";
[snip]
> $grep_deferred has this in it:
> 
>  src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail
> Status">

Are you sure? Try printing out "bloop\nblip\n$grep_deferred\nblep\n" and see
whether this line comes between "blip" and "blep".

From `perldoc -f system`:

 The return value is the exit status of the program as returned
 by the `wait' call. To get the actual exit value divide by 256.

So if $grep_deferred is the return value from system, it's probably a
number. The line you're seeing on the screen is presumably the output of
findstr, which went to STDOUT; since you didn't redirect STDOUT, it went the
same place your print went.

You probably want `` (backticks).
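A small sketch of the difference, using the running perl ($^X) as a stand-in for findstr so it runs anywhere; the commands are illustrative only:

```perl
use strict;
use warnings;

# system() returns the wait status, not the program's output;
# the exit value is the status shifted right by 8 (i.e. divided by 256).
my $status = system($^X, '-e', 'exit 3');    # list form: no shell involved
my $exit   = $status >> 8;                   # 3

# Backticks run the command and return what it printed to STDOUT.
my $output = `"$^X" -le "print 42"`;
chomp $output;                               # "42"

print "exit value: $exit, captured: $output\n";
```

With backticks the findstr line would end up in the variable, where the regex can see it, instead of going straight to the console.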

ObTMTOWTDI: if you're looking for a fixed substring such as the
href="/c9410ee04de9845704db8951dfde015b/DeferredStatus" link, index() is
another way to do it. [The benchmark script was garbled in the archive.
Judging from its output, it used Benchmark to time index() and a regex match
for that link, found and not found, at the start and at the end of a long
string of filler text, with a plain assignment ($a = 1) as the baseline.]

Output:
Benchmark: timing 5000000 iterations of assign, index.end.found,
index.end.notfound, index.start.found, index.start.notfound, regex.end.found,
regex.end.notfound, regex.start.found, regex.start.notfound...
              assign:  0 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 4897159.65/s (n=5000000)
     index.end.found:  9 wallclock secs ( 9.79 usr +  0.00 sys =  9.79 CPU) @ 510516.64/s (n=5000000)
  index.end.notfound: 12 wallclock secs (12.38 usr +  0.00 sys = 12.38 CPU) @ 404007.76/s (n=5000000)
   index.start.found:  3 wallclock secs ( 2.49 usr +  0.00 sys =  2.49 CPU) @ 2004811.55/s (n=5000000)
index.start.notfound:  5 wallclock secs ( 5.42 usr +  0.00 sys =  5.42 CPU) @ 922849.76/s (n=5000000)
     regex.end.found: 12 wallclock secs (11.42 usr +  0.00 sys = 11.42 CPU) @ 437943.42/s (n=5000000)
  regex.end.notfound: 13 wallclock secs (13.34 usr +  0.00 sys = 13.34 CPU) @ 374840.69/s (n=5000000)
   regex.start.found:  5 wallclock secs ( 3.91 usr +  0.00 sys =  3.91 CPU) @ 1280081.93/s (n=5000000)
regex.start.notfound:  7 wallclock secs ( 6.38 usr +  0.00 sys =  6.38 CPU) @ 783821.92/s (n=5000000)
                          Rate r.e.nf i.e.nf  r.e.f  i.e.f r.s.nf i.s.nf  r.s.f  i.s.f assign
regex.end.notfound    374841/s     --    -7%   -14%   -27%   -52%   -59%   -71%   -81%   -92%
index.end.notfound    404008/s     8%     --    -8%   -21%   -48%   -56%   -68%   -80%   -92%
regex.end.found       437943/s    17%     8%     --   -14%   -44%   -53%   -66%   -78%   -91%
index.end.found       510517/s    36%    26%    17%     --   -35%   -45%   -60%   -75%   -90%
regex.start.notfound  783822/s   109%    94%    79%    54%     --   -15%   -39%   -61%   -84%
index.start.notfound  922850/s   146%   128%   111%    81%    18%     --   -28%   -54%   -81%
regex.start.found    1280082/s   242%   217%   192%   151%    63%    39%     --   -36%   -74%
index.start.found    2004812/s   435%   396%   358%   293%   156%   117%    57%     --   -59%
assign               4897160/s  1206%  1112%  1018%   859%   525%   431%   283%   144%     --

(Column headers abbreviate the row names: r/i = regex/index, s/e = start/end,
f/nf = found/notfound.)
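For a fixed string the two approaches can be compared directly. A minimal sketch, assuming a made-up needle and filler text rather than the original inputs: it checks that index() and a \Q...\E regex report the same offset, and runs Benchmark's cmpthese only when given a 'bench' argument:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical inputs: a fixed tag at the end of some filler text.
my $needle   = '<a href="/x/DeferredStatus">';
my $haystack = ('filler ' x 200) . $needle;

# index() returns the offset of the substring, or -1 if it is absent.
sub find_index { my ($h, $n) = @_; return index($h, $n) }

# A regex search for the same literal; \Q...\E escapes any metacharacters.
sub find_regex { my ($h, $n) = @_; return $h =~ /\Q$n\E/ ? $-[0] : -1 }

# Time both for at least 1 CPU second each, only when asked.
if (@ARGV && $ARGV[0] eq 'bench') {
    cmpthese(-1, {
        index => sub { find_index($haystack, $needle) },
        regex => sub { find_regex($haystack, $needle) },
    });
}
```

Run it as `perl script.pl bench` to get a rate table in the same format as the one above; actual numbers will depend on the machine and the perl version.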

Re: RegExp Question

2001-02-04 Thread $Bill Luebkert

John Giordano wrote:
> 
> Hello,
> 
> Could someone please tell me why this:
> 
> ###
> $grep_deferred = system ('findstr DeferredStatus response1');
> 
> print "$grep_deferred\n\n";
> 
> if ($grep_deferred =~ / 
> print "It contains  } else {
> 
> print "It doesn't contain  }
> -
> doesn't find this:
> 
>  
> The above code may need some further explanation.  I am trying to extract
> the  
> $grep_deferred has this in it:
> 
>  src="/images/btnstats.gif" width=120 height=40 border=0 alt="Mail
> Status">
> 
> < doesn't need a backslash in front of it right?

Correct, it doesn't.  This test snippet works fine for me (I modified 
your RE, but it works your way too):

use strict;

my $grep_deferred = 
  q{} . 
  q{};


print "$grep_deferred\n\n";

if ($grep_deferred =~ /http://www.todbe.com/
  / ) /--<  o // //  Mailto:[EMAIL PROTECTED] http://dbecoll.webjump.com/
-/-' /___/_<_http://www.freeyellow.com/members/dbecoll/