Re: multiple named captures with a single regexp

2017-03-01 Thread Chas. Owens
/(\w+)/g gets the command as well and only the args are wanted, so it would
need to be

my @args = $s =~ / (\w+)/g;
shift @args;

also,

my VAR if TEST;

is deprecated IIRC and slated to be removed soon (as its behavior is
surprising).  It would probably be better to say

my @args = $s =~ /^\w+\s/ && $s =~ /(?:\s+(\w+))/g;

or (if you don't like using && like that)

my @args = $s =~ /^\w+\s/ ? $s =~ /(?:\s+(\w+))/g : ();
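A quick way to see why the ?: form is the safer of the two. This is a sketch on the same kind of sample line, and args_and and args_ternary are made-up helper names. When the /^\w+\s/ check fails, && hands back the false value of the failed match (an empty string) as its result, so @args ends up holding one empty element. The ternary yields an empty list instead:

```perl
use strict;
use warnings;

# Hypothetical helper: ?: form, empty list when the command check fails
sub args_ternary {
    my ($s) = @_;
    my @args = $s =~ /^\w+\s/ ? $s =~ /(?:\s+(\w+))/g : ();
    return @args;
}

# Hypothetical helper: && form, the failed match's false value leaks into @args
sub args_and {
    my ($s) = @_;
    my @args = $s =~ /^\w+\s/ && $s =~ /(?:\s+(\w+))/g;
    return @args;
}

my @ok  = args_ternary("command arg1 arg2 arg3 arg4");
my @bad = args_and("command");

print join(',', @ok), "\n";                    # arg1,arg2,arg3,arg4
print scalar(@bad), " element(s) from &&\n";   # 1 element(s) from &&
```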



On Wed, Mar 1, 2017 at 9:34 AM X Dungeness  wrote:



Re: multiple named captures with a single regexp

2017-03-01 Thread X Dungeness
On Wed, Mar 1, 2017 at 2:52 AM, Chas. Owens  wrote:
> Sadly, Perl will only capture the last match of a capture group with a quantifier, so
> that just won't work.  The split function really is the simplest and most
> elegant solution for this sort of problem (you have a string with a
> delimiter and you want the pieces).  All of that said, if you are willing to
> modify the regex you can say
>
> my $s = "command arg1 arg2 arg3 arg4";
> my @args = $s =~ /(?:\s+(\w+))/g;
>

Hm, I'd write it as:
 my @args = $s =~ / (\w+)/g;

or, if the command check isn't too inelegant:

 my @args = $s =~ / (\w+)/g if $s =~ /^command\s/;


> for my $arg (@args) {
> print "$arg\n";
> }
>
> However, this does not allow you to check the command is correct.
>

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: multiple named captures with a single regexp

2017-03-01 Thread Chas. Owens
Sadly, Perl will only capture the last match of a capture group with a quantifier,
so that just won't work.  The split function really is the simplest and
most elegant solution for this sort of problem (you have a string with a
delimiter and you want the pieces).  All of that said, if you are willing
to modify the regex you can say

my $s = "command arg1 arg2 arg3 arg4";
my @args = $s =~ /(?:\s+(\w+))/g;

for my $arg (@args) {
    print "$arg\n";
}

However, this does not allow you to check the command is correct.
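To see the "last match only" behaviour next to the /g version, a small sketch using the same sample string:

```perl
use strict;
use warnings;

my $s = "command arg1 arg2 arg3 arg4";

# One quantified group: after the match it only holds its last iteration
my ($last) = $s =~ /^command(?:\s+(\w+))+/;

# /g in list context: the pattern is matched repeatedly and every capture is returned
my @args = $s =~ /(?:\s+(\w+))/g;

print "quantified group: $last\n";   # arg4
print "with /g: @args\n";            # arg1 arg2 arg3 arg4
```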

Another option, and I would in no way claim this is an elegant solution, is
to use code execution in the middle of the regex with (?{}) to pull out the
matched fields:

@args = ();

my $start;
$s =~ m{
    \w+                 # command
    \s
    (?{$start = pos;})  # capture the first start position
    (?:
        \w+             # the argument
        # capture the argument
        (?{ push @args, substr $s, $start, pos() - $start; })
        # optional delimiter and capture the next start
        (?: \s+ (?{ $start = pos; }))?
    )+
}x;

for my $arg (@args) {
    print "$arg\n";
}

Of course, all of these solutions are bound to fail when you hit the real
world (assuming the command is a Unix command) as arguments are allowed to
have spaces in them if they are quoted.  There is a way to do this with
regex, but balancing the quotes is far more pain than it is worth.  A
simple regex to tokenize the string plus some logic to put the quoted
sections back together will allow you to extract the arguments from the
string:

#!/usr/bin/perl

use strict;
use warnings;

my $s = qq("command with space" arg1 "arg 2" "arg3");

my @parts = $s =~ /([ ]+|"|\w+)/g;

my @args;
my $in_string = 0;
my $buf = "";
while (@parts) {
    my $part = shift @parts;

    # ditch the delimiters if not in a string
    next if not $in_string and $part =~ / /;

    # in strings, a " means end the string
    # otherwise, just build up a buffer of the things
    # in the string
    if ($in_string) {
        if ($part eq '"') {
            $in_string = 0;
            push @args, $buf;
            $buf = "";
        } else {
            $buf .= $part;
        }
        next;
    }

    # if not in a string, " means start a string
    if ($part eq '"') {
        $in_string = 1;
        next;
    }

    # if not a delimiter or a ", then this is just a normal token
    push @args, $part;
}

shift @args; #ditch the command

for my $arg (@args) {
    print "$arg\n";
}

Of course, this still doesn't handle Unix commands properly as you can
escape " and use ' to create strings, but those details are left as an
exercise for the reader.
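For the escaped-quote and single-quote cases, the core module Text::ParseWords already does shell-like splitting. This swaps in a library instead of the hand-rolled tokenizer above; it is a sketch, assuming its shell-style quoting rules are close enough to what you need:

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Double quotes, single quotes and a backslash-escaped " inside a double-quoted word
my $s = q{"command with space" arg1 "arg 2" 'arg3' "say \"hi\""};

# shellwords() splits on whitespace, strips the quotes and removes the escapes
my ($command, @args) = shellwords($s);

print "command: $command\n";
print "$_\n" for @args;
```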




On Wed, Mar 1, 2017 at 4:04 AM Luca Ferrari <fluca1...@infinito.it> wrote:



Re: multiple named captures with a single regexp

2017-03-01 Thread Shlomi Fish
Hi Luca,

On Wed, 1 Mar 2017 10:01:34 +0100
Luca Ferrari <fluca1...@infinito.it> wrote:

> Hi all,
> I'm not sure if this is possible, but imagine I've got a line as follows:
> 
> command arg1 arg2 arg3 arg4 ...
> 
> I would like to capture all args with a single regexp, possibly with a
> named capture, but I don't know exactly how to do:
> 
> my $re = qr/command\s+(?\w+)+/;
> 
> the above of course is going to capture only the first one (one shot)
> or the last one within a loop.
> How can I extract the whole array of arguments?
> 

Perhaps try using \G and the /g and possibly /o flags; see:

http://perl-begin.org/uses/text-parsing/

(Note that perl-begin is a site that I maintain).
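A minimal sketch of the \G idea, assuming arguments are plain \w+ words separated by whitespace; parse_command is a hypothetical helper, not code from perl-begin.org. Matching with /gc keeps pos() where the last successful match stopped, so each \G match continues from there, and the command can be checked in the same pass:

```perl
use strict;
use warnings;

# Hypothetical helper: check the command word, then walk the arguments with \G
sub parse_command {
    my ($s, $command) = @_;

    # Anchor the command at the start; /gc leaves pos() just after it
    return unless $s =~ /\A\Q$command\E/gc;

    my @args;
    # Each match must start exactly where the previous one ended (\G)
    while ($s =~ /\G\s+(\w+)/gc) {
        push @args, $1;
    }

    # Anything left over (apart from trailing whitespace) means no parse
    return unless $s =~ /\G\s*\z/gc;
    return @args;
}

my @args = parse_command("command arg1 arg2 arg3 arg4", "command");
print "$_\n" for @args;
```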

Regards,

Shlomi Fish


> Please note, a raw solution is to remove the command and split, but
> I'm asking for a more elegant solution.
> 
> Thanks,
> Luca
> 



-- 
-
Shlomi Fish   http://www.shlomifish.org/
Freecell Solver - http://fc-solve.shlomifish.org/

It is a good idea to stop worrying about problems (or “problems” in quotes)
that cannot be fixed.





multiple named captures with a single regexp

2017-03-01 Thread Luca Ferrari
Hi all,
I'm not sure if this is possible, but imagine I've got a line as follows:

command arg1 arg2 arg3 arg4 ...

I would like to capture all args with a single regexp, possibly with a
named capture, but I don't know exactly how to do:

my $re = qr/command\s+(?\w+)+/;

the above of course is going to capture only the first one (one shot)
or the last one within a loop.
How can I extract the whole array of arguments?

Please note, a raw solution is to remove the command and split, but
I'm asking for a more elegant solution.

Thanks,
Luca





Re: Regexp under PERL

2015-07-08 Thread Kent Fredric
On 8 July 2015 at 19:12, Nagy Tamas (TVI-GmbH) tamas.n...@tvi-gmbh.de wrote:
 This is the code:

 } elsif (defined($row) && ($row =~ m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/)) {
  # PATH  first version:   \(\*[ ]+@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?

  my @path = split(':=', $row, 2);
  $temppath = $path[1];
  my trimmedpath = split(''', $temppath, 3);

  $currentpath = trimmedpath[1];

 The last )) is the closing of the elsif. Sorry. Still no idea.

 Tamas Nagy

Again, you're just bolting stuff together in the email client thinking
it's the code. There's no way that can work. The most obvious here you
have three quote marks in split() meaning everything after that is
nonsense.

Then you use variables without sigils ( which is also nonsense under strict )

And you entirely forget to declare variables ( again, nonsense under strict ).

When you eliminate all those superficial defects, the code has no
bugs, and executes silently without so much as a squeak.

Attached is what I have, and it doesn't replicate the problem.
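For reference, a sketch of the snippet with those defects fixed: sigils added, variables declared, and the split on a single quote written as /'/. extract_path is a hypothetical wrapper, the guard uses a simplified pattern rather than the original one, and the test line is taken from the sample data in the original question:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the fixed extraction code from the snippet
sub extract_path {
    my ($row) = @_;
    return unless defined($row)
        && $row =~ /\(\*\s+\@PATH\s+:=\s+'[^']*'\s+\*\)/;

    my @path        = split /:=/, $row, 2;      # split once, at the :=
    my $temppath    = $path[1];                 # " '...' *)"
    my @trimmedpath = split /'/, $temppath, 3;  # before, inside, after the quotes
    my $currentpath = $trimmedpath[1];
    return $currentpath;
}

print extract_path(q{(* @PATH := '\/ph\/** Forest\/Apple' *)}), "\n";
```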

-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL


x.pl
Description: Perl program


Re: Regexp under PERL

2015-07-08 Thread Nagy Tamas (TVI-GmbH)
Hi,

This is the code:

} elsif (defined($row) && ($row =~ m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/)) {
 # PATH  first version:   \(\*[ ]+@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?
 
 my @path = split(':=', $row, 2);
 $temppath = $path[1];
 my trimmedpath = split(''', $temppath, 3);
 
 $currentpath = trimmedpath[1];

The last )) is the closing of the elsif. Sorry. Still no idea.

Tamas Nagy

 
 

-Original Message-
From: Kent Fredric [mailto:kentfred...@gmail.com] 
Sent: Tuesday, July 7, 2015 19:03
To: Nagy Tamas (TVI-GmbH)
Cc: beginners@perl.org
Subject: Re: Regexp under PERL



Regexp under PERL

2015-07-07 Thread Nagy Tamas (TVI-GmbH)
Hi,

PERL shows this line ok, but for the next lines it tells: String found where 
operator expected at line...

m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/))

So it seems that it is not ok.

I have the proper regexp that was tested at  http://www.regexr.com/

# Tested version:   \(\*[ ]+@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?

Input data:

(* @PATH := '\/ph\/** Forest\/Apple' *)
(* @PATH := '\/ph\/** Forest\/Pear' *)
(* @PATH := '\/ph\/** Forest\/Tree\/Plum' *)
(* @PATH := '\/ph\/** Forest\/Oaktree\/Oak' *)

If I use the tested version, it tells: Unmatched ( in regex; marked by <-- HERE
in m/..:=[ ]+'( <-- HERE at . line

Tamas
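A likely cause of the "Unmatched (" error: inside m/.../ the sequence \\/ is an escaped backslash followed by an unescaped slash, which ends the pattern right after '(. Below is a sketch using qr{...} as the delimiter so the slashes need no escaping, with @ escaped so @PATH is not interpolated. The second, simpler pattern is an alternative for extracting the path, not the pattern that was tested on regexr.com:

```perl
use strict;
use warnings;

# The regexr.com pattern, with @ escaped so Perl does not interpolate an
# array called @PATH. Inside qr{...} a slash is an ordinary character, so \\/
# (literal backslash, literal slash) no longer terminates the pattern.
my $tested = qr{\(\*[ ]+\@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?};

# A simpler alternative: capture everything between the single quotes
my $simple = qr{\(\*\s+\@PATH\s+:=\s+'([^']*)'\s+\*\)};

my @lines = (
    q{(* @PATH := '\/ph\/** Forest\/Apple' *)},
    q{(* @PATH := '\/ph\/** Forest\/Pear' *)},
    q{(* @PATH := '\/ph\/** Forest\/Tree\/Plum' *)},
    q{(* @PATH := '\/ph\/** Forest\/Oaktree\/Oak' *)},
);

for my $line (@lines) {
    next unless $line =~ $tested;
    my ($path) = $line =~ $simple;
    print "$path\n";
}
```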



Re: Regexp under PERL

2015-07-07 Thread Kent Fredric
On 8 July 2015 at 04:40, Nagy Tamas (TVI-GmbH) tamas.n...@tvi-gmbh.de wrote:
 m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/))

This is not the exact code you're using obviously, because the last 2
) marks are actually outside the regex.

Removing those ))'s makes the regex compile just fine.

So we need the code, not just the regex.

Ideally, if you can give some perl code that is minimal that
replicates your problem exactly, then that would be very helpful in us
helping you.

Ideally, your code should be reduced as far as possible till you have
the least possible amount of code that demonstrates your problem.

Additional notes:  Values in @PATH are not relevant to your
expression, because you explicitly escape the @ to mean a literal @.
If you did not escape it, it would have interpolated.
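A small illustration of that point (the contents of @PATH here are invented):

```perl
use strict;
use warnings;

my @PATH = ('foo', 'bar');          # invented contents
my $line = q{(* @PATH := 'x' *)};

# Escaped: \@PATH in the pattern means the literal text "@PATH"
my $literal = $line =~ /\@PATH/ ? 1 : 0;

# Unescaped: @PATH interpolates as in a double-quoted string,
# joined with $" (a space), so the pattern becomes /foo bar/
my $interp_on_line   = $line =~ /@PATH/ ? 1 : 0;
my $interp_on_values = "xx foo bar yy" =~ /@PATH/ ? 1 : 0;

print "literal=$literal interp_on_line=$interp_on_line interp_on_values=$interp_on_values\n";
```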

But even then, I'd still have no idea what you are doing :)

-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL





Re: is faster a regexp with multiple choices or a single one with lower case?

2015-01-08 Thread Luca Ferrari
Hi Bill,

On Thu, Jan 8, 2015 at 1:36 AM, $Bill n...@todbe.com wrote:
 Why not just ignore the case ?

Sure it's an option.

 Why does the script care what the case is ?  Is there a rationale for
 checking it ?

Of course there's, and of course my script does different things
depending on what I'm looking at.
I have just posted a short example to discuss about regular
expressions, not about the particular case in my script (that is, by
the way, quite simple).

Thanks,
Luca





Re: is faster a regexp with multiple choices or a single one with lower case?

2015-01-08 Thread David Precious
On Wed, 7 Jan 2015 10:59:07 +0200
Shlomi Fish shlo...@shlomifish.org wrote:
 Anyway, one can use the Benchmark.pm module to determine which
 alternative is faster, but I suspect their speeds are not going to be
 that much different. See:
 
 http://perl-begin.org/topics/optimising-and-profiling/
 
 (Note: perl-begin.org is a site I originated and maintain).

And this is the answer I'd give - if you're curious as to which of two
approaches will be faster, benchmark it and find out.  It's often
better to do this yourself, as the results may in some cases vary widely
depending on the system you're running it on, the perl version, how
Perl was built, etc.

The sure-fire way to see which of multiple options is faster is to use
Benchmark.pm to try them and find out :)

For an example, I used the following (dirty) short script to set up
1,000 test filenames with random lengths and capitalisation, half of
which should match the pattern, and testing each approach against all
of those test filenames, 10,000 times:


[davidp@supernova:~]$ cat tmp/benchmark_lc.pl 
#!/usr/bin/perl

use strict;
use Benchmark;

# Put together an array of various test strings, with random
# lengths and case
my @valid_chars = ('a'..'z', 'A'..'Z');
my @test_data = map {
    join('', map { $valid_chars[int rand @valid_chars] } 1..rand(10))
    . (rand() < 0.5 ? '.bat' : '.bar')
} (1..1000);

Benchmark::cmpthese(10_000,
    {
        lc_first => sub {
            for my $string (@test_data) {
                $string = lc $string;
                if ($string =~ /\.bat$/) {
                }
            }
        },
        regex_nocase => sub {
            for my $string (@test_data) {
                if ($string =~ /\.bat$/i) {
                }
            }
        },
    },
);




And my results suggest that, for me, using lc() on the string first
before attempting to match was around 30% faster:


[davidp@supernova:~]$ perl tmp/benchmark_lc.pl 
               Rate regex_nocase lc_first
regex_nocase 2674/s           --     -24%
lc_first     3509/s          31%       --


Of course, YMMV.
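One thing to watch in the script above: the for loop aliases $string to each element of @test_data, so $string = lc $string lowercases the test data in place, and every later run (of either sub) sees already-lowercased strings. A variant that leaves the data untouched, as a sketch (the sub names are arbitrary, and your timings will differ):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same kind of test data: random prefixes, roughly half ending in ".bat"
my @valid_chars = ('a'..'z', 'A'..'Z');
my @test_data = map {
    join('', map { $valid_chars[int rand @valid_chars] } 1 .. rand(10))
        . (rand() < 0.5 ? '.bat' : '.bar')
} 1 .. 1000;

# Lowercase into a new variable, so @test_data is never modified
sub count_lc_first {
    my $n = 0;
    for my $string (@test_data) {
        my $lc = lc $string;
        $n++ if $lc =~ /\.bat\z/;
    }
    return $n;
}

sub count_regex_nocase {
    my $n = 0;
    for my $string (@test_data) {
        $n++ if $string =~ /\.bat\z/i;
    }
    return $n;
}

cmpthese(1_000, {
    lc_first     => \&count_lc_first,
    regex_nocase => \&count_regex_nocase,
});
```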



-- 
David Precious (bigpresh) dav...@preshweb.co.uk
http://www.preshweb.co.uk/ www.preshweb.co.uk/twitter
www.preshweb.co.uk/linkedinwww.preshweb.co.uk/facebook
www.preshweb.co.uk/cpanwww.preshweb.co.uk/github







Re: is faster a regexp with multiple choices or a single one with lower case?

2015-01-07 Thread Shlomi Fish
On Wed, 7 Jan 2015 07:56:18 +
Andrew Solomon and...@geekuni.com wrote:

 Hi Luca,
 
 I haven't tested it, but my suspicion is that your first solution will
 be faster because regular expressions (which don't contain variables)
 are only compiled once, while you have a function call for every use
 of lc.
 
 By the way another alternative might be:
 
 $extension =~ /\.bat/i
 
 (which would also match BaT, BAt...)
 

The second code excerpt that was given will also match all that:

«
$extension = lc $extension;
$extension =~ / \.bat /x;
»

Anyway, one can use the Benchmark.pm module to determine which alternative is
faster, but I suspect their speeds are not going to be that much different. See:

http://perl-begin.org/topics/optimising-and-profiling/

(Note: perl-begin.org is a site I originated and maintain).

Regards,

Shlomi Fish

-- 
-
Shlomi Fish   http://www.shlomifish.org/
Perl Humour - http://perl-begin.org/humour/

John: Hey, we are completely non-violent vampires. We don’t suck blood.
Selina: I thought all vampires suck blood.
John: Bullocks, hen. Vampires come in all shapes and sizes.
— http://www.shlomifish.org/humour/Selina-Mandrake/

Please reply to list if it's a mailing list post - http://shlom.in/reply .





is faster a regexp with multiple choices or a single one with lower case?

2015-01-06 Thread Luca Ferrari
Hi all,
this could be trivial, and I suspect the answer is that the regexp
engine is smart enough, but suppose I want to test the following:

$extension =~ / \.bat | \.BAT /x;

is the following a better solution?

$extension = lc $extension;
$extension =~ / \.bat /x;

In other words, when testing for all-lower or all-upper cases should I
first transform to one of them or use a regexp with alternatives?
Any suggestion?

Thanks,
Luca





Re: is faster a regexp with multiple choices or a single one with lower case?

2015-01-06 Thread Andrew Solomon
Hi Luca,

I haven't tested it, but my suspicion is that your first solution will
be faster because regular expressions (which don't contain variables)
are only compiled once, while you have a function call for every use
of lc.

By the way another alternative might be:

$extension =~ /\.bat/i

(which would also match BaT, BAt...)
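A quick check of how the three forms differ on mixed case; the file names are made up:

```perl
use strict;
use warnings;

my @names = ('run.bat', 'RUN.BAT', 'run.BaT', 'run.bar');   # made-up names

# Alternation: only the two spellings that are listed
my @alternation = grep { /\.bat|\.BAT/ } @names;

# /i: any mix of upper and lower case
my @nocase = grep { /\.bat/i } @names;

# lc first: same set as /i (lc($_) returns a copy, @names is not changed)
my @lowered = grep { lc($_) =~ /\.bat/ } @names;

print "alternation: @alternation\n";   # run.bat RUN.BAT
print "/i:          @nocase\n";        # run.bat RUN.BAT run.BaT
print "lc:          @lowered\n";       # run.bat RUN.BAT run.BaT
```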

Andrew

On Wed, Jan 7, 2015 at 7:45 AM, Luca Ferrari fluca1...@infinito.it wrote:





-- 
Andrew Solomon

Mentor@Geekuni http://geekuni.com/
http://www.linkedin.com/in/asolomon





RegExp

2014-03-08 Thread rakesh sharma
Hi all,
how do you get all words starting with the letter 'r' in a string?
thanks, rakesh
  

Re: RegExp

2014-03-08 Thread Shawn H Corey
On Sat, 8 Mar 2014 18:20:48 +0530
rakesh sharma rakeshsharm...@hotmail.com wrote:

 Hi all,
 how do you get all words starting with letter 'r' in a string.
 thanks,rakesh
 

/\br/


-- 
Don't stop where the ink does.
Shawn





Re: RegExp

2014-03-08 Thread Shlomi Fish
Hello Rakesh,

On Sat, 8 Mar 2014 18:20:48 +0530
rakesh sharma rakeshsharm...@hotmail.com wrote:

 Hi all,
 how do you get all words starting with letter 'r' in a string.
 thanks,rakesh
 

1. Find all words in the sentence. Your idea of what is a word will need to be
specified.

2. Put them in an array - let's say @words.

3. Use « grep { /\Ar/i } @words » . See: 

* http://perldoc.perl.org/functions/grep.html

* https://metacpan.org/pod/List::MoreUtils

* https://metacpan.org/pod/List::Util
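The three steps, sketched under the assumption that a word is a run of \w characters (the sample sentence is invented):

```perl
use strict;
use warnings;

my $string = "Rakesh ran around the red barn, really rapidly.";

# Steps 1 and 2 (assumption: a "word" is a run of \w characters)
my @words = $string =~ /(\w+)/g;

# Step 3: keep the words whose first character is r or R
my @r_words = grep { /\Ar/i } @words;

print "@r_words\n";   # Rakesh ran red really rapidly
```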

Regards,

— Shlomi Fish

-- 
-
Shlomi Fish   http://www.shlomifish.org/
Escape from GNU Autohell - http://www.shlomifish.org/open-source/anti/autohell/

There is an IGLU Cabal, but its only purpose is to deny the existence of an
IGLU Cabal.
— Martha Greenberg

Please reply to list if it's a mailing list post - http://shlom.in/reply .





Re: RegExp

2014-03-08 Thread Janek Schleicher

Am 08.03.2014 13:50, schrieb rakesh sharma:

how do you get all words starting with letter 'r' in a string.


What have you tried so far?


Greetings,
Janek






Re: RegExp

2014-03-08 Thread Jim Gibson

On Mar 8, 2014, at 4:50 AM, rakesh sharma rakeshsharm...@hotmail.com wrote:

 Hi all,
 
 how do you get all words starting with letter 'r' in a string.

Try

  my @rwords = $string =~ /\br\w*?\b/g;
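Jim's one-liner on a sample sentence (invented for illustration). It is case-sensitive, so a capitalised word is skipped unless the character class allows R as well:

```perl
use strict;
use warnings;

my $string = "Rakesh ran around the red barn, really rapidly.";

# \b before the r means the r must start a word; \w*?\b then runs to the word's end
my @rwords = $string =~ /\br\w*?\b/g;

# Same idea, also accepting a capital R
my @rwords_any_case = $string =~ /\b[rR]\w*/g;

print "@rwords\n";            # ran red really rapidly
print "@rwords_any_case\n";   # Rakesh ran red really rapidly
```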





Re: regexp puzzle

2014-03-08 Thread Bill McCormick

On 3/8/2014 12:05 AM, Bill McCormick wrote:

I have the following string I want to extract from:

my $str = "foo (3 bar): baz";

and I want to extract it to end up with

$p1 = "foo";
$p2 = 3;
$p3 = "baz";

the complication is that the \s(\d\s.+) is optional, so $p2 may
not be set.

getting close was

my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;

How can I make the "(3 bar)" optional?



Here's what I came up with:

($key, $lines, $value) = $_ =~ /^(.+?)(?:\s\((\d)\s.+\))?:\s(.*)$/;
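A quick check of that pattern with and without the optional part (split_entry is just a wrapper name for this sketch). As written, (\d) takes a single digit, so multi-digit counts would need (\d+):

```perl
use strict;
use warnings;

# Wrapper around the pattern above; the "(N word)" part sits in (?: ... )? so it may be absent
sub split_entry {
    my ($line) = @_;
    return $line =~ /^(.+?)(?:\s\((\d)\s.+\))?:\s(.*)$/;
}

for my $line ("foo (3 bar): baz", "foo: baz") {
    my ($key, $lines, $value) = split_entry($line);
    printf "key=%s lines=%s value=%s\n", $key, $lines // 'undef', $value;
}
```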


---
This email is free from viruses and malware because avast! Antivirus protection 
is active.
http://www.avast.com







regexp puzzle

2014-03-07 Thread Bill McCormick

I have the following string I want to extract from:

my $str = "foo (3 bar): baz";

and I want to extract it to end up with

$p1 = "foo";
$p2 = 3;
$p3 = "baz";

the complication is that the \s(\d\s.+) is optional, so $p2 may 
not be set.

getting close was

my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;

How can I make the "(3 bar)" optional?

Thanks!





Re: regexp puzzle

2014-03-07 Thread shawn wilson
([^]+) \(([0-9]+).*\) ([a-z]+)
On Mar 8, 2014 1:07 AM, Bill McCormick wpmccorm...@gmail.com wrote:






Re: regexp puzzle

2014-03-07 Thread Bill McCormick

On 3/8/2014 12:41 AM, shawn wilson wrote:

my $str = "foo (3 bar): baz";


my $test = "foo (3 bar): baz";
my ($p1, $p2, $p3) = $test =~ /([^]+) \(([0-9]+).*\) ([a-z]+)/;
print "p1=[$p1] p2=[$p2] p3=[$p3]\n";

Use of uninitialized value $p1 in concatenation (.) or string at ./lock_report.pl line 11.
Use of uninitialized value $p2 in concatenation (.) or string at ./lock_report.pl line 11.
Use of uninitialized value $p3 in concatenation (.) or string at ./lock_report.pl line 11.

p1=[] p2=[] p3=[]
P





Re: regexp puzzle

2014-03-07 Thread shawn wilson
On Mar 8, 2014 1:41 AM, shawn wilson ag4ve...@gmail.com wrote:


Oh, and as for making it optional, just do (?:\([0-9]+).*\)?
You should probably do something like:
my @match = $str =~ / ([^]+)  (?:\([0-9]+).*\)? ([a-z]+)/;
my ($a, $b, $c) = (scalar(@match) == 3 ? @match : $match[0], undef, $match[1]);

 ([^]+) \(([0-9]+).*\) ([a-z]+)





Re: regexp puzzle

2014-03-07 Thread Jim Gibson

On Mar 7, 2014, at 10:05 PM, Bill McCormick wpmccorm...@gmail.com wrote:

 I have the following string I want to extract from:
 
 my $str = "foo (3 bar): baz";
 
 and I want to extract it to end up with
 
 $p1 = "foo";
 $p2 = 3;
 $p3 = "baz";
 
 the complication is that the \s(\d\s.+) is optional, so in then $p2 may not 
 be set.
 
 getting close was
 
 my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;


You can make a substring optional by following it with the ? quantifier. If your 
substring is more than one character, you can group it with capturing 
parentheses or a non-capturing grouping construct (?: ).

Here is a sample, using the extended regular expression syntax with the x 
option:

my( $p1, $p2, $p3 ) = $str =~ m{ \A (\w+) \s+ (?: \( (\d+) \s+ \w+ \) )? : \s (\w+) }x;
if( $p1 && $p3 ) {
    print "p1=$p1, p2=$p2, p3=$p3\n";
}else{
    print "No match\n";
}

Always test the returned values to see if the match succeeded.

So if '(3 bar)' is not present, does the colon still remain? That will 
determine if the colon should be inside or outside the optional substring part.






perl regexp performance - architecture?

2014-02-17 Thread Phil Smith
I'm currently loading some new servers with CentOS6 on which perl5.10 is
the standard version of perl provided. However, I've also loaded perl5.18
and I don't think the version of perl is significant in the results I'm
seeing. Basically, I'm seeing perl performance significantly slower on my
new systems than on my 6 year old systems.

Here's some of the relevant details:

+ 6 year old server, 32 bit architecture, CentOS5 perl5.8
perl, and in particular regexp operations, perform reasonably fast.

+ Very new server, 64 bit architecture, CentOS6, perl5.10 (and have tried
perl5.18)
perl, and in particular regexp operations, perform significantly slower
than on the 6 year old server. That struck me as odd right off. I thought
surely, perl running on a modern high-end cpu is going to beat out my code
running on 6 year old hardware.

I've compared CPU models at various CPU benchmarking sites and the new
CPUs, as you would expect, are ranked significantly higher in performance
than the old.

I've also installed perl5.8 on the new 64bit servers and the performance is
similar to that of perl5.10 and perl5.18 on the same 64bit servers. Given
that, I don't think perl version plays a significant role in the
performance differences.

Is it an accepted fact that perl performance takes a hit on 64 bit
architecture?

I've tried comparing some of the perl -V and Config.pm results looking for
significant differences. That output is pretty verbose and the most
significant difference is the architecture.

I could provide some of my benchmarking code if that would be of help. The
differences are significant. The only reason I'm looking at this is because
I could see right off that some of my code is taking 30-40% longer to run
in the new environment. Once I started putting in some timing
with Time::HiRes I could see the delay involved large amounts of regexp
processing.
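If it helps to compare the two machines directly, here is a sketch of timing a regexp-heavy loop with Time::HiRes and printing the perl version and archname from Config. The workload is invented, so substitute a slice of the real code:

```perl
use strict;
use warnings;
use Config;
use Time::HiRes qw(gettimeofday tv_interval);

# Invented workload: a regex-heavy loop to run unchanged on both servers
my @lines = map { "line $_ key=value$_ other=stuff" } 1 .. 100_000;

my $t0 = [gettimeofday];
my $matches = 0;
for my $line (@lines) {
    $matches++ if $line =~ /key=(\w+)\s+other=(\w+)/;
}
my $elapsed = tv_interval($t0);

printf "perl %s on %s\n", $^V, $Config{archname};
printf "%d matches in %.3f seconds\n", $matches, $elapsed;
```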

Right now, I'm just looking for any opinions on what I'm seeing so that I
know the architecture is the significant factor in the performance
degradation and then consider any recommendations for improvements. I'm
happy to provide further relevant details.

Thanks,
Phil


Re: perl regexp performance - architecture?

2014-02-17 Thread Charles DeRykus
On Mon, Feb 17, 2014 at 12:41 PM, Phil Smith philbo...@gmail.com wrote:



This sounds like it  could be something OS-specific and, googling
CentOS regex performance generates hits, eg,



 http://pkgs.org/centos-5/puias-computational-x86_64/boost141-regex-1.4.0-2.el5.x86_64.rpm.html


HTH,
Charles DeRykus


Re: perl regexp performance - architecture?

2014-02-17 Thread Phil Smith
On Mon, Feb 17, 2014 at 6:16 PM, Charles DeRykus dery...@gmail.com wrote:


 On Mon, Feb 17, 2014 at 12:41 PM, Phil Smith philbo...@gmail.com wrote:

 I'm currently loading some new servers with CentOS6 on which perl5.10 is
 the standard version of perl provided. However, I've also loaded perl5.18
 and I don't think the version of perl is significant in the results I'm
 seeing. Basically, I'm seeing perl performance significantly slower on my
 new systems than on my 6 year old systems.

 Here's some of the relevant details:

 + 6 year old server, 32 bit architecture, CentOS5 perl5.8
 perl, and in particular regexp operations, perform reasonably fast.

 + Very new server, 64 bit architecture, CentOS6, perl5.10 (and have tried
 perl5.18)
 perl, and in particular regexp operations, perform significantly slower
 than on the 6 year old server. That struck me as odd right off. I thought
 surely perl running on a modern high-end CPU would beat out my code
 running on 6 year old hardware.

 I've compared CPU models at various CPU benchmarking sites and the new
 CPUs, as you would expect, are ranked significantly higher in performance
 than the old.

 I've also installed perl5.8 on the new 64bit servers and the performance
 is similar to that of perl5.10 and perl5.18 on the same 64bit servers.
 Given that, I don't think the perl version is a significant factor in the
 performance diffs.

 Is it an accepted fact that perl performance takes a hit on 64 bit
 architecture?

 I've tried comparing some of the perl -V and Config.pm results looking
 for significant differences. That output is pretty verbose and the most
 significant difference is the architecture.

 I could provide some of my benchmarking code if that would be of help.
 The differences are significant. The only reason I'm looking at this is
 because I could see right off that some of my code is taking 30-40% longer
 to run in the new environment. Once I started putting in some timing
 with Time::HiRes I could see the delay involved large amounts of regexp
 processing.

 Right now, I'm just looking for any opinions on what I'm seeing so that I
 can confirm whether the architecture is the significant factor in the
 performance degradation, and then consider any recommendations for
 improvements. I'm happy to provide further relevant details.


 This sounds like it could be something OS-specific; googling "CentOS regex
 performance" turns up hits, e.g.,

 http://pkgs.org/centos-5/puias-computational-x86_64/boost141-regex-1.4.0-2.el5.x86_64.rpm.html


 No, I really don't think it is specific to a version of CentOS. I've
 installed various permutations of 32 and 64 bit CentOS 5 and 6. The better
 performance seems to follow the 32 bit architecture rather than a specific
 Perl version or CentOS version.


Phil







Fwd: perl regexp performance - architecture?

2014-02-17 Thread Charles DeRykus
On Mon, Feb 17, 2014 at 4:25 PM, Phil Smith philbo...@gmail.com wrote:

 No, I really don't think it is specific to a version of CentOS. I've
 installed various permutations of 32 and 64 bit CentOS 5 and 6. The better
 performance seems to follow the 32 bit architecture rather than a specific
 Perl version or CentOS version.


Newer perl regex engines have added Unicode support, which can add
drag. I'd be surprised, though, if the 64-bit architecture alone were
responsible for a major slowdown. Some of the issues are mentioned
here:

http://stackoverflow.com/questions/17800112/upgraded-from-perl-5-8-32bit-to-5-16-64bit-regex-performance-hit

Per the above, here are some of the items you'll need to be careful with:

+ Were both perls compiled with the same flags?
+ Are both perls threaded? (Disabling threading support makes perl faster.)
+ How big are your integers: 64 bit or 32 bit?
+ What compiler optimizations were chosen?
+ Did your previous perl have some distribution-specific patches applied?

Basically, you have to compare the whole perl -V output.
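To make that comparison less tedious, a small script can print just the build settings most likely to explain a 32-bit vs 64-bit difference, so you can diff its output between the two machines. Here is a minimal sketch using the core Config module. The choice of keys is an assumption and is no substitute for the full perl -V diff:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Config.pm keys that often differ between 32-bit and 64-bit builds.
# This selection is an assumption, not a complete list.
my @keys = qw(version archname usethreads use64bitint ivsize ptrsize
              optimize ccflags);

# Return one "key = value" line per key, showing 'undef' for unset keys.
sub config_report {
    return map {
        sprintf '%-11s = %s', $_, defined $Config{$_} ? $Config{$_} : 'undef'
    } @keys;
}

print "$_\n" for config_report();
```

Run it under each perl (for example, redirect the output to old.txt on one host and new.txt on the other; the file names are only illustrative), then diff the two files.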

-- 
Charles DeRykus



As you can see, you need to examine the comparison scenarios
carefully.

-- 
Charles DeRykus


Re: perl regexp performance - architecture?

2014-02-17 Thread Phil Smith
On Mon, Feb 17, 2014 at 9:10 PM, Charles DeRykus dery...@gmail.com wrote:


 Newer perl regex engines have added Unicode support, which can add
 drag. I'd be surprised, though, if the 64-bit architecture alone were
 responsible for a major slowdown. Some of the issues are mentioned
 here:


 http://stackoverflow.com/questions/17800112/upgraded-from-perl-5-8-32bit-to-5-16-64bit-regex-performance-hit

 Per the above, here are some of the items you'll need to be careful with:

 + Were both perls compiled with the same flags?
 + Are both perls threaded? (Disabling threading support makes perl faster.)
 + How big are your integers: 64 bit or 32 bit?
 + What compiler optimizations were chosen?
 + Did your previous perl have some distribution-specific patches applied?

 Basically, you have to compare the whole perl -V output.

 --
 Charles DeRykus



 As you can see, you need to examine the comparison scenarios
 carefully.

 --
 Charles DeRykus



Yes... I saw that link as well, Charles.

I mentioned in my original post that I was looking at the diffs in perl -V
output. The output is pretty verbose, but the differences seem to focus on
32bit vs 64bit architecture and configs that you would expect related to
that as in various byte size definitions.

Like many people (and that's an assumption), I don't build perl. I take
what comes with a given distribution, as with CentOS5 and CentOS6 (and soon
CentOS7). Yes, I realize they ship versions well behind the most recent
release.

Given that approach, which again I assume is common practice, I would
expect the performance degradation to be something many people would
notice when they moved from 32bit to 64bit machines.

I've tried perl5.8.8 on both 32bit and 64bit where the -V output seems
limited to arch differences, so based on that the only common thread in the
performance tests is the architecture and the better performance seems to
follow 32 bit.
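To confirm that the timing difference follows the architecture rather than the workload, it can help to run one identical, self-contained regexp benchmark on every host. Here is a minimal sketch using Time::HiRes, which Phil already uses. The input text and pattern are made up for illustration and should be replaced with a sample of the real data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;
use Time::HiRes qw(time);

# Run a global match of $re against $text $iterations times and return
# (elapsed seconds, total number of captures returned).
sub time_regex {
    my ($re, $text, $iterations) = @_;
    my $matches = 0;
    my $start   = time();
    for (1 .. $iterations) {
        my @captures = $text =~ /$re/g;
        $matches += @captures;
    }
    return (time() - $start, $matches);
}

# Synthetic input (an assumption): 200 "wordN 10.20.30.N" pairs on one line.
my $text = join ' ', map { "word$_ 10.20.30.$_" } 1 .. 200;
my ($secs, $count) = time_regex(qr/(\d+\.\d+\.\d+\.\d+)/, $text, 1_000);
printf "%d captures in %.3f s (perl %s, %s)\n",
    $count, $secs, $], $Config{archname};
```

Because the script prints the perl version and archname next to the timing, results gathered from several hosts stay easy to tell apart.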

Thanks,
Phil


regexp as hash value?

2014-01-25 Thread Luca Ferrari
Hi,
I'm just wondering whether it is possible to store a regexp as a value in
a hash, so that I can use it later with something like:

$string =~ $hash_ref->{ $key };

Is it possible? Is there anything special I should take into account?

Thanks,
Luca

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: regexp as hash value?

2014-01-25 Thread Paul Johnson
On Sat, Jan 25, 2014 at 03:51:53PM +0100, Luca Ferrari wrote:
 Hi,
 I'm just wondering whether it is possible to store a regexp as a value in
 a hash, so that I can use it later with something like:
 
 $string =~ $hash_ref->{ $key };
 
 Is it possible? Is there anything special I should take into account?

Yes, this is possible.  You need to use qr// to construct your RE:

$ perl -E '$h = { a => qr/y/ }; say $_ =~ $h->{a} for qw(x y z)'

1

$
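Building on that one-liner, here is a slightly larger sketch. It stores several qr// objects in a hash and tests a string against each of them. The pattern names and regexps are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical table of named patterns, each compiled once with qr//.
my %patterns = (
    ip   => qr/^\d{1,3}(?:\.\d{1,3}){3}$/,
    word => qr/^\w+$/,
);

# Return, sorted, the names of every pattern that matches $string.
sub matching_names {
    my ($string) = @_;
    my @names = grep { $string =~ $patterns{$_} } keys %patterns;
    return sort @names;
}

for my $s (qw(10.0.0.1 hello x-y)) {
    print "$s: ", join(',', matching_names($s)), "\n";
}
```

A qr// object can be bound with =~ directly, as above, or interpolated into a larger pattern.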

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net





Re: regexp as hash value?

2014-01-25 Thread Luca Ferrari
On Sat, Jan 25, 2014 at 4:12 PM, Paul Johnson p...@pjcj.net wrote:

 $ perl -E '$h = { a => qr/y/ }; say $_ =~ $h->{a} for qw(x y z)'

Thanks, but now I have another question: having looked at
http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators I don't
understand how I can use a regexp for substitution, that is, store s/// as
a hash value. The following does not work:
my $hash = { q/regexp/ => qr/s,from,to,/ };

any clue?

Thanks,
Luca





Re: regexp as hash value?

2014-01-25 Thread Paul Johnson
On Sat, Jan 25, 2014 at 06:41:00PM +0100, Luca Ferrari wrote:
 On Sat, Jan 25, 2014 at 4:12 PM, Paul Johnson p...@pjcj.net wrote:
 
  $ perl -E '$h = { a => qr/y/ }; say $_ =~ $h->{a} for qw(x y z)'
 
 Thanks, but now I have another question: having looked at
 http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators I don't
 understand how I can use a regexp for substitution, that is, store s/// as
 a hash value. The following does not work:
 my $hash = { q/regexp/ => qr/s,from,to,/ };

You won't be able to do the full substitution this way, but you can use
the RE in the substitution:

$ perl -E '$h = { a => qr/y/ }; say $_ =~ s/$h->{a}/p/r for qw(x y z)'
x
p
z
$

If the replacement text is different for each substitution then you may
be better served storing anonymous subs in your hash:

$ perl -E '$h = { a => sub { s/y/p/r } }; say $h->{a}->() for qw(x y z)'
x
p
z
$
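Another option, if you want the substitution itself to be data rather than code, is to store the compiled pattern and the replacement text together. Here is a minimal sketch with hypothetical rule names. Note that the replacement is interpolated as a plain string, so $1-style backreferences inside it will not be expanded:

```perl
use strict;
use warnings;

# Hypothetical rules: each name maps to a compiled pattern plus its
# replacement text, since s/// itself cannot be stored with qr//.
my %rules = (
    y_to_p => [ qr/y/,   'p' ],
    digits => [ qr/\d+/, '#' ],
);

# Apply one named rule globally to a copy of $text and return the result.
sub apply_rule {
    my ($name, $text) = @_;
    my ($re, $replacement) = @{ $rules{$name} };
    (my $copy = $text) =~ s/$re/$replacement/g;
    return $copy;
}

print apply_rule('y_to_p', $_), "\n" for qw(x y z);
print apply_rule('digits', 'a1b22'), "\n";
```

Unlike the s///r one-liners, this form also runs on perls older than 5.14.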

Good luck.

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net





Re: regexp and parsing assistance

2013-06-09 Thread Jim Gibson

On Jun 8, 2013, at 8:06 PM, Noah wrote:

 Hi there,
 
 I am attempting to parse the following output and am not quite sure how to
 do it. The text is in columns and is spaced out the same way regardless of
 whether there are any numbers in, say, Col5 or Col6. If a column has an
 entry, I want to save it to a variable; if there is no entry, that variable
 should be set to 'blank'.
 
 The first line is a header and can be ignored.
 
 
 C Col2   C Col4  Col5   Col6  Col7Col8 
 new_line
 * 123.456.789.101/85 A 803Reject  new_line
 B 804 76 10 800.99.999.098765 78910 
 I new_line
 O 805   1234  1 800.9.999.1 98765 78910 
 I new_line

If your data consists of constant-width fields, then the best approach is to 
use the unpack function. See 'perldoc -f unpack' for how to use it and 'perldoc 
-f pack' for the template parameters that describe your data.

This statement will unpack the second and third data lines you have shown, 
presuming that you have read the lines into the variable $line:

  my @fields = unpack('A2 A19 A2 A3 A11 A11 A18 A5 A5 A1',$line);

However, your data as shown has variable data in the first or second column. If 
that is really the case, then you will have to look at the first twenty columns 
of your data and determine where column three starts. Then you can use the 
unpack function to parse the rest of the columns. Maybe something like this:

  if( $line =~ /^\s{20}/ ) {
      # no data in first 20 columns, unpack remainder
      $line = substr($line,20);
  }else{
      # data in first 20 columns -- remove first two fields
      $line =~ s/\S+\s\S+\s//;
  }
  my @fields = unpack('A2 A3 A11 A11 A18 A5 A5 A1',$line);

Exactly what you need to do depends upon the exact nature of your data and how 
much it varies from line to line.
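Noah also wants empty columns to come back as 'blank'. The 'A' template letter strips trailing spaces, so an all-blank column unpacks to an empty string, which is easy to map. Here is a minimal sketch with a made-up three-column layout. The real widths would come from the data, as described above:

```perl
use strict;
use warnings;

# Hypothetical fixed-width layout: a 2-char flag, a 6-char id and an
# 8-char count. 'A' strips trailing spaces, so an all-blank column
# comes back as an empty string.
my $template = 'A2 A6 A8';

# Unpack one line and replace empty columns with the string 'blank'.
sub parse_line {
    my ($line) = @_;
    return map { length($_) ? $_ : 'blank' } unpack($template, $line);
}

# Build sample lines with sprintf so the column widths are exact.
my @lines = (
    sprintf('%-2s%-6s%-8s', 'B', '804', '76'),
    sprintf('%-2s%-6s%-8s', 'O', '805', ''),
);
print join('|', parse_line($_)), "\n" for @lines;
```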

Good luck!






Re: regexp and parsing assistance

2013-06-09 Thread Noah

On 6/9/13 9:00 AM, Jim Gibson wrote:



Thanks Jim,

Here is the regexp I came up with. It works really well:

if ($line =~ 
/^\*\s(\S+)\s+(\S)\s+(\d+)\s{6}([\s\d]{0,5})\s{6}([\s\d]{0,5})[\s\]+(\S+)(.*)/) 
{


Then I strip the leading and trailing spaces from the scalars I collect.
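The trimming step can be done in one pass over the captured values. That pass can also map empty columns to 'blank', as the original post asked. Here is a minimal sketch; the sub name trim_fields is hypothetical:

```perl
use strict;
use warnings;

# Trim leading and trailing whitespace from every value, treating undef
# as empty, and turn values that end up empty into the string 'blank'.
sub trim_fields {
    my @fields = @_;            # work on a copy of the arguments
    for (@fields) {             # $_ aliases each element of @fields
        $_ = '' unless defined;
        s/^\s+|\s+$//g;
        $_ = 'blank' unless length;
    }
    return @fields;
}

print join('|', trim_fields('  803 ', '     ', ' 76')), "\n";   # 803|blank|76
```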

Cheers,
Noah






regexp and parsing assistance

2013-06-08 Thread Noah

Hi there,

I am attempting to parse the following output and am not quite sure how to
do it. The text is in columns and is spaced out the same way regardless of
whether there are any numbers in, say, Col5 or Col6. If a column has an
entry, I want to save it to a variable; if there is no entry, that variable
should be set to 'blank'.


The first line is a header and can be ignored.


C Col2   C Col4  Col5   Col6  Col7Col8 
new_line

* 123.456.789.101/85 A 803Reject  new_line
 B 804 76 10 800.99.999.098765 
78910 I new_line
 O 805   1234  1 800.9.999.1 98765 
78910 I new_line




Any assistance is helpful.

Cheers





Re: character setts in a regexp

2013-01-14 Thread John SJ Anderson
On Fri, Jan 11, 2013 at 2:01 PM, Christer Palm b...@bredband.net wrote:
 Do you have suggestions on this character issue? Is it possible to determine 
 the character set of a text efficiently? Is it other ways to solve the 
 problem?

As for other ways to solve the problem, my suggestion would be not to
use regexps to parse XML; use an XML parser instead. For example,
something like https://metacpan.org/module/XML::Feed .

chrs,
john.





Re: character setts in a regexp

2013-01-14 Thread Charles DeRykus
On Sat, Jan 12, 2013 at 12:56 PM, Charles DeRykus dery...@gmail.com wrote:
 #!/usr/bin/perl
 use strict;
 use warnings;

 binmode(STDOUT, ":utf8");
 my $cosa = "my \x{263a}";
 print "cosa=$cosa\n";

 print "found smiley at \\b\n" if $cosa =~ /\b\x{263a}/;
 print "found smiley (no \\b)" if $cosa =~ /\x{263a}/;

 The output:
 cosa=my ☺
 found smiley (no \b)


From: http://www.unicode.org/reports/tr18/#Simple_Word_Boundaries
---
Most regular expression engines allow a test for word boundaries (such
as by \b in Perl). They generally use a very simple mechanism for
determining word boundaries: one example of that would be having word
boundaries between any pair of characters where one is a
word_character and the other is not, or at the start and end of a
string. This is not adequate for Unicode regular expressions.
-

Based on the above, Perl's \b semantics may not be adequate for
Unicode regular expressions. \b only marks a boundary between a word
character (\w) and a non-word character, and a symbol such as U+263A
is not a word character, so there is no \b between the space and the smiley.

So you may want to try a preceding space to delimit the
keyword:

print "match\n" if "my \x{263a}" =~ / \x{263a}/;    # matches!
#print "match\n" if "my \x{263a}" =~ /\b\x{263a}/;  # would not match
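A variation on the preceding-space idea is a negative lookbehind. It also matches a keyword at the very start of the string and does not consume the space. Here is a minimal sketch, assuming a "word start" means start-of-string or whitespace; has_keyword is a hypothetical helper:

```perl
use strict;
use warnings;

# Match $key only where it starts a "word", taken here to mean
# start-of-string or a whitespace character before it (an assumption;
# adjust to your data). Unlike \b, this works when $key begins with a
# non-word character such as U+263A. \Q...\E quotes regex metacharacters.
sub has_keyword {
    my ($text, $key) = @_;
    return $text =~ /(?<!\S)\Q$key\E/i ? 1 : 0;
}

print has_keyword("my \x{263a}", "\x{263a}"), "\n";                  # 1
print has_keyword("Nyheter fr\x{e5}n K\x{f6}ln", "k\x{f6}ln"), "\n"; # 1
print has_keyword("xk\x{f6}ln", "k\x{f6}ln"), "\n";                  # 0
```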

-- 
Charles DeRykus





Re: character setts in a regexp

2013-01-12 Thread Charles DeRykus
On Fri, Jan 11, 2013 at 2:01 PM, Christer Palm b...@bredband.net wrote:
 Hi!

 I have a Perl script that parses RSS streams from different news sources,
 and I'm experiencing problems with national characters in a regexp used for
 matching a keyword list against the RSS data.

 Everything works fine with a simple regexp for plain English, i.e. words
 containing the letters A-Z, a-z, 0-9.

 if ( $description =~ m/\b$key/i ) {….}

 Keywords or RSS data with national characters don't work at all. I'm not
 really surprised; this was expected, as the character sets used in the
 different RSS streams are outside my control.

 I have the "use utf8;" pragma activated, but I'm not really sure whether it
 is needed. I can't see any difference whether it is used or not.

 If I convert all the national characters used in the keyword list to HTML
 entities ("&aring;" and so on), and use a character parser to convert every
 octal, decimal and hex Unicode character reference in the RSS data to HTML
 entities as well, everything works fine, but it takes time that I want to
 avoid.

 Do you have suggestions on this character issue? Is it possible to determine
 the character set of a text efficiently? Are there other ways to solve the
 problem?

 /Christer



I'm not sure whether this is related, but the docs mention some overlap
between character and byte semantics:

*** START perlunicode:
..As discussed elsewhere, Perl has one foot (two hooves?) planted in
each of two worlds: the old world of bytes and the new world of
characters, upgrading from bytes to characters when necessary. If your
legacy code does not explicitly use Unicode, no automatic switch-over
to characters should happen. Characters shouldn't get downgraded to
bytes, either. It is possible to accidentally mix bytes and
characters, however (see perluniintro), in which case \w in regular
expressions might start behaving differently (unless the /a modifier
is in effect). Review your code. Use warnings and the strict pragma.
*** END perlunicode

<speculate>
Perhaps, although the docs don't say so explicitly, this downgrading
might affect \b as well as \w. Here's an example that appears to
support this, since adding \b causes the match to fail. (There may be a
workaround via the character properties mentioned in perlunicode.)
</speculate>

#!/usr/bin/perl
use strict;
use warnings;

binmode(STDOUT, ":utf8");
my $cosa = "my \x{263a}";
print "cosa=$cosa\n";

print "found smiley at \\b\n" if $cosa =~ /\b\x{263a}/;
print "found smiley (no \\b)" if $cosa =~ /\x{263a}/;

The output:
cosa=my ☺
found smiley (no \b)
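One way to test the speculation is to ask directly whether each character is \w. A \b exists only between a \w character and a non-\w character. A smiley is not \w under any rules, so it never gets a \b after a space. A letter such as å, by contrast, is \w under Unicode rules but not under the /a rules mentioned in the perlunicode excerpt. Here is a minimal sketch; the /u and /a modifiers need perl 5.14 or later:

```perl
use strict;
use warnings;
use 5.014;    # the /u and /a regex modifiers need perl 5.14 or later

# Report whether one character is \w under Unicode rules (/u) and under
# ASCII-only rules (/a). \b is defined in terms of \w, so there is no \b
# between a space and a character that is not \w under the active rules.
sub word_flags {
    my ($ch) = @_;
    return (($ch =~ /\w/u ? 1 : 0), ($ch =~ /\w/a ? 1 : 0));
}

for my $ch ("\x{e5}", "\x{263a}") {
    printf "U+%04X  /u:%d  /a:%d\n", ord($ch), word_flags($ch);
}
```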

-- 
Charles DeRykus





character setts in a regexp

2013-01-11 Thread Christer Palm
Hi!

I have a Perl script that parses RSS streams from different news sources, and
I'm experiencing problems with national characters in a regexp used for
matching a keyword list against the RSS data.

Everything works fine with a simple regexp for plain English, i.e. words
containing the letters A-Z, a-z, 0-9.

if ( $description =~ m/\b$key/i ) {….}

Keywords or RSS data with national characters don't work at all. I'm not really
surprised; this was expected, as the character sets used in the different RSS
streams are outside my control.

I have the "use utf8;" pragma activated, but I'm not really sure whether it is
needed. I can't see any difference whether it is used or not.

If I convert all the national characters used in the keyword list to HTML
entities ("&aring;" and so on), and use a character parser to convert every
octal, decimal and hex Unicode character reference in the RSS data to HTML
entities as well, everything works fine, but it takes time that I want to avoid.

Do you have suggestions on this character issue? Is it possible to determine
the character set of a text efficiently? Are there other ways to solve the problem?

/Christer




Re: character setts in a regexp

2013-01-11 Thread Jim Gibson

On Jan 11, 2013, at 2:01 PM, Christer Palm wrote:

 Hi!
 
 I have a Perl script that parses RSS streams from different news sources,
 and I'm experiencing problems with national characters in a regexp used for
 matching a keyword list against the RSS data.
 
 Everything works fine with a simple regexp for plain English, i.e. words
 containing the letters A-Z, a-z, 0-9.
 
 if ( $description =~ m/\b$key/i ) {….}
 
 Keywords or RSS data with national characters don't work at all. I'm not
 really surprised; this was expected, as the character sets used in the
 different RSS streams are outside my control.
 
 I have the "use utf8;" pragma activated, but I'm not really sure whether it
 is needed. I can't see any difference whether it is used or not.

The 'use utf8;' is necessary if you have UTF-8 characters in your Perl source 
file that you want interpreted correctly, e.g., in string literals or variable 
names.

 
 If I convert all the national characters used in the keyword list to HTML
 entities ("&aring;" and so on), and use a character parser to convert every
 octal, decimal and hex Unicode character reference in the RSS data to HTML
 entities as well, everything works fine, but it takes time that I want to
 avoid.
 
 Do you have suggestions on this character issue? Is it possible to determine
 the character set of a text efficiently? Are there other ways to solve the
 problem?

Have you read the following?

  perldoc perlunitut
  perldoc perlunicode
  perldoc perlunifaq






Re: character setts in a regexp

2013-01-11 Thread Brandon McCaig
On Fri, Jan 11, 2013 at 11:01:45PM +0100, Christer Palm wrote:
 Hi!

Hello,

 I have a Perl script that parses RSS streams from different
 news sources, and I'm experiencing problems with national
 characters in a regexp used for matching a keyword list against
 the RSS data.
 
 Everything works fine with a simple regexp for plain English,
 i.e. words containing the letters A-Z, a-z, 0-9.
 
 if ( $description =~ m/\b$key/i ) {….}
 
 Keywords or RSS data with national characters don't work at
 all. I'm not really surprised; this was expected, as the character
 sets used in the different RSS streams are outside my control.

The XML standard provides a way to specify the character set in
the XML document.

<?xml version="1.0" encoding="utf-8"?>


Are you parsing the XML unintelligently (e.g., regex) or are you
using an XML parser to do it? I have done limited XML parsing in
Perl, but I would seek an API that supports the XML standards for
encodings and ideally just does the Right Thing(tm). In theory,
it should Just Work(tm) if you can find an appropriate family of
modules.

 I have the "use utf8;" pragma activated, but I'm not really sure
 whether it is needed. I can't see any difference whether it is used or not.

As mentioned, the utf8 pragma basically just tells perl that the
source file is UTF-8 encoded (and so literal strings should be
considered UTF-8 text, for example). The Encode module can be
used to manually decode and encode strings between various
encodings. E.g., if you know the text is UTF-16LE then you can do
this:

  use Encode;

  my $input = getRssStream();

  my $text = Encode::decode('UTF-16LE', $input);

Encodings are also supported at the IO layer, so depending on
where you're getting it from you might be able to just inform
said layers of the encoding and have the rest automatic. E.g.,

  # Something like this:
  binmode $socket, ':encoding(UTF-16LE)';

 Do you have suggestions on this character issue? Is it possible
 to determine the character set of a text efficiently? Are there
 other ways to solve the problem?

There are some modules to guess encodings (e.g., File::BOM). Of
course, it's impossible to be certain. It's best to use the
standards in the transport protocol or data format to define the
encoding so that you know for sure what is expected and don't
have to guess (because it isn't always possible to detect it
correctly).
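Putting those suggestions together, a decoder can prefer the encoding declared in the XML prolog. Otherwise it can try strict UTF-8 before falling back to a single-byte encoding. Here is a minimal sketch using the core Encode module. The ISO-8859-1 fallback and the sub name decode_feed are assumptions:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Decode raw feed bytes into a Perl character string. Prefer the encoding
# named in the XML declaration; otherwise try strict UTF-8 and fall back
# to ISO-8859-1 (an assumption that suits many Western European feeds).
sub decode_feed {
    my ($bytes) = @_;
    if ($bytes =~ /\A<\?xml[^>]*\bencoding=["']([A-Za-z0-9._-]+)["']/) {
        my $declared = $1;
        return decode($declared, $bytes);
    }
    # LEAVE_SRC keeps $bytes intact if the strict UTF-8 attempt croaks.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK() | LEAVE_SRC()) };
    return defined $text ? $text : decode('ISO-8859-1', $bytes);
}

my $decoded = decode_feed("caf\xc3\xa9");
printf "%d characters\n", length $decoded;
```

After decoding, run the keyword regexps on the resulting character strings, with the keywords also held as characters (decoded, or written literally under use utf8). That way both sides of the match are characters rather than bytes.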

Regards,


-- 
Brandon McCaig bamcc...@gmail.com bamcc...@castopulence.org
Castopulence Software https://www.castopulence.org/
Blog http://www.bamccaig.com/
perl -E '$_=q{V zrna gur orfg jvgu jung V fnl. }.
q{Vg qbrfa'\''g nyjnlf fbhaq gung jnl.};
tr/A-Ma-mN-Zn-z/N-Zn-zA-Ma-m/;say'





Re: question of regexp or (another solution)

2012-12-15 Thread Dr.Ruud

On 2012-12-15 06:13, timothy adigun wrote:


Using Dr. Ruud's data. This is another way of doing it:

[solution using a hash]


Realize that with keys(), the input order is not preserved.

Another difference is that when a key comes back later,
the hash solution will merge those entries, which is
either wanted or unwanted.

So it all depends on what the *real* specifications are.

A combined approach is to use the hash, and also push
new keys in a side array. Then you can use that array
to restore order later.
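A sketch of that combined approach, assuming whitespace-separated "key value" lines as in the earlier example (merge_in_order is a name made up for illustration): keys are merged even when they come back later, but printed in the order they first appeared.

```perl
use strict;
use warnings;

# Merge "key value" lines that share a key, but remember the order in which
# each key was first seen (the hash alone would lose that order).
sub merge_in_order {
    my (%value_for, @key_order);
    for my $line (@_) {
        my ($key, $value) = split ' ', $line, 2;
        next unless defined $key;                        # skip blank lines
        push @key_order, $key unless exists $value_for{$key};
        $value_for{$key} .= defined $value ? $value : '';
    }
    return map { "$_\t$value_for{$_}" } @key_order;
}

my @merged = merge_in_order('zz 1', '1 a', 'zz 2', '1 b', 'zz 3');
print "$_\n" for @merged;    # "zz<TAB>123", then "1<TAB>ab"
```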

The hash solution is not good with huge files.

The code pattern I showed is mostly used in map-reduce,
where the input file is ordered (at least) on key,
so you don't have to check that anymore.

--
Ruud


--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




question of regexp or (another solution)

2012-12-14 Thread samuel desseaux
Hi!

I work in a library and I need to put several fields on one line.

Example

I have this

=995  \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995  \\$n
=995  \\$oDisponible
=995  \\$kG1 42171


and I want this on one line:

=995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre


Re: question of regexp or (another solution)

2012-12-14 Thread samuel desseaux
Here is my complete email:

Hi!

I work in a library and I need to put several fields on one line.

Example

I have this

=995  \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995  \\$n
=995  \\$oDisponible
=995  \\$kG1 42171


and I want this on one line:

=995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre

How could I do this with a Perl script?

Many thanks

samuel
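One possible sketch, assuming the wanted order is alphabetical by each line's first subfield code (b, k, n, o, x), which is what the example output suggests; $w stays attached to $x because they share a line. merge_fields is a hypothetical helper, and its pattern assumes a "=" tag, whitespace, two indicator characters, then the subfields.

```perl
use strict;
use warnings;

# Hypothetical helper: merge lines sharing a tag into one line per tag,
# sorting each line's subfield group by its first subfield code.
sub merge_fields {
    my (%groups, %indicators, @tags);
    for my $line (@_) {
        # "=995", whitespace, two indicator characters, then "$x...$w..."
        my ($tag, $ind, $subfields) = $line =~ /^(=\d{3})\s+(\S\S)(\$.*)$/
            or next;
        push @tags, $tag unless $groups{$tag};
        $indicators{$tag} = $ind unless defined $indicators{$tag};
        push @{ $groups{$tag} }, $subfields;
    }
    return map {
        my $tag = $_;
        # substr(..., 1, 1) is the code letter right after the leading "$"
        my @sorted = sort { substr($a, 1, 1) cmp substr($b, 1, 1) } @{ $groups{$tag} };
        "$tag $indicators{$tag}" . join('', @sorted);
    } @tags;
}

# <<'END' keeps backslashes and "$" exactly as typed.
my @input = split /\n/, <<'END';
=995  \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995  \\$n
=995  \\$oDisponible
=995  \\$kG1 42171
END

print "$_\n" for merge_fields(@input);
# prints: =995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre
```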



Re: question of regexp or (another solution)

2012-12-14 Thread Dr.Ruud

On 2012-12-14 14:54, samuel desseaux wrote:


=995  \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995  \\$n
=995  \\$oDisponible
=995  \\$kG1 42171

and i want in one line

=995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre


echo -n '1   a
1  b
1 c
2  x
=995  \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995  \\$n
=995  \\$oDisponible
=995  \\$kG1 42171
zz 1
zz  2
zz   3
' |perl -Mstrict -wle '
  my ($key, $value);
  while ( my $line = <> ) {
    chomp $line;
    my ($k, $v) = split " ", $line, 2;
    if ( defined $key and $key eq $k ) {
      $value .= $v;
    } else {
      print "$key\t$value" if defined $key;
      ($key, $value) = ($k, $v);
    }
  }
  print "$key\t$value" if defined $key;
'
1   abc
2   x
=995\\$xPR$wLivre\\$bECAM$cECAM\\$n\\$oDisponible\\$kG1 42171
zz  123

--
Ruud






Re: question of regexp or (another solution)

2012-12-14 Thread timothy adigun
Hi,

On Fri, Dec 14, 2012 at 2:53 PM, samuel desseaux sdesse...@gmail.com wrote:

 Hi!

 I work in a library and i need to have several fields in one line

 Example

 I have this

 =995  \\$xPR$wLivre
 =995 \\$bECAM$cECAM
 =995  \\$n
 =995  \\$oDisponible
 =995  \\$kG1 42171


 and i want in one line

 =995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre


Using Dr. Ruud's data. This is another way of doing it:

use warnings;
use strict;

my %data_collection;

while (<DATA>) {
    chomp;
    next unless /\S/;    # skip the blank line after __DATA__
    my ( $key, $value ) = split /\s+/, $_, 2;
    push @{ $data_collection{$key} }, $value;
}

print $_, " ", @{ $data_collection{$_} }, $/ for keys %data_collection;

__DATA__

=995  \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995  \\$n
=995  \\$oDisponible
=995  \\$kG1 42171
zz 1
zz  2
zz   3

-- 
Tim


Re: question of regexp or (another solution)

2012-12-14 Thread *Shaji Kalidasan*
Hi,

On Fri, Dec 14, 2012 at 2:53 PM, samuel desseaux sdesse...@gmail.com wrote:

 Hi!

 I work in a library and i need to have several fields in one line

 Example

 I have this

 =995  \\$xPR$wLivre
 =995 \\$bECAM$cECAM
 =995  \\$n
 =995  \\$oDisponible
 =995  \\$kG1 42171


 and i want in one line

 =995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre


Adding to Tim's wisdom

Here is another way of doing it.

use warnings;
use strict;

my %data_collection;

while (<DATA>) {
    chomp;
    my ( $key, $value ) = split /\s+/, $_, 2;
    $data_collection{$key} .= $value;
}

print $_, " ", $data_collection{$_}, $/ for sort keys %data_collection;

__DATA__
=995  \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995  \\$n
=995  \\$oDisponible
=995  \\$kG1 42171
zz 1
zz 2
zz 3
rms 0xcafebabe
rms 0xfed
 
best,
Shaji 
---
Your talent is God's gift to you. What you do with it is your gift back to God.
---




Re: Scalar::Util::blessed() considers Regexp references to be blessed?

2012-01-23 Thread Brian Fraser
On Mon, Jan 23, 2012 at 2:12 AM, David Christensen 
dpchr...@holgerdanske.com wrote:

 beginners@perl.org:

 While coding some tests tonight, I discovered that Scalar::Util::blessed()
 considers Regexp references to be blessed.


 Is this a bug or a feature?


Implementation detail. Internally, regexen have their own data type, REGEXP
(which you can see with Scalar::Util::reftype), which is then 'blessed'
into the Regexp class. This actually allows some very cool tricks, but
nothing that should ever see the light of a production server, or show up
in a beginners mailing list : )
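A small demonstration of the point, plus a hypothetical is_real_object() helper for tests that want to treat qr// objects separately. is_regexp() comes from the core re module; the reftype() result is REGEXP on perl 5.12 and later, but older perls (such as the 5.10.1 above) may report SCALAR.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);
use re qw(is_regexp);

my $re = qr/^forest/;
printf "blessed: %s, reftype: %s, is_regexp: %s\n",
    blessed($re), reftype($re), (is_regexp($re) ? 'yes' : 'no');

# Hypothetical helper: true for "real" objects, false for plain refs and qr//.
sub is_real_object {
    my ($thing) = @_;
    return blessed($thing) && !is_regexp($thing) ? 1 : 0;
}
```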


Scalar::Util::blessed() considers Regexp references to be blessed?

2012-01-22 Thread David Christensen

beginners@perl.org:

While coding some tests tonight, I discovered that 
Scalar::Util::blessed() considers Regexp references to be blessed.



Is this a bug or a feature?


TIA,

David



2012-01-22 21:07:57 dpchrist@p43400e ~/sandbox/perl
$ cat blessed
#! /usr/bin/perl
# $Id: blessed,v 1.1 2012-01-23 05:06:51 dpchrist Exp $
use strict;
use warnings;
use Test::More  tests => 11;
use Scalar::Util qw( blessed );
our $foo;
ok( !blessed(undef),'undefined value'   ); #  1
ok( !blessed(''),   'empty string'  ); #  2
ok( !blessed(0),'zero'  ); #  3
ok( !blessed(1),'one'   ); #  4
ok( !blessed(\0),   'scalar reference'  ); #  5
ok( !blessed([]),   'array reference'   ); #  6
ok( !blessed({}),   'hash reference'); #  7
ok( !blessed(sub {}),   'code reference'); #  8
ok( !blessed(\*foo), 'glob reference'); #  9
ok( ref(qr//) eq 'Regexp', 'qr// creates Regexp reference'  ); # 10
ok( !blessed(qr//), 'Regexp reference'  ); # 11

2012-01-22 21:08:27 dpchrist@p43400e ~/sandbox/perl
$ perl blessed
1..11
ok 1 - undefined value
ok 2 - empty string
ok 3 - zero
ok 4 - one
ok 5 - scalar reference
ok 6 - array reference
ok 7 - hash reference
ok 8 - code reference
ok 9 - glob reference
ok 10 - qr// creates Regexp reference
not ok 11 - Regexp reference
#   Failed test 'Regexp reference'
#   at blessed line 18.
# Looks like you failed 1 test of 11.

2012-01-22 21:08:30 dpchrist@p43400e ~/sandbox/perl
$ perl -MScalar::Util -e 'print $Scalar::Util::VERSION, "\n"'
1.23

2012-01-22 21:08:34 dpchrist@p43400e ~/sandbox/perl
$ perl -v

This is perl, v5.10.1 (*) built for i486-linux-gnu-thread-multi
(with 53 registered patches, see perl -V for more detail)

Copyright 1987-2009, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using man perl or perldoc perl.  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.


2012-01-22 21:08:37 dpchrist@p43400e ~/sandbox/perl
$ cat /etc/debian_version
6.0.3





how to use regexp to match symbols

2011-06-13 Thread eventual
Hi,
I have a list of mp3 files on my computer, and some of the file names contain
a bracket, like this: "darling I love [you.mp3".
I wish to check them for duplicates using the script below, but there's an error
message like this: "Unmatched [ in regex; marked by <-- HERE in m/only one brace
here[ <-- HERE anything.mp3/ at Untitled1 line 13."

So how do I rewrite the regexp?
Thanks.
 
## script ###
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
 
my @datas = ("test.mp3", "only one brace here[anything.mp3", "whatever.mp3");

while (@datas){
  my $ref = splice @datas,0,1;
  foreach (@datas){
    if ($ref =~/$_/){
      print "$ref is a duplicate\n";
    }else{
      print "$ref is not a duplicate\n";
    }
  }
}

Re: how to use regexp to match symbols

2011-06-13 Thread Rob Coops
On Mon, Jun 13, 2011 at 2:05 PM, eventual eventualde...@yahoo.com wrote:

 Hi,
 I have a list of mp3 files in my computer and some of the file names
 consists of  a bracket like this darling I love [you.mp3
 I wish to check them for duplicates using the script below, but theres
 error msg like this Unmatched [ in regex; marked by -- HERE in m/only one
 brace here[ -- HERE anything.mp3/ at Untitled1 line 13.

 So how do I rewrite the regexp.
 Thanks.

 ## script ###
 #!/usr/bin/perl
 use strict;
 use warnings;
 use File::Find;

 my @datas = ("test.mp3", "only one brace here[anything.mp3",
 "whatever.mp3");

 while (@datas){
   my $ref = splice @datas,0,1;
   foreach (@datas){
   if ($ref =~/$_/){
  print "$ref is a duplicate\n";
   }else{
  print "$ref is not a duplicate\n";
   }
   }
 }


Escape the special character with a backslash, so in your case you would
write "only one brace here\[anything.mp3", which the regular expression engine
reads as a literal "[" rather than as the start of a character class. Without
the backslash, the "[" opens a character class that is never closed, so the
regular expression is invalid and perl throws an error.
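To see what the escaping does, here is a short sketch using the file name from the question: quotemeta (and \Q...\E inside a pattern) adds the backslashes for you, so nothing needs to be escaped by hand.

```perl
use strict;
use warnings;

my $name = 'darling I love [you.mp3';

# quotemeta backslashes every ASCII non-word character (space, "[", "." ...).
my $escaped = quotemeta $name;
print "$escaped\n";    # darling\ I\ love\ \[you\.mp3

# \Q...\E applies quotemeta inside the pattern, so the "[" is matched literally.
print "exact match\n" if 'darling I love [you.mp3' =~ /^\Q$name\E$/;
```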

Regards,

Rob


Re: how to use regexp to match symbols

2011-06-13 Thread John W. Krahn

eventual wrote:

Hi,


Hello,


I have a list of mp3 files in my computer and some of the file names
consists of  a bracket like this darling I love [you.mp3
I wish to check them for duplicates using the script below, but theres
error msg like this Unmatched [ in regex; marked by-- HERE in m/only
one brace here[-- HERE anything.mp3/ at Untitled1 line 13.

So how do I rewrite the regexp.
Thanks.

## script ###
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my @datas = ("test.mp3", "only one brace here[anything.mp3", "whatever.mp3");

while (@datas){
   my $ref = splice @datas,0,1;


That is usually written as:

   my $ref = shift @datas;



   foreach (@datas){
   if ($ref =~/$_/){


That doesn't test if $ref is a duplicate; it tests whether the pattern in $_
matches somewhere in $ref, so this would print "This is a test mp3 file.html
is a duplicate\n":


if ( "This is a test mp3 file.html" =~ /test.mp3/ )

Because . will match any character and the pattern is not anchored.

If you want to see if the two strings are exactly the same then:

   if ( $ref eq $_ ) {

Or you could use a hash instead of an array so you would know that there 
are no duplicates.
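A sketch of the hash approach; find_duplicates is a made-up name, and the fourth list entry is added only so there is something to find. Counting exact names avoids both the regex metacharacter problem and the substring false positives.

```perl
use strict;
use warnings;

# Made-up helper: names that occur more than once, counted with a hash
# (exact string comparison, so "[" and "." need no escaping).
sub find_duplicates {
    my %count;
    $count{$_}++ for @_;
    return grep { $count{$_} > 1 } sort keys %count;
}

# The last entry is added only so there is a duplicate to find.
my @names = ('test.mp3', 'only one brace here[anything.mp3', 'whatever.mp3',
             'only one brace here[anything.mp3');
print "$_ is a duplicate\n" for find_duplicates(@names);
```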


But as to your question about the '[' character causing an error
message, you have to use quotemeta (or \Q in the pattern) to escape
regular expression special characters:


   if ( $ref =~ /\Q$_/ ) {



  print "$ref is a duplicate\n";
   }else{
  print "$ref is not a duplicate\n";
   }
   }
}




John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.   -- Albert Einstein





Re: how to use regexp to match symbols

2011-06-13 Thread Dr.Ruud

On 2011-06-13 14:05, eventual wrote:


I have a list of mp3 files in my computer and some of the file names consists of  a 
bracket like this darling I love [you.mp3
I wish to check them for duplicates using the script below, but theres error msg like this 
Unmatched [ in regex; marked by-- HERE in m/only one brace here[-- HERE 
anything.mp3/ at Untitled1 line 13.


Why would you want to use a regex for this?

Use 'eq', or see 'perldoc -f index'.

In case of real regex need, see 'perldoc -f quotemeta'.
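For a plain substring test, a sketch with index() (contains() is a made-up wrapper): no metacharacters are involved, so the "[" in the file name needs no escaping and "." only matches a literal dot.

```perl
use strict;
use warnings;

# Made-up wrapper: true if $needle occurs anywhere in $haystack.
# index() returns the position of the first occurrence, or -1 if there is none.
sub contains {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) >= 0 ? 1 : 0;
}

print contains('only one brace here[anything.mp3', 'here[any'), "\n";   # 1
print contains('This is a test mp3 file.html', 'test.mp3'), "\n";       # 0
```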

--
Ruud





Re: how to use regexp to match symbols

2011-06-13 Thread Sayth Renshaw
 I have a list of mp3 files in my computer and some of the file names consists 
 of  a bracket like this darling I love [you.mp3
 I wish to check them for duplicates using the script below, but theres error 
 msg like this Unmatched [ in regex; marked by -- HERE in m/only one brace 
 here[ -- HERE anything.mp3/ at

Searching Google, I found that several scripts use the File::Find::Duplicates
module:
http://search.cpan.org/~tmtm/File-Find-Duplicates-1.00/lib/File/Find/Duplicates.pm

Sayth





RE: regexp validation (arbitrary code execution) (regexp injection)

2011-06-02 Thread Bob McConnell
From: Stanislaw Findeisen

 Suppose you have a collection of books, and want to provide your users
 with the ability to search the book title, author or content using
 regular expressions.
 
 But you don't want to let them execute any code.
 
 How would you validate/compile/evaluate the user provided regex so as to
 provide maximum flexibility and prevent code execution?

You want them to run an application without having to run an
application? That doesn't make any sense.

You have several options available to give your users access to a
database.

1. Write a client application or applet they can copy or install on
their workstation to access the database directly.

2. Write a simpler application or applet that accesses a non-DB server
which in turn accesses the database.

3. Create a site on a web server they can access with a browser, which
then accesses the database.

There are any number of variations on these themes, but in each case,
they have to run some application code somewhere in order to access the
data.

Bob McConnell





Re: regexp validation (arbitrary code execution) (regexp injection)

2011-06-02 Thread Stanisław Findeisen
On 2011-06-02 14:27, Bob McConnell wrote:
 From: Stanislaw Findeisen
 
 Suppose you have a collection of books, and want to provide your users
 with the ability to search the book title, author or content using
 regular expressions.

 But you don't want to let them execute any code.

 How would you validate/compile/evaluate the user provided regex so as to
 provide maximum flexibility and prevent code execution?
 
 You want them to run an application without having to run an
 application? That doesn't make any sense.

This is a complete misunderstanding. Sorry, perhaps I wasn't clear enough.

I was talking about users injecting *their* code via the regex. See, for
instance:

http://perldoc.perl.org/perlretut.html#A-bit-of-magic:-executing-Perl-code-in-a-regular-expression

or the /e modifier of the built-in s/// (search and replace) operator.

When doing:

$string =~ $regex

where $regex is user provided, arbitrary regular expression, anything
can happen.

-- 
Eisenbits - proven software solutions: http://www.eisenbits.com/
OpenPGP: E3D9 C030 88F5 D254 434C  6683 17DD 22A0 8A3B 5CC0





Re: regexp validation (arbitrary code execution) (regexp injection)

2011-06-02 Thread Rob Coops
2011/6/1 Stanisław Findeisen s...@eisenbits.com

 Suppose you have a collection of books, and want to provide your users
 with the ability to search the book title, author or content using
 regular expressions.

 But you don't want to let them execute any code.

 How would you validate/compile/evaluate the user provided regex so as to
 provide maximum flexibility and prevent code execution?


Hi Stanisław,

From what you are saying, I think you are looking for a way to take a
string and check it for any potentially bad characters that would cause the
system to execute unwanted code.

So a bit like this: "In.*?forest$" is a safe string to feed into your
regular expression, but ".*/; open my $fh, ">", $0; close $fh; $_ = ~/" is
an evil string causing you a lot of grief. At least that is how I understand
your question...

To be honest, I am not sure this is an issue. In a construction like
if ( $title =~ m/$userinput/ ) { do stuff... }
as far as I can remember, the variable you are feeding in will not be treated
as code by the interpreter but simply as matching instructions, which would
mean that whatever your user throws at it, perl will in the worst case return
a failure to match.

But please don't take my word for it; try it in a very simple test and see
what happens.

If you do have to ensure that a user cannot execute any code, you could
simply prevent the user from entering a ";", or smarter yet, filter it out
of the user input, to prevent a smart user from feeding it to your code
via a method other than the front-end you provided. Without a means to
close the previous regular expression, the user cannot really insert
executable code into your regular expression. At least that's what I would
try, but I am by no means an expert in this area, and I suspect there might be
some people reading this and wondering why I didn't think of A, B or C; if so,
please do speak up, people ;-)

Regards,

Rob


Re: regexp validation (arbitrary code execution) (regexp injection)

2011-06-02 Thread Randal L. Schwartz
 Stanisław == Stanisław Findeisen s...@eisenbits.com writes:

Stanisław But you don't want to let them execute any code.

Unless use re 'eval' is in scope, /$a/ is safe even if $a came from an
untrusted source, as long as you limit the run-time to a few seconds or
so with an alarm.  (Some regex can take nearly forever to fail.)

See perldoc perlre for more details.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
mer...@stonehenge.com URL:http://www.stonehenge.com/merlyn/
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion





Re: regexp validation (arbitrary code execution) (regexp injection)

2011-06-02 Thread Paul Johnson
On Wed, Jun 01, 2011 at 11:25:39PM +0200, Stanisław Findeisen wrote:
 Suppose you have a collection of books, and want to provide your users
 with the ability to search the book title, author or content using
 regular expressions.
 
 But you don't want to let them execute any code.
 
 How would you validate/compile/evaluate the user provided regex so as to
 provide maximum flexibility and prevent code execution?

In general this shouldn't be a problem provided you don't turn on

  use re 'eval';

  $ perl -e '/$ARGV[0]/' '(?{ print "hello" })'
  Eval-group not allowed at runtime, use re 'eval' in regex m/(?{ print
  "hello" })/ at -e line 1.

  $ perl -Mre=eval -e '/$ARGV[0]/' '(?{ print "hello" })'
  hello

Of course, you're not going to be too worried about people saying hello,
but once you can execute arbitrary code all bets are off:

  $ perl -e '/$ARGV[0]/' '(?{ system "sudo mailx -s ha baddie\@example.com < /etc/shadow" })'

Make sure you don't do the whole match as part of a string eval, and
since you're only matching, you shouldn't have to worry about s///e.

If you prefer a more paranoid approach you might want to restrict the
characters you allow in the user input, but this doesn't provide maximum
flexibility.
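A sketch of the compile-and-check step, assuming the pattern arrives as a plain string (compile_user_regex is a hypothetical name). Compiling inside eval catches both syntax errors and the "Eval-group not allowed at runtime" rejection of (?{ ... }). It does nothing about run time: a pathological pattern can still take a very long time, and since Perl defers signals until the current opcode finishes (see "Deferred Signals" in perlipc), an alarm may not interrupt a match that is already running.

```perl
use strict;
use warnings;

# Hypothetical helper: compile an untrusted pattern without running it.
# Returns the qr// object, or undef and the error message.
sub compile_user_regex {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return defined $re ? ($re, '') : (undef, $@);
}

for my $pattern ('In.*?forest$', '(?{ print "hello\n" })', 'unmatched [') {
    my ($re, $err) = compile_user_regex($pattern);
    if (defined $re) {
        print "compiled: $pattern\n";
    }
    else {
        print "rejected: $pattern\n    $err";
    }
}
```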

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net





regexp validation (arbitrary code execution) (regexp injection)

2011-06-01 Thread Stanisław Findeisen
Suppose you have a collection of books, and want to provide your users
with the ability to search the book title, author or content using
regular expressions.

But you don't want to let them execute any code.

How would you validate/compile/evaluate the user provided regex so as to
provide maximum flexibility and prevent code execution?





Re: regexp matching nummeric ranges

2010-12-13 Thread Randal L. Schwartz
 Kammen == Kammen van, Marco, Springer SBM NL 
 marco.vankam...@springer.com writes:

Kammen What am I doing wrong??

Using a regex when something else would be much better.

Stop trying to pound a nail in with a wrench handle.





Re: Regexp delimiters

2010-12-08 Thread C.DeRykus
On Dec 7, 9:38 am, p...@utilika.org (Jonathan Pool) wrote:
  Well, I have no idea why it does what it does, but I can tell you how to 
  make it work:
  s¶3(456)7¶¶$1¶x;
  s§3(456)7§§$1§x;


Hm, what  platform and perl version?

No errors here:

  c:\>perl -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
  v5.12.1MSWin32
  1245689

[...]

--
Charles DeRykus






Re: Regexp delimiters

2010-12-08 Thread C.DeRykus
On Dec 7, 9:38 am, p...@utilika.org (Jonathan Pool) wrote:
  Well, I have no idea why it does what it does, but I can tell you how to 
  make it work:
  s¶3(456)7¶¶$1¶x;
  s§3(456)7§§$1§x;

Oops. yes there is:

c:\>perl -Mutf8 -wE "say $^V,$^O;$_='123456789';  s§3(456)7§$1§;say"
Malformed UTF-8 character (unexpected continuation byte 0xa7, with no
preceding start byte) at -e line 1.
Malformed UTF-8 character (unexpected continuation byte 0xa7, with no
preceding start byte) at -e line 1.
v5.12.1MSWin32
1245689

--
Charles DeRykus






Re: Regexp delimiters

2010-12-08 Thread C.DeRykus
On Dec 7, 9:38 am, p...@utilika.org (Jonathan Pool) wrote:
  Well, I have no idea why it does what it does, but I can tell you how to 
  make it work:
  s¶3(456)7¶¶$1¶x;
  s§3(456)7§§$1§x;


Oops, sorry, yes there is:

c:\>perl -Mutf8 -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
Malformed UTF-8 character (unexpected continuation byte 0xa7,
   with no preceding start byte) at -e line 1.
Malformed UTF-8 character (unexpected continuation byte 0xa7,
   with no preceding start byte) at -e line 1.

v5.12.1MSWin32
1245689

--
Charles DeRykus







Re: Regexp delimiters

2010-12-08 Thread Jonathan Pool
 Hm, what  platform and perl version?

5.8.8 and 5.12.2 on RHEL, and 5.10.0 on OS X 10.6.

 c:\>perl -Mutf8 -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
 Malformed UTF-8 character (unexpected continuation byte 0xa7,
   with no preceding start byte) at -e line 1.

Not the same error as I got. This one looks to me like submitting 8-bit
(Latin-1) text, where the section sign is A7, after telling the host you would
be submitting UTF-8, where the section sign is C2 A7. (Sorry if my terminology
is wrong.)
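To check that reading of the bytes, a short sketch with Encode prints both encodings of the section sign and, for comparison, of the pilcrow (¶) that was also tried as a delimiter; hex_bytes is a made-up helper.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Made-up helper: the bytes of $char in $encoding, as a hex string.
sub hex_bytes {
    my ($encoding, $char) = @_;
    return unpack 'H*', encode($encoding, $char);
}

for my $char ("\x{A7}", "\x{B6}") {    # SECTION SIGN, PILCROW SIGN
    printf "U+%04X  Latin-1: %s  UTF-8: %s\n",
        ord $char, hex_bytes('iso-8859-1', $char), hex_bytes('UTF-8', $char);
}
# prints:
# U+00A7  Latin-1: a7  UTF-8: c2a7
# U+00B6  Latin-1: b6  UTF-8: c2b6
```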




Re: Regexp delimiters

2010-12-08 Thread Jonathan Pool
  c:\>perl -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
  v5.12.1MSWin32
  1245689

My equivalent that works is:

perl -wE "use utf8;my \$_='123456789';s§3(456)7§§\$1§;say;"
1245689

If I stop treating this section-sign delimiter as a bracketing delimiter, it 
fails:

perl -wE "use utf8;my \$_='123456789';s§3(456)7§\$1§;say;"
Substitution replacement not terminated at -e line 1.




Re: Regexp delimiters

2010-12-07 Thread Jonathan Pool
 Well, I have no idea why it does what it does, but I can tell you how to make 
 it work:
 s¶3(456)7¶¶$1¶x;
 s§3(456)7§§$1§x;

Amazing. Thanks very much.

This seems to contradict the documentation. The perlop man page clearly says
that there are exactly 4 bracketing delimiters: (), [], {}, and <>.
Everything else should be non-bracketing.

But, in fact, several characters that I have tried behave as bracketing 
delimiters.

An exception seems to be combining characters. The two that I tried don't work 
as either regular or bracketing delimiters.

I have tested this on Perl 5.8.8, 5.10.0, and 5.12.2.

See the code below for results. The combining characters appear in the last 2 
test pairs.



#!/usr/bin/perl -w

use warnings 'FATAL', 'all';
# Make every warning fatal.

use strict;
# Require strict checking of variable references, etc.

use utf8;
# Make Perl interpret the script as UTF-8.

my $string = '123456789';
# Initialize a scalar.

print "The original string is $string\n";

# $string =~ s%3(456)7%$1%; # Succeeds
# $string =~ s%3(456)7%%$1%; # Fails
# $string =~ s§3(456)7§$1§; # Fails
# $string =~ s§3(456)7§§$1§; # Succeeds
# $string =~ s–3(456)7–$1–; # Fails
# $string =~ s–3(456)7––$1–; # Succeeds
# $string =~ s“3(456)7“$1“; # Fails
# $string =~ s“3(456)7““$1“; # Succeeds
# $string =~ s‱3(456)7‱$1‱; # Fails
# $string =~ s‱3(456)7‱‱$1‱; # Succeeds
# $string =~ s⇧3(456)7⇧$1⇧; # Fails
# $string =~ s⇧3(456)7⇧⇧$1⇧; # Succeeds
# $string =~ s⃠3(456)7⃠$1⃠; # Fails (single U+20e0)
# $string =~ s⃠3(456)7⃠⃠$1⃠; # Fails (double U+20e0)
# $string =~ s̸3(456)7̸$1̸; # Fails (single U+0338)
# $string =~ s̸3(456)7̸̸$1̸; # Fails (double U+0338)
# Modify it (uncomment any one line above.)

print "The amended string is $string\n";







Regexp delimiters

2010-12-05 Thread Jonathan Pool
The perlop document under s/PATTERN/REPLACEMENT/msixpogce says "Any
non-whitespace delimiter may replace the slashes."

I take this to mean that any non-whitespace character may be used instead of a 
slash.

However, I am finding that some non-whitespace characters cause errors. For
example, using a ¶ or § character instead of a slash causes an error, such
as "Bareword found where operator expected" or "Number found where operator
expected". When I use "/", "#", or ",", I get no error. Here is a script that
demonstrates this problem:

#!/usr/bin/perl -w
use warnings 'FATAL', 'all';
use strict;
use utf8;
my $string = '123456789';
print "The original string is $string\n";
$string =~ s§3(456)7§$1§;
print "The amended string is $string\n";







Re: Regexp delimiters

2010-12-05 Thread Brian Fraser
Well, I have no idea why it does what it does, but I can tell you how to
make it work:
s¶3(456)7¶¶$1¶x;
s§3(456)7§§$1§x;

For whatever reason, Perl is treating those characters as 'opening'
delimiters[0], so that when you write s¶3(456)7¶$1¶;, you are telling Perl
that the regex part is delimited by '¶'s, but the substitution part is
delimited by '$'s (think of something like s{}//;).
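For comparison, a sketch of the documented behaviour with ASCII delimiters: a non-bracketing character is used three times, while a bracketing pair gives each part its own delimiters, and the replacement may even use a different one, as in the s{}// form mentioned above.

```perl
use strict;
use warnings;

my ($s1, $s2, $s3, $s4) = ('123456789') x 4;

$s1 =~ s/3(456)7/$1/;     # the usual slashes
$s2 =~ s#3(456)7#$1#;     # non-bracketing: the same character three times
$s3 =~ s{3(456)7}{$1};    # bracketing: each part gets its own pair
$s4 =~ s{3(456)7}/$1/;    # bracketed pattern, differently delimited replacement

print join(' ', $s1, $s2, $s3, $s4), "\n";   # 1245689 1245689 1245689 1245689
```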

Hopefully someone here will be able to enlighten us both further.

Brian.

[0]
http://perldoc.perl.org/perlop.html#Gory-details-of-parsing-quoted-constructs






Re: Regexp delimiters

2010-12-05 Thread Shawn H Corey

On 10-12-05 05:58 PM, Brian Fraser wrote:

Well, I have no idea why it does what it does, but I can tell you how to
make it work:
s¶3(456)7¶¶$1¶x;
s§3(456)7§§$1§x;

For whatever reason, Perl is treating those character as an 'opening'
delimiter[0], so that when you write s¶3(456)7¶$1¶;, you are telling Perl
that the regex part is delimited by '¶'s, but the substitution part is
delimited by '$'s (think of something like s{}//;).

Hopefully someone here will be able to enlighten us both further.



$ perl -e's¶3(456)7¶¶$1¶x;'
Unrecognized character \xB6 in column 14 at -e line 1.
$ perl -Mutf8 -e's¶3(456)7¶¶$1¶x;'

You have to tell perl to use UTF-8.  Add this line to the top of your 
script(s):


use utf8;

See `perldoc utf8` for more details.


--
Just my 0.0002 million dollars worth,
  Shawn

Confusion is the first step of understanding.

Programming is as much about organization and communication
as it is about coding.

The secret to great software:  Fail early & often.

Eliminate software piracy:  use only FLOSS.





Re: Regexp delimiters

2010-12-05 Thread Brian Fraser

 You have to tell perl to use UTF-8.  Add this line to the top of your
 script(s):
 use utf8;

 See `perldoc utf8` for more details.

Hm, I don't mean to step on your toes or anything, but he is already using
utf8. The problem is with some utf8 characters being interpreted as a paired
delimiter, I think.

Brian.


Re: Regexp delimiters

2010-12-05 Thread Shawn H Corey

On 10-12-05 07:38 PM, Brian Fraser wrote:

You have to tell perl to use UTF-8.  Add this line to the top of
your script(s):
use utf8;

See `perldoc utf8` for more details.

Hm, I don't mean to step on your toes or anything, but he is already
using utf8. The problem is with some utf8 characters being interpreted
as a paired delimiter, I think.

Brian.



It works for me.  What version of Perl is he running?  5.6 does not work 
well with UTF-8.



--
Just my 0.0002 million dollars worth,
  Shawn

Confusion is the first step of understanding.

Programming is as much about organization and communication
as it is about coding.

The secret to great software:  Fail early & often.

Eliminate software piracy:  use only FLOSS.





Re: Regexp delimiters

2010-12-05 Thread Brian Fraser
That's probably because you are using what I sent, rather than what the OP
did:

 C:\>perl -E "s§3(456)7§$1§;"

Unrecognized character \x98 in column 16 at -e line 1.


 C:\>perl -Mutf8 -E "s§3(456)7§$1§;"

Substitution replacement not terminated at -e line 1.


 C:\>perl -E "s§3(456)7§§$1§; say"

Unrecognized character \x98 in column 14 at -e line 1.


 C:\>perl -Mutf8 -E "s§3(456)7§§$1§; say"



 C:\>


Brian.


Re: regexp matching nummeric ranges

2010-11-30 Thread Rob Dixon

On 30/11/2010 06:39, Uri Guttman wrote:

GK == Guruprasad Kulkarniguruprasa...@gmail.com  writes:


   GK  Here is another way to do it:

   GK  /^127\.0\.0\.([\d]|[1-9][\d]|[1][\d][\d]|[2]([0-4][\d]|[5][0-4]))$/) {

why are you putting single chars inside a char class? [\d] is the same
as \d and [1] is just 1.


Also this is another solution that wrongly accepts 127.0.0.0. It also 
unnecessarily makes use of captures instead of grouping, and puts single 
values into character classes ([1], [\d] etc.). Perhaps it is better 
written:


  /^127\.0\.0\.(?:
      [1-9]\d?   |  # 1 .. 99
      1\d\d      |  # 100 .. 199
      2[0-4]\d   |  # 200 .. 249
      25[0-4]       # 250 .. 254
  )$/x;

But my feeling is that these long-winded pure regex solutions are more 
of a response to a challenge than a practical solution. At the very 
least they need commenting to explain what they are doing. Capturing the 
value of the last byte field, as I suggested, seems to describe the 
purpose of the code far better, with no significant penalty that I can 
think of.


- Rob





regexp matching nummeric ranges

2010-11-29 Thread Kammen van, Marco, Springer SBM NL
Dear List,

I've been struggling with the following:

#!/usr/bin/perl

use strict;
use warnings;

my $ip = ("127.0.0.255");

if ($ip =~ /127\.0\.0\.[2..254]/) {
  print "IP Matched!\n";
} else {
  print "No Match!\n";
}

For a reason i don't understand:

127.0.0.1 doesn't match as expected...
Everything between 127.0.0.2 and 127.0.0.299 matches...
127.0.0.230 doesn't match... 

What am I doing wrong??

Thanks! 


- 
Marco van Kammen
Springer Science+Business Media
System Manager  Postmaster 
- 
van Godewijckstraat 30 | 3311 GX
Office Number: 05E21 
Dordrecht | The Netherlands 
-  
tel 
 +31(78)6576446
fax 
 +31(78)6576302

- 
www.springeronline.com 
www.springer.com
- 







Re: regexp matching nummeric ranges

2010-11-29 Thread Rob Dixon

On 29/11/2010 14:22, Kammen van, Marco, Springer SBM NL wrote:

Dear List,

I've been struggling with the following:

#!/usr/bin/perl

use strict;
use warnings;

my $ip = ("127.0.0.255");

if ($ip =~ /127\.0\.0\.[2..254]/) {
   print "IP Matched!\n";
} else {
   print "No Match!\n";
}

For a reason i don't understand:

127.0.0.1 doesn't match as expected...
Everything between 127.0.0.2 and 127.0.0.299 matches...
127.0.0.230 doesn't match...

What am I doing wrong??

Thanks!


Hello Marco

Regular expressions can't match a decimal string by value. The regex 
/[2..254]/ uses a character class which matches a SINGLE character, 
which may be '2', '5', '4' or '.'. It is the same as /[254.]/ as 
characters that appear a second time have no effect. To verify the value 
of a decimal substring you need to add an extra test:


  if ($ip =~ /^127\.0\.0\.([0-9]+)$/ and 2 <= $1 and $1 <= 254) {
:
  }

In the case of a successful match, this captures the fourth sequence of 
digits, leaving it in $1. This value can then be tested separately to 
make sure it is in the desired range. Note that I have added the 
beginning and end of line anchors ^ and $ which ensure that the 
string doesn't just contain a valid IP address, otherwise anything like 
XXX.127.0.0.200.300.400 would pass the test.
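Both points can be checked side by side. This sketch, using only core Perl, shows which sample addresses the original character class accepts and compares that with capturing the last field and testing it numerically. The helper name in_range is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @samples = qw(127.0.0.1 127.0.0.5 127.0.0.230 127.0.0.299);

# [2..254] is a character class: it matches ONE character from the set
# {2, ., 5, 4}, so only the first digit of the last field is ever looked at
for my $ip (@samples) {
    my $hit = $ip =~ /127\.0\.0\.[2..254]/ ? 'match' : 'no match';
    print "character class, $ip: $hit\n";
}

# Capture the last field, then compare it numerically
sub in_range {
    my ($ip) = @_;
    return ( $ip =~ /^127\.0\.0\.([0-9]+)$/ and 2 <= $1 and $1 <= 254 ) ? 1 : 0;
}

for my $ip (@samples) {
    print "capture + test,  $ip: ", ( in_range($ip) ? 'match' : 'no match' ), "\n";
}
```

With the character class, 127.0.0.5, 127.0.0.230 and 127.0.0.299 all match because their last field starts with 5 or 2; the capture-and-compare version accepts only 127.0.0.5 and 127.0.0.230.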


HTH,

Rob





Re: regexp matching nummeric ranges

2010-11-29 Thread John W. Krahn

Kammen van, Marco, Springer SBM NL wrote:

Dear List,


Hello,


I've been struggling with the following:

#!/usr/bin/perl

use strict;
use warnings;

my $ip = ("127.0.0.255");

if ($ip =~ /127\.0\.0\.[2..254]/) {
   print "IP Matched!\n";
} else {
   print "No Match!\n";
}

For a reason i don't understand:

127.0.0.1 doesn't match as expected...
Everything between 127.0.0.2 and 127.0.0.299 matches...
127.0.0.230 doesn't match...

What am I doing wrong??


As Rob said [2..254] is a character class that matches one character (so 
127.0.0.230 should match also.)  You also don't anchor the pattern so 
something like '765127.0.0.273646' would match as well.  What you need 
is something like this:


#!/usr/bin/perl

use strict;
use warnings;

my $ip = '127.0.0.255';

my $IP_match = qr{
\A   # anchor at beginning of string
127\.0\.0\.  # match the literal characters
(?:
[2-9]# match one digit numbers 2 - 9
|# OR
[0-9][0-9]   # match any two digit number
|# OR
1[0-9][0-9]  # match 100 - 199
|# OR
2[0-4][0-9]  # match 200 - 249
|# OR
25[0-4]  # match 250 - 254
)
\z   # anchor at end of string
}x;

if ( $ip =~ $IP_match ) {
  print "IP Matched!\n";
}
else {
  print "No Match!\n";
}


Or, another way to do it:

#!/usr/bin/perl

use strict;
use warnings;

use Socket;

my $ip = inet_aton '127.0.0.255';

my $start = inet_aton '127.0.0.2';
my $end   = inet_aton '127.0.0.254';


if ( $ip ge $start && $ip le $end ) {
  print "IP Matched!\n";
}
else {
  print "No Match!\n";
}
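The ge/le comparison works because inet_aton returns a fixed-length, 4-byte string in network byte order, so string order and numeric order agree. A sketch that makes the numeric comparison explicit with unpack 'N'; ip_to_int and ip_in_range are hypothetical helper names, and only dotted-quad input is assumed (for anything else inet_aton may attempt a host lookup):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton);

# inet_aton gives 4 bytes in network (big-endian) order;
# unpack 'N' turns them into a 32-bit unsigned integer
sub ip_to_int {
    my ($dotted) = @_;
    my $packed = inet_aton($dotted);
    return defined $packed ? unpack('N', $packed) : undef;
}

sub ip_in_range {
    my ($ip, $start, $end) = @_;
    my ($n, $lo, $hi) = map { ip_to_int($_) } $ip, $start, $end;
    return 0 unless defined $n && defined $lo && defined $hi;
    return ( $lo <= $n && $n <= $hi ) ? 1 : 0;
}

for my $ip (qw(127.0.0.1 127.0.0.2 127.0.0.254 127.0.0.255)) {
    printf "%s %s\n", $ip,
        ip_in_range($ip, '127.0.0.2', '127.0.0.254') ? 'IP Matched!' : 'No Match!';
}
```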




John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.   -- Albert Einstein





Re: regexp matching nummeric ranges

2010-11-29 Thread Rob Dixon

On 29/11/2010 23:46, John W. Krahn wrote:

Kammen van, Marco, Springer SBM NL wrote:

Dear List,


Hello,


I've been struggling with the following:

#!/usr/bin/perl

use strict;
use warnings;

my $ip = ("127.0.0.255");

if ($ip =~ /127\.0\.0\.[2..254]/) {
print "IP Matched!\n";
} else {
print "No Match!\n";
}

For a reason i don't understand:

127.0.0.1 doesn't match as expected...
Everything between 127.0.0.2 and 127.0.0.299 matches...
127.0.0.230 doesn't match...

What am I doing wrong??


As Rob said [2..254] is a character class that matches one character (so
127.0.0.230 should match also.) You also don't anchor the pattern so
something like '765127.0.0.273646' would match as well. What you need is
something like this:

#!/usr/bin/perl

use strict;
use warnings;

my $ip = '127.0.0.255';

my $IP_match = qr{
\A # anchor at beginning of string
127\.0\.0\. # match the literal characters
(?:
[2-9] # match one digit numbers 2 - 9
| # OR
[0-9][0-9] # match any two digit number
| # OR
1[0-9][0-9] # match 100 - 199
| # OR
2[0-4][0-9] # match 200 - 249
| # OR
25[0-4] # match 250 - 254
)
\z # anchor at end of string
}x;

if ( $ip =~ $IP_match ) {
print "IP Matched!\n";
}
else {
print "No Match!\n";
}


Or, another way to do it:

#!/usr/bin/perl

use strict;
use warnings;

use Socket;

my $ip = inet_aton '127.0.0.255';

my $start = inet_aton '127.0.0.2';
my $end = inet_aton '127.0.0.254';


if ( $ip ge $start && $ip le $end ) {
print "IP Matched!\n";
}
else {
print "No Match!\n";
}


This regex solution allows the IP address 127.0.0.01, which is out of range.

- Rob





Re: regexp matching nummeric ranges

2010-11-29 Thread Guruprasad Kulkarni
Hi Marco,

Here is another way to do it:

#!/usr/bin/perl
use strict;
use warnings;
my $ip = "127.0.0.1";
if ($ip =~
/^127\.0\.0\.([\d]|[1-9][\d]|[1][\d][\d]|[2]([0-4][\d]|[5][0-4]))$/) {
 print "IP Matched!\n";
} else {
 print "No Match!\n";
}

On Tue, Nov 30, 2010 at 11:21 AM, Rob Dixon rob.di...@gmx.com wrote:

  On 29/11/2010 23:46, John W. Krahn wrote:

 Kammen van, Marco, Springer SBM NL wrote:

 Dear List,


 Hello,

 I've been struggling with the following:

 #!/usr/bin/perl

 use strict;
 use warnings;

 my $ip = ("127.0.0.255");

 if ($ip =~ /127\.0\.0\.[2..254]/) {
 print "IP Matched!\n";
 } else {
 print "No Match!\n";
 }

 For a reason i don't understand:

 127.0.0.1 doesn't match as expected...
 Everything between 127.0.0.2 and 127.0.0.299 matches...
 127.0.0.230 doesn't match...

 What am I doing wrong??


 As Rob said [2..254] is a character class that matches one character (so
 127.0.0.230 should match also.) You also don't anchor the pattern so
 something like '765127.0.0.273646' would match as well. What you need is
 something like this:

 #!/usr/bin/perl

 use strict;
 use warnings;

 my $ip = '127.0.0.255';

 my $IP_match = qr{
 \A # anchor at beginning of string
 127\.0\.0\. # match the literal characters
 (?:
 [2-9] # match one digit numbers 2 - 9
 | # OR
 [0-9][0-9] # match any two digit number
 | # OR
 1[0-9][0-9] # match 100 - 199
 | # OR
 2[0-4][0-9] # match 200 - 249
 | # OR
 25[0-4] # match 250 - 254
 )
 \z # anchor at end of string
 }x;

 if ( $ip =~ $IP_match ) {
 print "IP Matched!\n";
 }
 else {
 print "No Match!\n";
 }


 Or, another way to do it:

 #!/usr/bin/perl

 use strict;
 use warnings;

 use Socket;

 my $ip = inet_aton '127.0.0.255';

 my $start = inet_aton '127.0.0.2';
 my $end = inet_aton '127.0.0.254';


 if ( $ip ge $start && $ip le $end ) {
 print "IP Matched!\n";
 }
 else {
 print "No Match!\n";
 }


 This regex solution allows the IP address 127.0.0.01, which is out of
 range.

 - Rob







RE: regexp matching nummeric ranges

2010-11-29 Thread Kammen van, Marco, Springer SBM NL
-----Original Message-----
From: John W. Krahn [mailto:jwkr...@shaw.ca] 
Sent: Tuesday, November 30, 2010 12:47 AM
To: Perl Beginners
Subject: Re: regexp matching nummeric ranges


As Rob said [2..254] is a character class that matches one character
(so 127.0.0.230 should match also.)  You also don't anchor the pattern
so something like '765127.0.0.273646' would match as well.  What you
need is something like this:

#!/usr/bin/perl

use strict;
use warnings;

my $ip = '127.0.0.255';

my $IP_match = qr{
 \A   # anchor at beginning of string
 127\.0\.0\.  # match the literal characters
 (?:
 [2-9]# match one digit numbers 2 - 9
 |# OR
 [0-9][0-9]   # match any two digit number
 |# OR
 1[0-9][0-9]  # match 100 - 199
 |# OR
 2[0-4][0-9]  # match 200 - 249
 |# OR
 25[0-4]  # match 250 - 254
 )
 \z   # anchor at end of string
 }x;

if ( $ip =~ $IP_match ) {
   print "IP Matched!\n";
}
else {
   print "No Match!\n";
}

Thanks for all the good pointers...
This is something I can work with!

Marco. 






Re: regexp matching nummeric ranges

2010-11-29 Thread Uri Guttman
 GK == Guruprasad Kulkarni guruprasa...@gmail.com writes:

  GK Here is another way to do it:

  GK /^127\.0\.0\.([\d]|[1-9][\d]|[1][\d][\d]|[2]([0-4][\d]|[5][0-4]))$/) {

why are you putting single chars inside a char class? [\d] is the same
as \d and [1] is just 1.

also please don't quote entire emails below your post. learn to bottom
post and edit the quoted emails. we read from top to bottom so post that
way too.

uri

-- 
Uri Guttman  --  u...@stemsystems.com    http://www.sysarch.com --
-  Perl Code Review , Architecture, Development, Training, Support --
-  Gourmet Hot Cocoa Mix    http://bestfriendscocoa.com -





Re: regexp matching nummeric ranges

2010-11-29 Thread John W. Krahn

Rob Dixon wrote:


On 29/11/2010 23:46, John W. Krahn wrote:


As Rob said [2..254] is a character class that matches one character (so
127.0.0.230 should match also.) You also don't anchor the pattern so
something like '765127.0.0.273646' would match as well. What you need is
something like this:

#!/usr/bin/perl

use strict;
use warnings;

my $ip = '127.0.0.255';

my $IP_match = qr{
\A # anchor at beginning of string
127\.0\.0\. # match the literal characters
(?:
[2-9] # match one digit numbers 2 - 9
| # OR
[0-9][0-9] # match any two digit number


This regex solution allows the IP address 127.0.0.01, which is out of
range.


Yes, sorry, that should be:

[1-9][0-9]


| # OR
1[0-9][0-9] # match 100 - 199
| # OR
2[0-4][0-9] # match 200 - 249
| # OR
25[0-4] # match 250 - 254
)
\z # anchor at end of string
}x;

if ( $ip =~ $IP_match ) {
print "IP Matched!\n";
}
else {
print "No Match!\n";
}
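The correction is easy to confirm by running the fixed pattern against boundary values. A sketch; the list of test addresses is chosen here for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern above with the two-digit branch corrected to [1-9][0-9]
my $IP_match = qr{
    \A 127\.0\.0\.
    (?:
        [2-9]          # 2 - 9
      | [1-9][0-9]     # 10 - 99, no leading zero
      | 1[0-9][0-9]    # 100 - 199
      | 2[0-4][0-9]    # 200 - 249
      | 25[0-4]        # 250 - 254
    )
    \z
}x;

my @should_match     = map { "127.0.0.$_" } qw(2 9 10 99 100 199 200 249 250 254);
my @should_not_match = map { "127.0.0.$_" } qw(0 1 01 001 255 299 1000);

for my $ip (@should_match, @should_not_match) {
    printf "%-14s %s\n", $ip, $ip =~ $IP_match ? 'IP Matched!' : 'No Match!';
}
```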




John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.   -- Albert Einstein





Re: Parsing file and regexp

2010-02-19 Thread olivier.scalb...@algosyn.com
(Sorry but I have problem with my ISP, so I repost !)

Uri Guttman wrote:

  how do you know when a keyword section begins or ends? how large is this
  file? could free text have keywords? i see a ; to end a word list but
  that isn't enough to properly parse this if you have 'free text'.
 
osc Is it possible to do this with regular expression ?
osc Or should I write a small parser ?
 
  yes and yes.
 
osc I have tried pattern matching with the 's' and also with the 'm'
osc option,
osc but with no good result ...
 
  please show your code. there is no way to help otherwise. s/// is not a
  pattern matcher but a substitution operator. it uses regexes and can be
  used to parse things.
 
  uri
 

Hi Uri,

Sorry, code is at my office 

The free text can not contain keywords. And keywords start at the
beginning of a line. The list of words is terminated by a ;.

For the pattern matching I have used the option s:
m/pattern/s, to swallow the different \n.

Olivier






Re: Parsing file and regexp

2010-02-19 Thread olivier.scalb...@algosyn.com
Uri Guttman wrote:
  please show your code. there is no way to help otherwise. s/// is not a
  pattern matcher but a substitution operator. it uses regexes and can be
  used to parse things.
 
  uri
 

Here it is ...

$ cat test.txt
keyword1 word1, word2
  word3;
  blabla

  blabla


keyword2
  word4, word5,
  word6, word7, word8,
  word9;

  bla bla
  bla bla

keyword1
  word10, word11;


$ cat parse.pl
use warnings;

open FILE, "< test.txt" or die "Could not open $!";
$/ = undef;
$source = <FILE>;
close(FILE);


if ($source =~ m/keyword1\s*(\w*)(,\w*)*/s) {
print("Match !\n");
print("$1\n");
print("$2\n");
}

$ perl parse.pl
Match !
word1
,


Here I would like to have 2 matches:
word1, word2
  word3;
and word10, word11;



Thanks for your help!

Olivier
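The script above prints only a comma for $2 because a quantified group such as (,\w*)* keeps only its last iteration, and here it stops right after the comma since the next character is a space. A sketch of an alternative that captures everything up to the ';' in a //g loop and splits afterwards. It assumes, as stated earlier, that keywords start at the beginning of a line; the data is inlined, and splitting on commas or whitespace also covers the missing comma after word2:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inline copy of test.txt so the sketch runs on its own
my $source = <<'END';
keyword1 word1, word2
  word3;
  blabla

  blabla


keyword2
  word4, word5,
  word6, word7, word8,
  word9;

  bla bla
  bla bla

keyword1
  word10, word11;
END

# /m lets ^ match at the start of every line; /g finds every occurrence.
# [^;]* grabs the whole word list, so no repeated capture group is needed.
my @lists;
while ( $source =~ /^keyword1\s+([^;]*);/mg ) {
    my $list  = $1;                      # copy before running another regex
    my @words = split /[\s,]+/, $list;
    push @lists, \@words;
    print 'keyword1: (', join(', ', @words), ")\n";
}
```

This prints "keyword1: (word1, word2, word3)" and "keyword1: (word10, word11)".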








Re: Parsing file and regexp

2010-02-19 Thread Shawn H Corey
olivier.scalb...@algosyn.com wrote:
 $ cat test.txt
 keyword1 word1, word2
   word3;
   blabla
 
   blabla
 
 
 keyword2
   word4, word5,
   word6, word7, word8,
   word9;
 
   bla bla
   bla bla
 
 keyword1
   word10, word11;

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Set maximum depth for Data::Dumper, zero means unlimited
$Data::Dumper::Maxdepth = 0;

my $file = shift @ARGV;

my $source;
open my $source_fh, '<', $file or die "could not open $file: $!\n";
{
  local $/;
  $source = <$source_fh>;
}
close $source_fh;

my %keywords;
my @captured = $source =~ m{ ( keyword\d+ ) ( [^;]+ ) \; }gmsx;
while( @captured ){
  my $keyword = shift @captured;
  my $words = shift @captured;
  $words =~ s{ \A \s+ }{}msx;
  $words =~ s{ \s+ \z }{}msx;
  my @words = split m{ \s* \, \s* }msx, $words;
  push @{ $keywords{$keyword} }, @words;
}

print 'keywords: ', Dumper \%keywords;

__END__

-- 
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication
as it is about coding.

I like Perl; it's the only language where you can bless your
thingy.

Eliminate software piracy:  use only FLOSS.





Parsing file and regexp

2010-02-13 Thread olivier.scalb...@algosyn.com
Hello,

I need to extract info from some text files. And I want to do it with
Perl !

The file I need to parse has the following layout:

keywordA word1, word2, word3;

Here we can have some free text
...
...

keywordB word4,
  word5, word6, word7, word8,
  word9, word10;

KeywordA
  word1, word2;

...

I want to extract all the keywords with their associated words.
For example, with this file, I would like to have:
keywordA: (word1, word2, word3)
keywordB: (word4, word5, word6, word7, word8, word9, word10)
keywordA: (word1, word2)

Is it possible to do this with regular expression ?
Or should I write a small parser ?

I have tried pattern matching with the 's' and also with the 'm'
option,
but with no good result ...

Thanks for your help!

Olivier






Re: Parsing file and regexp

2010-02-13 Thread Uri Guttman
 osc == olivier scalb...@algosyn com olivier.scalb...@algosyn.com 
 writes:

  osc keywordA word1, word2, word3;

  osc Here we can have some free text
  osc ...
  osc ...

  osc keywordB word4,
  osc   word5, word6, word7, word8,
  osc   word9, word10;

  osc KeywordA
  osc   word1, word2;

  osc ...

how do you know when a keyword section begins or ends? how large is this
file? could free text have keywords? i see a ; to end a word list but
that isn't enough to properly parse this if you have 'free text'.

  osc I want to extract all the keywords with their associated words.
  osc For example, with this file, I would like to have:
  osc keywordA: (word1, word2, word3)
  osc keywordB: (word4, word5, word6, word7, word8, word9, word10)
  osc keywordA: (word1, word2)

  osc Is it possible to do this with regular expression ?
  osc Or should I write a small parser ?

yes and yes.

  osc I have tried pattern matching with the 's' and also with the 'm'
  osc option,
  osc but with no good result ...

please show your code. there is no way to help otherwise. s/// is not a
pattern matcher but a substitution operator. it uses regexes and can be
used to parse things.

uri

-- 
Uri Guttman  --  u...@stemsystems.com    http://www.sysarch.com --
-  Perl Code Review , Architecture, Development, Training, Support --
-  Gourmet Hot Cocoa Mix    http://bestfriendscocoa.com -





Re: Regexp to remove spaces

2009-12-29 Thread Dr.Ruud

sftriman wrote:

Dr.Ruud:



sub trim {  ...
}#trim


You're missing the tr to squash space down


To trim() is to remove from head and tail only.
Just use it as an example to build a trim_and_normalize().



So I think it can boil down to:

sub fixsp7 {
s#\A\s+##, s#\s+\z##, tr/ \t\n\r\f/ /s foreach @_;
return;
}


Best remove from the end before removing from the start.

--
Ruud





Re: Regexp to remove spaces

2009-12-28 Thread sftriman
On Dec 23, 2:31 am, rvtol+use...@isolution.nl (Dr.Ruud) wrote:
 sftriman wrote:
  1ST PLACE - THE WINNER:  5.0s average on 5 runs

  # Limitation - pointer
  sub fixsp5 {
  ${$_[0]}=~tr/ \t\n\r\f/ /s;
  ${$_[0]}=~s/\A //;
  ${$_[0]}=~s/ \z//;
  }

 Just decide to change in-place, based on the defined-ness of wantarray.

 sub trim {
      no warnings 'uninitialized';

      if ( defined wantarray ) {
          # need to return scalar / list
          my @values= @_;
          s#^\s+##s, s#\s+$##s foreach @values;
          return wantarray ? @values : $values[0];
      }

      # need to change in-place
      s#^\s+##s, s#\s+$##s foreach @_;
      return;

 }    #trim

 --
 Ruud

Hi there,

You're missing the tr to squash space down, but I see what you're
doing.  I never need to trim an array at this point, but if I did...

So I think it can boil down to:

sub fixsp7 {
s#\A\s+##, s#\s+\z##, tr/ \t\n\r\f/ /s foreach @_;
return;
}

This keeps it consistent with my other 6 test cases.  I run it against
several test strings, including some with line breaks, to make sure the
results are always the same.  Note I am using \A and \z and not ^ and $.
Still, I think this has the flavor of what you intended.

Result: 5 trial runs over the same data set, 1,000,000 times, average
time was 16.30s.  All things considered, this puts it in a 4-way tie
for 3rd place with the other methods.  IF - the times above still
stand...
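Wall-clock averages like these shift with machine load. The core Benchmark module may give steadier comparisons: it measures CPU time, and cmpthese prints relative rates. A sketch comparing two of the subs from this thread; the sample string is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sample = "  \t some   text with\tmixed \n whitespace   ";

# Copy-in, copy-out versions of two subs from this thread
sub fixsp0 {
    my ($x) = @_;
    $x =~ s/^\s+//;
    $x =~ s/\s+$//;
    $x =~ s/\s+/ /g;
    return $x;
}

sub fixsp4 {
    my ($x) = @_;
    $x =~ tr/ \t\n\r\f/ /s;
    $x =~ s/\A //;
    $x =~ s/ \z//;
    return $x;
}

# Sanity check first: a faster sub that gives a different answer is useless
die "results differ\n" unless fixsp0($sample) eq fixsp4($sample);

# A negative count means "run each for at least that many CPU seconds"
cmpthese(-1, {
    fixsp0 => sub { fixsp0($sample) },
    fixsp4 => sub { fixsp4($sample) },
});
```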

And in fact, they don't.  Why?  CPU usage is high on my box right now.
So I baselined the other methods in the 6.0s range, and they are now
coming in at 25s!  So maybe this one is the fastest!  I'll have to do
more testing.

To be fair, I had to rewrite the former winner as:

sub fixsp1a {
${$_[0]}=~s/\A\s+//;
${$_[0]}=~s/\s+\z//;
${$_[0]}=~s/\s+/ /g;
}

using \A and \z.

I wonder how expensive that foreach is.  Knowing that it is exactly
one argument, is there a faster way for this to run, not using
foreach?  Even so, this may not be the fastest trim method - in place,
no pointer, one line, with the foreach @_ as written.

David






Re: Regexp to remove spaces

2009-12-28 Thread Shawn H Corey
sftriman wrote:
 So I think it can boil down to:
 
 sub fixsp7 {
 s#\A\s+##, s#\s+\z##, tr/ \t\n\r\f/ /s foreach @_;
 return;
 }

sub fixsp7 {
  tr/ \t\n\r\f/ /s, s#\A\s##, s#\s\z##  foreach @_;
  return;
}

Placing the tr/// first reduces the number of characters scanned for
s#\s\z## which makes things slightly faster.
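The reordering relies on tr/// having already squeezed every run of those five characters into a single space, so one \s at each end is enough. A small check, assuming ASCII input, that both fixsp7 variants agree; the sub names are renamed here for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trim with \s+ first, then squash (the earlier fixsp7)
sub fixsp7_regex_first {
    s#\A\s+##, s#\s+\z##, tr/ \t\n\r\f/ /s foreach @_;
    return;
}

# Squash first, so at most one space is left at each end
sub fixsp7_tr_first {
    tr/ \t\n\r\f/ /s, s#\A\s##, s#\s\z## foreach @_;
    return;
}

my @tests = ("  a  b  ", "\t\tx\n", "no-spaces", " \n\r\f ", "", "one two");
for my $t (@tests) {
    my ($p, $q) = ($t, $t);    # copies, because both subs modify @_ in place
    fixsp7_regex_first($p);
    fixsp7_tr_first($q);
    print $p eq $q ? "same: [$p]\n" : "DIFFERENT: [$p] vs [$q]\n";
}
```

The two can disagree on whitespace that \s matches but the tr/// list omits, for example a vertical tab on Perl 5.18 and later, or Unicode spaces such as U+2003, because tr/// leaves those untouched.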


-- 
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication
as it is about coding.

I like Perl; it's the only language where you can bless your
thingy.





Re: Regexp to remove spaces

2009-12-23 Thread Dr.Ruud

sftriman wrote:


1ST PLACE - THE WINNER:  5.0s average on 5 runs

# Limitation - pointer
sub fixsp5 {
${$_[0]}=~tr/ \t\n\r\f/ /s;
${$_[0]}=~s/\A //;
${$_[0]}=~s/ \z//;
}


Just decide to change in-place, based on the defined-ness of wantarray.

sub trim {
no warnings 'uninitialized';

if ( defined wantarray ) {
# need to return scalar / list
my @values= @_;
s#^\s+##s, s#\s+$##s foreach @values;
return wantarray ? @values : $values[0];
}

# need to change in-place
s#^\s+##s, s#\s+$##s foreach @_;
return;
}#trim
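A short usage sketch for the trim() above, copied unchanged so the snippet runs on its own, showing the three calling contexts that wantarray distinguishes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# trim() exactly as posted above
sub trim {
    no warnings 'uninitialized';

    if ( defined wantarray ) {
        # need to return scalar / list
        my @values = @_;
        s#^\s+##s, s#\s+$##s foreach @values;
        return wantarray ? @values : $values[0];
    }

    # need to change in-place
    s#^\s+##s, s#\s+$##s foreach @_;
    return;
}

my $copy = trim("  hello world  ");    # scalar context: returns a trimmed copy
my @list = trim(" a ", "\tb\n");       # list context: returns trimmed copies

my $buffer = "   in place   ";
trim($buffer);                         # void context: modifies $buffer itself

print "[$copy] [@list] [$buffer]\n";   # [hello world] [a b] [in place]
```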


--
Ruud





Re: Regexp to remove spaces

2009-12-22 Thread sftriman
Thanks to everyone for their input!

So I've tried out many of the methods, first making sure that each
works as I intended it.  That is, I'm not concerned with multi-line
text, just single-line data.  That said, I have noted that I should use
\A and \z in general over ^ and $.

I wrote a 176-byte string for testing, and ran each method 1,000,000
times to time the speed.  The winner is: 3 regexps, using tr for
intra-string spaces.  I found I could make this even faster using a
pointer to the variable versus passing in the variable as a local
input parameter, modifying, then returning it.  (In all cases, my goal
is to write a sub for general use anywhere I want it, so I wrote each
possibility as a sub.  There ARE cases where I need to compare the
original string with the cleaned string, but I can deal with that as
need be with local variables.)

1ST PLACE - THE WINNER:  5.0s average on 5 runs

# Limitation - pointer
sub fixsp5 {
${$_[0]}=~tr/ \t\n\r\f/ /s;
${$_[0]}=~s/\A //;
${$_[0]}=~s/ \z//;
}

2nd PLACE - same as above, but with local variables - 6.0s average on 5 runs

sub fixsp4 {
my ($x)=@_;
$x=~tr/ \t\n\r\f/ /s;
$x=~s/\A //;
$x=~s/ \z//;
return $x;
}

[ QUESTION - any difference using "my $x=shift;"??? ]

3rd PLACE - 3-way tie, my method, either as variable in, change in place, or pointer - 17.0s average

sub fixsp0 {
my ($x)=@_;
$x=~s/^\s+//;
$x=~s/\s+$//;
$x=~s/\s+/ /g;
return $x;
}

# Limitation: pointer
sub fixsp1 {
${$_[0]}=~s/^\s+//;
${$_[0]}=~s/\s+$//;
${$_[0]}=~s/\s+/ /g;
}

# Limitation: change in place
sub fixsp2 {
$_[0]=~s/^\s+//;
$_[0]=~s/\s+$//;
$_[0]=~s/\s+/ /g;
}
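The three 3rd-place subs differ only in how the string reaches them. A sketch of the three calling conventions with hypothetical names (clean_copy, clean_ref, clean_alias). On the shift question: my ($x) = @_ and my $x = shift both copy the first argument, and shift additionally removes it from @_.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy in, copy out (like fixsp0): the caller's variable is left alone
sub clean_copy {
    my ($x) = @_;    # "my $x = shift;" would also copy the argument
    $x =~ s/^\s+//;
    $x =~ s/\s+$//;
    $x =~ s/\s+/ /g;
    return $x;
}

# "Pointer" style (like fixsp1): the caller passes a reference, \$var
sub clean_ref {
    ${ $_[0] } =~ s/^\s+//;
    ${ $_[0] } =~ s/\s+$//;
    ${ $_[0] } =~ s/\s+/ /g;
    return;
}

# Alias style (like fixsp2): $_[0] IS the caller's variable
sub clean_alias {
    $_[0] =~ s/^\s+//;
    $_[0] =~ s/\s+$//;
    $_[0] =~ s/\s+/ /g;
    return;
}

my $raw = "  some \t text  ";

my $copy = clean_copy($raw);    # $raw is unchanged

my $by_ref = $raw;
clean_ref(\$by_ref);            # note the backslash at the call site

my $by_alias = $raw;
clean_alias($by_alias);         # plain call, yet $by_alias is modified

print "[$raw]\n[$copy] [$by_ref] [$by_alias]\n";
```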

4TH PLACE - 20.0s average on 5 runs (did not try change in place or as pointer)

sub fixsp6 {
my ($x)=@_;
s/\s+\z//, s/\A\s+//, s/\s+/ /g, for $x;
return $x;
}

5TH PLACE - DEAD LAST! (or DFL in some parlance) - 62.0s average on 3 runs

sub fixsp3 {
my ($x)=@_;
$x=~s/^(\s+)|(\s+)$//g;
$x=~s/\s+/ /g;
return $x;
}

Any and all comments welcome.

David






Re: Regexp to remove spaces

2009-12-21 Thread Dr.Ruud

sftriman wrote:

I use this series of regexp all over the place to clean up lines of
text:

$x=~s/^\s+//g;
$x=~s/\s+$//g;
$x=~s/\s+/ /g;

in that order, and note the final one replaces \s+ with a single space.


The g-modifier on the first 2 is bogus
(unless you would add an m-modifier).

I currently tend to write it like this:

s/\s+\z//, s/\A\s+//, s/\s+/ /g, for $x;

So first remove tail spaces (less to lshift next).
Then remove head spaces. Then normalize.


For a multi-line buffer you can do it like this:

perl -wle '

  my $x = <<"EOT";
123456   \t
abc def
\t\t\t\t\t\t\t\t
   *** ***   ***   \t
EOT

  s/^\s+//mg, s/\s+$//mg, s/[^\S\n]+/ /g for $x;

  $x =~ s/\n/\n/g;
  print $x, ;
'

123 456
abc def
*** *** ***

--
Ruud





Re: Regexp to remove spaces

2009-12-21 Thread Dr.Ruud

Shawn H Corey wrote:


$text =~ tr{\t}{ };
$text =~ tr{\n}{ };
$text =~ tr{\r}{ };
$text =~ tr{\f}{ };
$text =~ tr{ }{ }s;


That can be written as:

  tr/\t\n\r\f/ /, tr/ / /s for $text;

But it doesn't remove all leading nor all trailing spaces.
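To cover the leading and trailing spaces as well, the same statement-modifier style can be extended with two anchored substitutions. A sketch with a made-up input string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "\t hello \n\n world \r\f";

# tr/// only: other whitespace becomes spaces, then runs are squeezed
my $only_tr = $text;
tr/\t\n\r\f/ /, tr/ / /s for $only_tr;

# Same, plus one anchored substitution per end for the space that can remain
my $trimmed = $text;
tr/\t\n\r\f/ /, tr/ / /s, s/\A //, s/ \z// for $trimmed;

print "[$only_tr]\n[$trimmed]\n";    # [ hello world ] then [hello world]
```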

--
Ruud





Re: Regexp to remove spaces

2009-12-21 Thread Albert Q
2009/12/20 Dr.Ruud rvtol+use...@isolution.nl rvtol%2buse...@isolution.nl

 sftriman wrote:

 I use this series of regexp all over the place to clean up lines of
 text:

 $x=~s/^\s+//g;
 $x=~s/\s+$//g;
 $x=~s/\s+/ /g;

 in that order, and note the final one replaces \s+ with a single space.


 The g-modifier on the first 2 is bogus
 (unless you would add an m-modifier).

 I currently tend to write it like this:

s/\s+\z//, s/\A\s+//, s/\s+/ /g, for $x;

 So first remove tail spaces (less to lshift next).
 Then remove head spaces. Then normalize.


 For a multi-line buffer you can do it like this:

 perl -wle '

  my $x = <<"EOT";
123456   \t
 abc def
 \t\t\t\t\t\t\t\t
   *** ***   ***   \t
 EOT

  s/^\s+//mg, s/\s+$//mg, s/[^\S\n]+/ /g for $x;


I know what it does, but I haven't seen this form of *for* before. Where can
I find the description of this syntax in perldoc?

Thanks.



 $x =~ s/\n/\n/g;
  print $x, ;
 '

 123 456
 abc def
 *** *** ***

 --
 Ruud







-- 
missing the days we spend together


Re: Regexp to remove spaces

2009-12-21 Thread Jim Gibson

At 6:11 PM +0800 12/21/09, Albert Q wrote:

2009/12/20 Dr.Ruud rvtol+use...@isolution.nl rvtol%2buse...@isolution.nl


  For a multi-line buffer you can do it like this:


 perl -wle '

  my $x = <<"EOT";
123456   \t
 abc def
 \t\t\t\t\t\t\t\t
   *** ***   ***   \t
 EOT

  s/^\s+//mg, s/\s+$//mg, s/[^\S\n]+/ /g for $x;



I know what it does, but I haven't seen this form of *for* before. Where can
I find the description of this syntax in perldoc?


That is a question about Perl syntax, so look in perldoc perlsyn. 
Search for the section on Statement Modifiers, and realize that 
for and foreach are synonyms.
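In runnable form: with a statement modifier, for aliases $_ to each list element in turn, so the s/// calls operate on $x itself. A sketch comparing it with the equivalent foreach block:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "   one   two\t three  ";

# Statement-modifier form: $_ is aliased to $x while the list of
# comma-separated expressions runs, so $x itself is modified
s/\s+\z//, s/\A\s+//, s/\s+/ /g for $x;

# The same thing written as an ordinary foreach block
my $y = "   one   two\t three  ";
foreach ($y) {
    s/\s+\z//;
    s/\A\s+//;
    s/\s+/ /g;
}

print "[$x] [$y]\n";    # [one two three] [one two three]
```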







Faster way to do a regexp using a hash

2009-12-20 Thread sftriman
I've been wondering for a long time... is there a slick (and hopefully
fast!) way to do this?

foreach (keys %fixhash) {
$x=~s/\b$_\b/$fixhash{$_}/gi;
}

So if

$x="this could be so cool"

and

$fixhash{could}="would";
$fixhash{COOL}="awesome";
$fixhash{beso}="nope";
$fixhash{his}="impossible";

then it would end up

this would be so awesome
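One common approach, offered as a sketch rather than the only answer, is to build a single alternation from the hash keys and look up the replacement for each match, so the string is scanned once instead of once per key. Lower-casing the keys is an assumption made here to mimic the original /i:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "this could be so cool";

my %fixhash = (
    could => 'would',
    COOL  => 'awesome',
    beso  => 'nope',
    his   => 'impossible',
);

# The original loop used /i, so index the replacements by lower-cased key
my %fix_lc = map { lc($_) => $fixhash{$_} } keys %fixhash;

# One alternation of all keys: longest first so a longer key wins over a
# shorter prefix, quotemeta so keys are matched literally
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) || $a cmp $b } keys %fix_lc;
my $re = qr/\b($alternation)\b/i;

# Single pass; the replacement is looked up for each match
$x =~ s/$re/$fix_lc{lc $1}/g;

print "$x\n";    # this would be so awesome
```

Unlike the original loop, this makes a single pass, so text produced by one replacement is never re-matched by another key.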

Thanks!
David






Regexp to remove spaces

2009-12-20 Thread sftriman
I use this series of regexp all over the place to clean up lines of
text:

$x=~s/^\s+//g;
$x=~s/\s+$//g;
$x=~s/\s+/ /g;

in that order, and note the final one replaces \s+ with a single space.

Basically, it's (1) remove all leading space, (2) remove all trailing
space, and (3) replace all multi-space with a single space [which, at
this point, should only occur on interior characters].

Is there a handy way to do this in one regexp?  And, fast?  I've been
using Devel::NYTProf to study code timing and see that some regexp,
especially mine, can be CPU expensive/intensive.

Thanks!
David





