Re: multiple named captures with a single regexp
/(\w+)/g gets the command as well, and only the args are wanted, so it would need to be

    my @args = $s =~ /(\w+)/g;
    shift @args;

Also, "my VAR if TEST;" is deprecated IIRC and slated to be removed soon (as its behavior is surprising). It would probably be better to say

    my @args = $s =~ /^\w+\s/ && $s =~ /(?:\s+(\w+))/g;

or (if you don't like using && like that)

    my @args = $s =~ /^\w+\s/ ? $s =~ /(?:\s+(\w+))/g : ();

On Wed, Mar 1, 2017 at 9:34 AM X Dungeness wrote:
> On Wed, Mar 1, 2017 at 2:52 AM, Chas. Owens wrote:
> > Sadly, Perl will only capture the last match of a capture group with a
> > quantifier, so that just won't work. The split function really is the
> > simplest and most elegant solution for this sort of problem (you have a
> > string with a delimiter and you want the pieces). All of that said, if
> > you are willing to modify the regex you can say
> >
> > my $s = "command arg1 arg2 arg3 arg4";
> > my @args = $s =~ /(?:\s+(\w+))/g;
> >
>
> Hm, I'd write it as:
> my @args = $s =~ / (\w+)/g;
>
> or, if the command check isn't too inelegant:
>
> my @args = $s =~ / (\w+)/g if $s =~ /^command\s/;
>
> > for my $arg (@args) {
> >     print "$arg\n";
> > }
> >
> > However, this does not allow you to check the command is correct.
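The && and ?: forms above only behave differently when the command check fails. The sketch below (args_and and args_ternary are hypothetical helper names) relies on the perlop rule that the left operand of && is evaluated in scalar context and is itself returned when false, so the && form should produce a one-element list holding an empty string, while the ternary form produces an empty list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the ?: form. The chosen branch is evaluated in
# list context, so a failed command check gives back an empty list.
sub args_ternary {
    my ($s) = @_;
    return $s =~ /^\w+\s/ ? $s =~ /(?:\s+(\w+))/g : ();
}

# Hypothetical helper: the && form. The left operand of && is evaluated
# in scalar context and, when false, is the value of the whole expression.
sub args_and {
    my ($s) = @_;
    my @args = $s =~ /^\w+\s/ && $s =~ /(?:\s+(\w+))/g;
    return @args;
}

for my $line ("command arg1 arg2 arg3 arg4", " no command here") {
    my @via_ternary = args_ternary($line);
    my @via_and     = args_and($line);
    printf "%-30s ternary: %d element(s), &&: %d element(s)\n",
        "'$line'", scalar @via_ternary, scalar @via_and;
}
```

If that reading of perlop holds, the ternary is the safer choice whenever @args is later tested for emptiness.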
Re: multiple named captures with a single regexp
On Wed, Mar 1, 2017 at 2:52 AM, Chas. Owens wrote:
> Sadly, Perl will only capture the last match of a capture group with a
> quantifier, so that just won't work. The split function really is the
> simplest and most elegant solution for this sort of problem (you have a
> string with a delimiter and you want the pieces). All of that said, if
> you are willing to modify the regex you can say
>
> my $s = "command arg1 arg2 arg3 arg4";
> my @args = $s =~ /(?:\s+(\w+))/g;
>

Hm, I'd write it as:

    my @args = $s =~ / (\w+)/g;

or, if the command check isn't too inelegant:

    my @args = $s =~ / (\w+)/g if $s =~ /^command\s/;

> for my $arg (@args) {
>     print "$arg\n";
> }
>
> However, this does not allow you to check the command is correct.
>

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
Re: multiple named captures with a single regexp
Sadly, Perl will only capture the last match of a capture group with a quantifier, so that just won't work. The split function really is the simplest and most elegant solution for this sort of problem (you have a string with a delimiter and you want the pieces). All of that said, if you are willing to modify the regex you can say

    my $s = "command arg1 arg2 arg3 arg4";
    my @args = $s =~ /(?:\s+(\w+))/g;

    for my $arg (@args) {
        print "$arg\n";
    }

However, this does not allow you to check the command is correct.

Another option, and I would in no way claim this is an elegant solution, is to use code execution in the middle of the regex with (?{}) to pull out the matched fields:

    @args = ();
    my $start;
    $s =~ m{
        \w+                          # command
        \s (?{ $start = pos; })      # capture the first start position
        (?:
            \w+                      # the argument
            # capture the argument
            (?{ push @args, substr $s, $start, pos() - $start; })
            # optional delimiter and capture the next start
            (?: \s+ (?{ $start = pos; }) )?
        )+
    }x;

    for my $arg (@args) {
        print "$arg\n";
    }

Of course, all of these solutions are bound to fail when you hit the real world (assuming the command is a Unix command), as arguments are allowed to have spaces in them if they are quoted. There is a way to do this with a regex, but balancing the quotes is far more pain than it is worth.
A simple regex to tokenize the string, plus some logic to put the quoted sections back together, will allow you to extract the arguments from the string:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $s = qq("command with space" arg1 "arg 2" "arg3");

    my @parts = $s =~ /([ ]+|"|\w+)/g;

    my @args;
    my $in_string = 0;
    my $buf = "";
    while (@parts) {
        my $part = shift @parts;

        # ditch the delimiters if not in a string
        next if not $in_string and $part =~ / /;

        # in strings, a " means end the string;
        # otherwise, just build up a buffer of the things
        # in the string
        if ($in_string) {
            if ($part eq '"') {
                $in_string = 0;
                push @args, $buf;
                $buf = "";
            } else {
                $buf .= $part;
            }
            next;
        }

        # if not in a string, " means start a string
        if ($part eq '"') {
            $in_string = 1;
            next;
        }

        # if not a delimiter or a ", then this is just a normal token
        push @args, $part;
    }

    shift @args; # ditch the command

    for my $arg (@args) {
        print "$arg\n";
    }

Of course, this still doesn't handle Unix commands properly, as you can escape " and use ' to create strings, but those details are left as an exercise for the reader.

On Wed, Mar 1, 2017 at 4:04 AM Luca Ferrari <fluca1...@infinito.it> wrote:
> Hi all,
> I'm not sure if this is possible, but imagine I've got a line as follows:
>
> command arg1 arg2 arg3 arg4 ...
>
> I would like to capture all args with a single regexp, possibly with a
> named capture, but I don't know exactly how to do:
>
> my $re = qr/command\s+(?\w+)+/;
>
> the above of course is going to capture only the first one (one shot)
> or the last one within a loop.
> How can I extract the whole array of arguments?
>
> Please note, a raw solution is to remove the command and split, but
> I'm asking for a more elegant solution.
>
> Thanks,
> Luca
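For the quoting problem, one option not mentioned in the thread is the core module Text::ParseWords, whose shellwords function splits on whitespace while keeping single- or double-quoted sections together and removing the quotes and backslashes. This swaps in a library call for the hand-written tokenizer above, and it is only a sketch: shellwords does not implement every shell rule (there is no variable expansion, for example).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);   # core module

my $s = qq("command with space" arg1 "arg 2" "arg3");

# shellwords() splits on whitespace, keeps quoted sections together
# and strips the quote characters from the resulting words.
my ($command, @args) = shellwords($s);

print "command: $command\n";
for my $arg (@args) {
    print "$arg\n";
}
```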
Re: multiple named captures with a single regexp
Hi Luca,

On Wed, 1 Mar 2017 10:01:34 +0100
Luca Ferrari <fluca1...@infinito.it> wrote:

> Hi all,
> I'm not sure if this is possible, but imagine I've got a line as follows:
>
> command arg1 arg2 arg3 arg4 ...
>
> I would like to capture all args with a single regexp, possibly with a
> named capture, but I don't know exactly how to do:
>
> my $re = qr/command\s+(?\w+)+/;
>
> the above of course is going to capture only the first one (one shot)
> or the last one within a loop.
> How can I extract the whole array of arguments?
>

Perhaps try using \G and the /g and possibly /o flags, see:

http://perl-begin.org/uses/text-parsing/

(Note that perl-begin is a site that I maintain).

Regards,

    Shlomi Fish

> Please note, a raw solution is to remove the command and split, but
> I'm asking for a more elegant solution.
>
> Thanks,
> Luca
>

--
-
Shlomi Fish http://www.shlomifish.org/
Freecell Solver - http://fc-solve.shlomifish.org/

It is a good idea to stop worrying about problems (or “problems” in quotes)
that cannot be fixed.
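Shlomi's \G suggestion can be combined with the named capture Luca asked about. What follows is a minimal sketch, assuming arguments are runs of \w characters separated by whitespace; parse_command is a hypothetical helper, not an existing API. The /gc flags keep pos() where the last successful match ended, so each \G-anchored match continues exactly where the previous one stopped, and %+ holds the named capture from the most recent match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: check the command word, then collect the arguments
# one \G-anchored match at a time, reading each one from %+.
sub parse_command {
    my ($line, $want) = @_;

    # The command must come first; /gc leaves pos($line) just after it.
    return unless $line =~ /\A\Q$want\E\b/gc;

    my @args;
    # \G pins each match to where the previous one ended, so nothing
    # between the arguments can be skipped silently.
    while ($line =~ /\G\s+(?<arg>\w+)/gc) {
        push @args, $+{arg};
    }

    # Reject trailing text that is not a \w+ argument.
    return unless $line =~ /\G\s*\z/g;
    return @args;
}

my @args = parse_command("command arg1 arg2 arg3 arg4", "command");
print "$_\n" for @args;
```

One limitation of this sketch: a bare "command" with no arguments and a line that fails to parse both return an empty list, so a real parser would probably want to report failure separately.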
multiple named captures with a single regexp
Hi all,

I'm not sure if this is possible, but imagine I've got a line as follows:

    command arg1 arg2 arg3 arg4 ...

I would like to capture all args with a single regexp, possibly with a named capture, but I don't know exactly how to do it:

    my $re = qr/command\s+(?\w+)+/;

The above, of course, is going to capture only the first one (one shot) or the last one within a loop. How can I extract the whole array of arguments?

Please note, a raw solution is to remove the command and split, but I'm asking for a more elegant solution.

Thanks,
Luca
Re: Regexp under PERL
On 8 July 2015 at 19:12, Nagy Tamas (TVI-GmbH) tamas.n...@tvi-gmbh.de wrote:
> This is the code:
>
> } elsif (defined($row) && ($row =~ m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/)) {
>     # PATH first version: \(\*[ ]+@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?
>     my @path = split(':=', $row, 2);
>     $temppath = $path[1];
>     my trimmedpath = split(''', $temppath, 3);
>     $currentpath = trimmedpath[1];
>
> The last )) is the closing of the elsif. Sorry.
>
> Still no idea.
>
> Tamas Nagy

Again, you're just bolting stuff together in the email client thinking it's the code. There's no way that can work.

The most obvious: here you have three quote marks in split(), meaning everything after that is nonsense.

Then you use variables without sigils (which is also nonsense under strict).

And you entirely forget to declare variables (again, nonsense under strict).

When you eliminate all those superficial defects, the code has no bugs, and executes silently without so much as a squeak.

Attached is what I have, and it doesn't replicate the problem.

--
Kent

KENTNL - https://metacpan.org/author/KENTNL

[Attachment: x.pl (Perl program)]
AW: Regexp under PERL
Hi,

This is the code:

} elsif (defined($row) && ($row =~ m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/)) {
    # PATH first version: \(\*[ ]+@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?
    my @path = split(':=', $row, 2);
    $temppath = $path[1];
    my trimmedpath = split(''', $temppath, 3);
    $currentpath = trimmedpath[1];

The last )) is the closing of the elsif. Sorry.

Still no idea.

Tamas Nagy

-----Original Message-----
From: Kent Fredric [mailto:kentfred...@gmail.com]
Sent: Tuesday, 7 July 2015 19:03
To: Nagy Tamas (TVI-GmbH)
Cc: beginners@perl.org
Subject: Re: Regexp under PERL

On 8 July 2015 at 04:40, Nagy Tamas (TVI-GmbH) tamas.n...@tvi-gmbh.de wrote:
> m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/))

This is not the exact code you're using, obviously, because the last 2 ) marks are actually outside the regex.

Removing those ))'s makes the regex compile just fine.

So we need the code, not just the regex.

Ideally, if you can give some Perl code that is minimal and replicates your problem exactly, that would be very helpful in us helping you.

Ideally, your code should be reduced as far as possible, till you have the least possible amount of code that demonstrates your problem.

Additional notes:

Values in @PATH are not relevant to your expression, because you explicitly escape the @ to mean a literal @. If you did not escape it, it would have interpolated.

But even then, I'd still have no idea what you are doing :)

--
Kent

KENTNL - https://metacpan.org/author/KENTNL
Regexp under PERL
Hi,

Perl reports this line as OK, but for the following lines it says:

    String found where operator expected at line...

    m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/))

So it seems that it is not OK. I have the proper regexp, which was tested at http://www.regexr.com/

    # Tested version: \(\*[ ]+@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?

Input data:

    (* @PATH := '\/ph\/** Forest\/Apple' *)
    (* @PATH := '\/ph\/** Forest\/Pear' *)
    (* @PATH := '\/ph\/** Forest\/Tree\/Plum' *)
    (* @PATH := '\/ph\/** Forest\/Oaktree\/Oak' *)

If I use the tested version, it says:

    Unmatched ( in regex; marked by <-- HERE in m/..:=[ ]+'( <-- HERE at . line

Tamas
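A pattern that accepts the input data from the question can be sketched as below. It assumes the interesting part is everything between the single quotes; extract_path is a hypothetical helper name, and splitting on the literal \/ sequences is only a guess at what the path should become. Writing \@ stops Perl from interpolating an array called @PATH, and the m{} delimiter means the / in the split pattern needs no escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between the single quotes of a
# "(* @PATH := '...' *)" line, or undef if the line does not look like one.
sub extract_path {
    my ($row) = @_;
    # \@ keeps Perl from interpolating an array named @PATH.
    return $row =~ m{ \( \* \s+ \@PATH \s+ := \s+ ' ([^']*) ' \s* \* \) }x
        ? $1
        : undef;
}

# Input data copied from the question (q{} keeps the backslashes).
my @lines = (
    q{(* @PATH := '\/ph\/** Forest\/Apple' *)},
    q{(* @PATH := '\/ph\/** Forest\/Pear' *)},
    q{(* @PATH := '\/ph\/** Forest\/Tree\/Plum' *)},
    q{(* @PATH := '\/ph\/** Forest\/Oaktree\/Oak' *)},
);

for my $row (@lines) {
    my $path = extract_path($row);
    next unless defined $path;
    # Assumption: the literal "\/" sequences separate path components.
    my @parts = grep { length } split m{\\/}, $path;
    print join(' | ', @parts), "\n";
}
```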
Re: Regexp under PERL
On 8 July 2015 at 04:40, Nagy Tamas (TVI-GmbH) tamas.n...@tvi-gmbh.de wrote:
> m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/))

This is not the exact code you're using, obviously, because the last 2 ) marks are actually outside the regex.

Removing those ))'s makes the regex compile just fine.

So we need the code, not just the regex.

Ideally, if you can give some Perl code that is minimal and replicates your problem exactly, that would be very helpful in us helping you.

Ideally, your code should be reduced as far as possible, till you have the least possible amount of code that demonstrates your problem.

Additional notes:

Values in @PATH are not relevant to your expression, because you explicitly escape the @ to mean a literal @. If you did not escape it, it would have interpolated.

But even then, I'd still have no idea what you are doing :)

--
Kent

KENTNL - https://metacpan.org/author/KENTNL
Re: is faster a regexp with multiple choices or a single one with lower case?
Hi Bill,

On Thu, Jan 8, 2015 at 1:36 AM, $Bill n...@todbe.com wrote:
> Why not just ignore the case?

Sure, it's an option.

> Why does the script care what the case is? Is there a rationale for
> checking it?

Of course there is, and of course my script does different things depending on what I'm looking at. I just posted a short example to discuss regular expressions, not the particular case in my script (which is, by the way, quite simple).

Thanks,
Luca
Re: is faster a regexp with multiple choices or a single one with lower case?
On Wed, 7 Jan 2015 10:59:07 +0200
Shlomi Fish shlo...@shlomifish.org wrote:
> Anyway, one can use the Benchmark.pm module to determine which
> alternative is faster, but I suspect their speeds are not going to be
> that much different. See:
>
> http://perl-begin.org/topics/optimising-and-profiling/
>
> (Note: perl-begin.org is a site I originated and maintain).

And this is the answer I'd give: if you're curious as to which of two approaches will be faster, benchmark it and find out. It's often better to do this yourself, as the results may in some cases vary widely depending on the system you're running it on, the perl version, how Perl was built, etc.

The sure-fire way to see which of multiple options is faster is to use Benchmark.pm to try them and find out :)

For an example, I used the following (dirty) short script to set up 1,000 test filenames with random lengths and capitalisation, half of which should match the pattern, and to test each approach against all of those test filenames, 10,000 times:

[davidp@supernova:~]$ cat tmp/benchmark_lc.pl
#!/usr/bin/perl

use strict;
use Benchmark;

# Put together an array of various test strings, with random
# lengths and case
my @valid_chars = ('a'..'z', 'A'..'Z');
my @test_data = map {
    join('', map { $valid_chars[int rand @valid_chars] } 1..rand(10))
        . (rand() > 0.5 ? '.bat' : '.bar')
} (1..1000);

Benchmark::cmpthese(10_000, {
        lc_first => sub {
            for my $string (@test_data) {
                $string = lc $string;
                if ($string =~ /\.bat$/) { }
            }
        },
        regex_nocase => sub {
            for my $string (@test_data) {
                if ($string =~ /\.bat$/i) { }
            }
        },
    },
);

And my results suggest that, for me, using lc() on the string first before attempting to match was around 30% faster:

[davidp@supernova:~]$ perl tmp/benchmark_lc.pl
                Rate regex_nocase     lc_first
regex_nocase  2674/s           --         -24%
lc_first      3509/s          31%           --

Of course, YMMV.
--
David Precious (bigpresh) dav...@preshweb.co.uk
http://www.preshweb.co.uk/ www.preshweb.co.uk/twitter
www.preshweb.co.uk/linkedin www.preshweb.co.uk/facebook
www.preshweb.co.uk/cpan www.preshweb.co.uk/github
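One detail of the benchmark script above: for aliases each element of @test_data, so $string = lc $string lowercases the test data in place, and once lc_first has run, the other candidate only ever sees lower-case names. The variant below is a sketch that leaves the data untouched, upper-cases the extension about half the time, adds the alternation from the original question, and has each candidate return its match count so the three can be checked against one another. No timings are claimed here; as David says, run it on your own system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Random test names; about half end in .bat, and the extension is
# upper-cased about half the time so the case handling actually matters.
my @chars = ('a' .. 'z', 'A' .. 'Z');
my @test_data = map {
    my $name = join '', map { $chars[ rand @chars ] } 1 .. 1 + int(rand(10));
    my $ext  = rand() < 0.5 ? '.bat' : '.bar';
    $ext = uc $ext if rand() < 0.5;
    $name . $ext;
} 1 .. 1000;

# Each candidate leaves @test_data untouched and returns its match count,
# so the three can be checked against each other as well as timed.
my %candidates = (
    lc_copy      => sub { scalar grep { lc($_) =~ /\.bat\z/ } @test_data },
    regex_nocase => sub { scalar grep { /\.bat\z/i } @test_data },
    alternation  => sub { scalar grep { /\.(?:bat|BAT)\z/ } @test_data },
);

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, \%candidates);
```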
Re: is faster a regexp with multiple choices or a single one with lower case?
On Wed, 7 Jan 2015 07:56:18 +
Andrew Solomon and...@geekuni.com wrote:

> Hi Luca,
>
> I haven't tested it, but my suspicion is that your first solution will
> be faster because regular expressions (which don't contain variables)
> are only compiled once, while you have a function call for every use of
> lc.
>
> By the way, another alternative might be:
>
> $extension =~ /\.bat/i
>
> (which would also match BaT, BAt...)

The second code excerpt that was given will also match all that:

«
$extension = lc $extension;
$extension =~ / \.bat /x;
»

Anyway, one can use the Benchmark.pm module to determine which alternative is faster, but I suspect their speeds are not going to be that much different. See:

http://perl-begin.org/topics/optimising-and-profiling/

(Note: perl-begin.org is a site I originated and maintain).

Regards,

    Shlomi Fish

--
-
Shlomi Fish http://www.shlomifish.org/
Perl Humour - http://perl-begin.org/humour/

John: Hey, we are completely non-violent vampires. We don’t suck blood.
Selina: I thought all vampires suck blood.
John: Bullocks, hen. Vampires come in all shapes and sizes.
    — http://www.shlomifish.org/humour/Selina-Mandrake/

Please reply to list if it's a mailing list post - http://shlom.in/reply .
is faster a regexp with multiple choices or a single one with lower case?
Hi all,

this could be trivial, and I suspect the answer is that the regexp engine is smart enough, but suppose I want to test the following:

    $extension =~ / \.bat | \.BAT /x;

Is the following a better solution?

    $extension = lc $extension;
    $extension =~ / \.bat /x;

In other words, when testing for all-lower or all-upper case, should I first transform to one of them or use a regexp with alternatives?

Any suggestion?

Thanks,
Luca
Re: is faster a regexp with multiple choices or a single one with lower case?
Hi Luca,

I haven't tested it, but my suspicion is that your first solution will be faster because regular expressions (which don't contain variables) are only compiled once, while you have a function call for every use of lc.

By the way, another alternative might be:

    $extension =~ /\.bat/i

(which would also match BaT, BAt...)

Andrew

On Wed, Jan 7, 2015 at 7:45 AM, Luca Ferrari fluca1...@infinito.it wrote:
> Hi all,
> this could be trivial, and I suspect the answer is that the regexp engine
> is smart enough, but suppose I want to test the following:
>
> $extension =~ / \.bat | \.BAT /x;
>
> is the following a better solution?
>
> $extension = lc $extension;
> $extension =~ / \.bat /x;
>
> In other words, when testing for all-lower or all-upper cases should I
> first transform to one of them or use a regexp with alternatives?
> Any suggestion?
>
> Thanks,
> Luca

--
Andrew Solomon

Mentor@Geekuni http://geekuni.com/
http://www.linkedin.com/in/asolomon
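Speed aside, the three spellings discussed in this thread do not accept exactly the same strings. The sketch below compares them on a few made-up file names. Each pattern is anchored with \z, which is an addition not present in the original snippets (unanchored, all three would also accept something like setup.bat.txt).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three spellings from the thread, anchored with \z (an addition:
# the originals are unanchored).
my %tests = (
    alternation => sub { $_[0] =~ /\.(?:bat|BAT)\z/ },
    lc_first    => sub { lc($_[0]) =~ /\.bat\z/ },
    nocase      => sub { $_[0] =~ /\.bat\z/i },
);

for my $name (qw(run.bat RUN.BAT Run.Bat setup.bat.txt)) {
    for my $how (sort keys %tests) {
        printf "%-14s %-12s %s\n", $name, $how,
            $tests{$how}->($name) ? 'match' : 'no match';
    }
}
```

Only the alternation rejects a mixed-case name such as Run.Bat; lc and /i both accept it, as Andrew and Shlomi point out.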
RegExp
Hi all,

how do you get all words starting with the letter 'r' in a string?

thanks,
rakesh
Re: RegExp
On Sat, 8 Mar 2014 18:20:48 +0530
rakesh sharma rakeshsharm...@hotmail.com wrote:

> Hi all,
> how do you get all words starting with letter 'r' in a string.
> thanks,rakesh

/\br/

--
Don't stop where the ink does.
    Shawn
Re: RegExp
Hello Rakesh,

On Sat, 8 Mar 2014 18:20:48 +0530
rakesh sharma rakeshsharm...@hotmail.com wrote:

> Hi all,
> how do you get all words starting with letter 'r' in a string.
> thanks,rakesh

1. Find all words in the sentence. Your idea of what is a word will need to be specified.

2. Put them in an array - let's say @words.

3. Use « grep { /\Ar/i } @words ».

See:

* http://perldoc.perl.org/functions/grep.html
* https://metacpan.org/pod/List::MoreUtils
* https://metacpan.org/pod/List::Util

Regards,

    — Shlomi Fish

--
-
Shlomi Fish http://www.shlomifish.org/
Escape from GNU Autohell - http://www.shlomifish.org/open-source/anti/autohell/

There is an IGLU Cabal, but its only purpose is to deny the existence of an
IGLU Cabal.
    — Martha Greenberg

Please reply to list if it's a mailing list post - http://shlom.in/reply .
Re: RegExp
On 08.03.2014 13:50, rakesh sharma wrote:
> how do you get all words starting with letter 'r' in a string.

What have you tried so far?

Greetings,
Janek
Re: RegExp
On Mar 8, 2014, at 4:50 AM, rakesh sharma rakeshsharm...@hotmail.com wrote:

> Hi all,
> how do you get all words starting with letter 'r' in a string.

Try

    my @rwords = $string =~ /\br\w*?\b/g;
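To see the \b-based answers in action, here is a small sketch on a made-up sentence. It assumes "word" means a run of \w characters; \br\w* without the trailing \b is enough, because \w* stops at the first non-word character anyway, and adding /i also picks up words that start with a capital R.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Rita read a rather rare regex reference, bravo!";

# \b anchors at the start of a word and \w* takes the rest of it;
# in list context, /g returns every match.
my @lower = $string =~ /\br\w*/g;

# With /i, words starting with a capital R are included as well.
my @any_case = $string =~ /\br\w*/gi;

print "lower-case r: @lower\n";
print "any case:     @any_case\n";
```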
Re: regexp puzzle
On 3/8/2014 12:05 AM, Bill McCormick wrote:
> I have the following string I want to extract from:
>
> my $str = "foo (3 bar): baz";
>
> and I want to extract to end up with
>
> $p1 = "foo";
> $p2 = 3;
> $p3 = "baz";
>
> the complication is that the \s(\d\s.+) is optional, so then $p2 may
> not be set.
>
> getting close was
>
> my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;
>
> How can I make the (3 bar) optional?

Here's what I came up with:

    ($key, $lines, $value) = $_ =~ /^(.+?)(?:\s\((\d)\s.+\))?:\s(.*)$/;
regexp puzzle
I have the following string I want to extract from:

    my $str = "foo (3 bar): baz";

and I want to extract to end up with

    $p1 = "foo";
    $p2 = 3;
    $p3 = "baz";

The complication is that the \s(\d\s.+) is optional, so then $p2 may not be set.

Getting close was

    my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;

How can I make the (3 bar) optional?

Thanks!
Re: regexp puzzle
([^]+) \(([0-9]+).*\) ([a-z]+)

On Mar 8, 2014 1:07 AM, Bill McCormick wpmccorm...@gmail.com wrote:
>
> I have the following string I want to extract from:
>
> my $str = "foo (3 bar): baz";
>
> and I want to extract to end up with
>
> $p1 = "foo";
> $p2 = 3;
> $p3 = "baz";
>
> the complication is that the \s(\d\s.+) is optional, so then $p2 may
> not be set.
>
> getting close was
>
> my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;
>
> How can I make the (3 bar) optional?
>
> Thanks!
Re: regexp puzzle
On 3/8/2014 12:41 AM, shawn wilson wrote:
> my $str = "foo (3 bar): baz";

    my $test = "foo (3 bar): baz";
    my ($p1, $p2, $p3) = $test =~ /([^]+) \(([0-9]+).*\) ([a-z]+)/;
    print "p1=[$p1] p2=[$p2] p3=[$p3]\n";

Use of uninitialized value $p1 in concatenation (.) or string at ./lock_report.pl line 11.
Use of uninitialized value $p2 in concatenation (.) or string at ./lock_report.pl line 11.
Use of uninitialized value $p3 in concatenation (.) or string at ./lock_report.pl line 11.
p1=[] p2=[] p3=[]

P
Re: regexp puzzle
On Mar 8, 2014 1:41 AM, shawn wilson ag4ve...@gmail.com wrote:

Oh, and per optional, just do (?:\([0-9]+).*\)?

You should probably do

    my @match = $str =~ / ([^]+) (?:\([0-9]+).*\)? ([a-z]+)/;
    my ($a, $b, $c) = (scalar(@match) == 3 ? @match : ($match[0], undef, $match[1]));

> ([^]+) \(([0-9]+).*\) ([a-z]+)
>
> On Mar 8, 2014 1:07 AM, Bill McCormick wpmccorm...@gmail.com wrote:
>>
>> I have the following string I want to extract from:
>>
>> my $str = "foo (3 bar): baz";
>>
>> and I want to extract to end up with
>>
>> $p1 = "foo";
>> $p2 = 3;
>> $p3 = "baz";
>>
>> the complication is that the \s(\d\s.+) is optional, so then $p2 may
>> not be set.
>>
>> getting close was
>>
>> my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;
>>
>> How can I make the (3 bar) optional?
>>
>> Thanks!
Re: regexp puzzle
On Mar 7, 2014, at 10:05 PM, Bill McCormick wpmccorm...@gmail.com wrote:

> I have the following string I want to extract from:
>
> my $str = "foo (3 bar): baz";
>
> and I want to extract to end up with
>
> $p1 = "foo";
> $p2 = 3;
> $p3 = "baz";
>
> the complication is that the \s(\d\s.+) is optional, so then $p2 may
> not be set.
>
> getting close was
>
> my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;

You can make a substring optional by following it with the ? quantifier. If your substring is more than one character, you can group it with capturing parentheses or a non-capturing grouping construct (?: ).

Here is a sample, using the extended regular expression syntax with the x option:

    my( $p1, $p2, $p3 ) = $str =~ m{ \A (\w+) (?: \s+ \( (\d+) \s+ \w+ \) )? : \s (\w+) }x;
    if( $p1 && $p3 ) {
      print "p1=$p1, p2=$p2, p3=$p3\n";
    }else{
      print "No match\n";
    }

Always test the returned values to see if the match succeeded.

So if '(3 bar)' is not present, does the colon still remain? That will determine if the colon should be inside or outside the optional substring part.
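Putting the suggestions from this thread together, the sketch below handles both "foo (3 bar): baz" and "foo: baz", assuming the colon is always present and the parenthesised part always starts with a number. split_fields is a hypothetical name. The whitespace before the parenthesis sits inside the optional group so the form without parentheses still matches, and the list assignment used in scalar context (its element count) tells us whether the match succeeded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: "foo (3 bar): baz" -> ("foo", 3, "baz") and
# "foo: baz" -> ("foo", undef, "baz"); returns an empty list on no match.
sub split_fields {
    my ($str) = @_;
    my ($p1, $p2, $p3) = $str =~ m{
        \A (.+?)                          # p1: shortest leading text
        (?: \s+ \( (\d+) \s+ [^)]* \) )?  # optional " (3 bar)", p2 = digits
        : \s* (.*?) \s* \z                # colon, then p3 = the rest
    }x or return;
    return ($p1, $p2, $p3);
}

for my $str ("foo (3 bar): baz", "foo: baz") {
    my ($p1, $p2, $p3) = split_fields($str);
    printf "p1=%s p2=%s p3=%s\n", $p1, $p2 // 'undef', $p3;
}
```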
perl regexp performance - architecture?
I'm currently loading some new servers with CentOS6, on which perl5.10 is the standard version of perl provided. However, I've also loaded perl5.18, and I don't think the version of perl is significant in the results I'm seeing. Basically, I'm seeing perl performance significantly slower on my new systems than on my 6 year old systems.

Here are some of the relevant details:

+ 6 year old server, 32 bit architecture, CentOS5, perl5.8
  perl, and in particular regexp operations, perform reasonably fast.

+ Very new server, 64 bit architecture, CentOS6, perl5.10 (and have tried perl5.18)
  perl, and in particular regexp operations, perform significantly slower than on the 6 year old server.

That struck me as odd right off. I thought surely perl running on a modern high-end CPU is going to beat out my code running on 6 year old hardware. I've compared CPU models at various CPU benchmarking sites, and the new CPUs, as you would expect, are ranked significantly higher in performance than the old.

I've also installed perl5.8 on the new 64bit servers, and the performance is similar to that of perl5.10 and perl5.18 on the same 64bit servers. Given that, I don't think the perl version plays a significant part in the performance differences.

Is it an accepted fact that perl performance takes a hit on 64 bit architecture?

I've tried comparing some of the perl -V and Config.pm results looking for significant differences. That output is pretty verbose, and the most significant difference is the architecture.

I could provide some of my benchmarking code if that would be of help. The differences are significant. The only reason I'm looking at this is because I could see right off that some of my code is taking 30-40% longer to run in the new environment. Once I started putting in some timing with Time::HiRes, I could see the delay involved large amounts of regexp processing.
Right now, I'm just looking for any opinions on what I'm seeing, so that I know whether the architecture is the significant factor in the performance degradation, and can then consider any recommendations for improvements. I'm happy to provide further relevant details.

Thanks,
Phil
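To compare the two machines on equal terms, a sketch along these lines prints a few standard %Config entries (architecture name, integer and pointer sizes, whether perl was built with threads, optimiser flags) and times a fixed regex-heavy loop with Time::HiRes. The workload is invented for illustration; substituting a representative chunk of the real data would make the numbers more meaningful.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;
use Time::HiRes qw(time);

# Build settings that commonly differ between distribution perls.
for my $key (qw(archname ivsize ptrsize usethreads optimize)) {
    printf "%-10s %s\n", $key, $Config{$key} // 'undef';
}

# A fixed, regex-heavy workload: same input and same pattern everywhere,
# so elapsed times from different machines can be compared directly.
my $text = '';
$text .= "word$_ value=$_; " for 1 .. 50_000;

my $hits  = 0;
my $start = time;
for (1 .. 20) {
    # Scalar-context //g walks the string; pos() resets when it fails.
    $hits++ while $text =~ /value=(\d+);/g;
}
my $elapsed = time - $start;

printf "%d matches in %.3f s (perl %vd)\n", $hits, $elapsed, $^V;
```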
Re: perl regexp performance - architecture?
On Mon, Feb 17, 2014 at 12:41 PM, Phil Smith philbo...@gmail.com wrote:

> I'm currently loading some new servers with CentOS6 on which perl5.10 is
> the standard version of perl provided. [...]
>
> Right now, I'm just looking for any opinions on what I'm seeing so that I
> know the architecture is the significant factor in the performance
> degradation and then consider any recommendations for improvements. I'm
> happy to provide further relevant details.

This sounds like it could be something OS-specific, and googling "CentOS regex performance" generates hits, e.g.,

http://pkgs.org/centos-5/puias-computational-x86_64/boost141-regex-1.4.0-2.el5.x86_64.rpm.html

HTH,
Charles DeRykus
Re: perl regexp performance - architecture?
On Mon, Feb 17, 2014 at 6:16 PM, Charles DeRykus dery...@gmail.com wrote:

> On Mon, Feb 17, 2014 at 12:41 PM, Phil Smith philbo...@gmail.com wrote:
>
>> I'm currently loading some new servers with CentOS6 on which perl5.10 is
>> the standard version of perl provided. [...]
>
> This sounds like it could be something OS-specific, and googling "CentOS
> regex performance" generates hits, e.g.,
> http://pkgs.org/centos-5/puias-computational-x86_64/boost141-regex-1.4.0-2.el5.x86_64.rpm.html

No, I really don't think it is specific to a version of CentOS. I've installed various permutations of 32 and 64 bit CentOS 5 and 6. The better performance seems to follow the 32 bit architecture rather than a specific Perl version or CentOS version.

Phil
Fwd: perl regexp performance - architecture?
On Mon, Feb 17, 2014 at 4:25 PM, Phil Smith philbo...@gmail.com wrote:
> On Mon, Feb 17, 2014 at 6:16 PM, Charles DeRykus dery...@gmail.com wrote:
> > This sounds like it could be something OS-specific, and googling CentOS regex performance generates hits, e.g., http://pkgs.org/centos-5/puias-computational-x86_64/boost141-regex-1.4.0-2.el5.x86_64.rpm.html
> No, I really don't think it is specific to a version of CentOS. I've installed various permutations of 32- and 64-bit CentOS 5 and 6. The better performance seems to follow the 32-bit architecture rather than a specific Perl version or CentOS version.

Newer perl regex engines have added Unicode support, which can add drag. I'd be surprised, though, if the 64-bit architecture by itself were responsible for major slowdowns. Some of the issues are mentioned here:

http://stackoverflow.com/questions/17800112/upgraded-from-perl-5-8-32bit-to-5-16-64bit-regex-performance-hit

Per the above, some of the items you'll need to be careful with:

- Were both perls compiled with the same flags?
- Are both perls threaded perls? (Disabling threading support makes it faster.)
- How big are your integers, 64 bit or 32 bit?
- What compiler optimizations were chosen?
- Did your previous perl have some distribution-specific patches applied?

Basically, you have to compare the whole perl -V output.

-- 
Charles DeRykus

As you can see, you need to examine the comparison scenarios carefully.

-- 
Charles DeRykus
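Most of the build settings in the checklist above can be read from the core Config module instead of scanning the whole perl -V output. A minimal sketch, assuming these keys are the interesting ones (they are standard Config.pm names, but the selection is only a suggestion): run it under each perl and diff the two outputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Build settings related to the checklist: threading, integer and pointer
# sizes, compiler and optimization flags.
my @keys = qw(
    version archname osname
    usethreads useithreads usemultiplicity
    ivsize uvsize ptrsize doublesize uselongdouble use64bitint use64bitall
    cc ccflags optimize
);

my %info;
for my $key (@keys) {
    # Options that were not enabled come back as undef; show them explicitly.
    $info{$key} = defined $Config{$key} ? $Config{$key} : '(undef)';
    print "$key=$info{$key}\n";
}
```

For a single value, `perl -V:ivsize` prints the same setting from the command line.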
Re: perl regexp performance - architecture?
On Mon, Feb 17, 2014 at 9:10 PM, Charles DeRykus dery...@gmail.com wrote:
> Newer perl regex engines have added Unicode support which can add drag. I'd be surprised though if just the 64-bit architecture itself was totally responsible for major slowdowns. Some of the issues are mentioned here:
> http://stackoverflow.com/questions/17800112/upgraded-from-perl-5-8-32bit-to-5-16-64bit-regex-performance-hit
>
> Per above, some of the items, you'll need to be careful with:
> were both Perls compiled with the same flags?
> are both perls threaded perls (disabling threading support makes it faster)
> how big are your integers? 64 bit or 32 bit?
> what compiler optimizations were chosen?
> did your previous Perl have some distribution-specific patches applied?
> Basically, you have to compare the whole perl -V output
>
> As you can see, you need to be carefully examining the comparison scenarios.

Yes... I saw that link as well, Charles. I mentioned in my original post that I was looking at the diffs in perl -V output. The output is pretty verbose, but the differences seem to focus on 32-bit vs 64-bit architecture and the configs you would expect to be related to that, such as various byte-size definitions.

Like many people (and that's an assumption), I don't build perl. I take what comes with a given distribution, as with CentOS5 and CentOS6 (and soon CentOS7). Yes, I realize they provide versions well behind the most recent. Given that mode, which again I assume to be common practice, I would expect the performance degradation to be something many people would notice when they moved from 32-bit to 64-bit machines. I've tried perl5.8.8 on both 32-bit and 64-bit, where the -V output seems limited to architecture differences, so based on that, the only common thread in the performance tests is the architecture, and the better performance seems to follow 32 bit.

Thanks,
Phil
regexp as hash value?
Hi,

I'm just wondering if it is possible to place a regexp as a value in a hash, so as to use it later with something like:

$string =~ $hash_ref->{ $key };

Is it possible? Should I take anything special into account?

Thanks,
Luca

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
Re: regexp as hash value?
On Sat, Jan 25, 2014 at 03:51:53PM +0100, Luca Ferrari wrote:
> Hi,
> I'm just wondering if it is possible to place a regexp as a value into an hash so to use it later as something like:
> $string =~ $hash_ref->{ $key };
> Is it possible? Should I take into account something special?

Yes, this is possible. You need to use qr// to construct your RE:

$ perl -E '$h = { a => qr/y/ }; say $_ =~ $h->{a} for qw(x y z)'

1

$

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net
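Building on Paul's answer, a small sketch (the hash keys and patterns are hypothetical) that stores several qr// objects in a hash and tests a string against each one. Flags such as /i are compiled into the qr// object, so they travel with the stored pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical patterns; qr// compiles each once and keeps its flags.
my $hash_ref = {
    dotted_quad => qr/^\d{1,3}(?:\.\d{1,3}){3}$/,
    yes         => qr/^y(?:es)?$/i,
};

# Return the names of all patterns that match $string.
sub matching_keys {
    my ($string, $patterns) = @_;
    return grep { $string =~ $patterns->{$_} } sort keys %$patterns;
}

for my $string ('192.168.0.1', 'YES', 'nope') {
    my @hits = matching_keys($string, $hash_ref);
    print "$string: ", (@hits ? "@hits" : 'no match'), "\n";
}
```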
Re: regexp as hash value?
On Sat, Jan 25, 2014 at 4:12 PM, Paul Johnson p...@pjcj.net wrote:
> $ perl -E '$h = { a => qr/y/ }; say $_ =~ $h->{a} for qw(x y z)'

Thanks, but now another doubt: looking at http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators, I don't understand how I can use a regexp for substitution, that is s///, as a hash value. The following does not work:

my $hash = { q/regexp/ => qr/s,from,to,/ };

Any clue?

Thanks,
Luca
Re: regexp as hash value?
On Sat, Jan 25, 2014 at 06:41:00PM +0100, Luca Ferrari wrote:
> On Sat, Jan 25, 2014 at 4:12 PM, Paul Johnson p...@pjcj.net wrote:
> > $ perl -E '$h = { a => qr/y/ }; say $_ =~ $h->{a} for qw(x y z)'
> Thanks, but then another doubt: having a look at http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators I dont understand how I can use the regexp for substitution, that is s/// as an hash value. The following is not working:
> my $hash = { q/regexp/ => qr/s,from,to,/ };

You won't be able to do the full substitution this way, but you can use the RE in the substitution:

$ perl -E '$h = { a => qr/y/ }; say $_ =~ s/$h->{a}/p/r for qw(x y z)'
x
p
z
$

If the replacement text is different for each substitution, then you may be better served storing anonymous subs in your hash:

$ perl -E '$h = { a => sub { s/y/p/r } }; say $h->{a}->() for qw(x y z)'
x
p
z
$

Good luck.

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net
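Paul's second suggestion can be extended so that each hash entry holds both a compiled pattern and its replacement. A minimal sketch, with hypothetical entry names: the replacement is a code ref called from s///e, so it can see the capture variables ($1, $2) of the match that invoked it. The /r modifier needs perl 5.14 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table: each entry pairs a compiled pattern with a code ref
# that builds the replacement text.
my %rewrites = (
    from_to    => [ qr/from/,        sub { 'to' } ],
    swap_words => [ qr/(\w+) (\w+)/, sub { "$2 $1" } ],
);

sub apply_rewrite {
    my ($name, $text) = @_;
    my ($re, $make_replacement) = @{ $rewrites{$name} };
    # /e runs the replacement as code, /g replaces every match and /r returns
    # the modified copy, leaving $text unchanged.
    return $text =~ s/$re/$make_replacement->()/egr;
}

print apply_rewrite('from_to', 'from here from there'), "\n";
print apply_rewrite('swap_words', 'hello world'), "\n";
```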
Re: regexp and parsing assistance
On Jun 8, 2013, at 8:06 PM, Noah wrote:
> Hi there,
> I am attempting to parse the following output and not quite sure how to do it. The text is in columns and spaced out that way regardless if there are 0 numbers in say col5 or Col 6 or not. If the column has an entry then I want to save it to a variable if there is no entry then that variable will be equal to 'blank'. The first line is a header and can be ignored.
>
> C Col2 C Col4 Col5 Col6 Col7Col8 new_line
> * 123.456.789.101/85 A 803Reject new_line
> B 804 76 10 800.99.999.098765 78910 I new_line
> O 805 1234 1 800.9.999.1 98765 78910 I new_line

If your data consists of constant-width fields, then the best approach is to use the unpack function. See 'perldoc -f unpack' for how to use it and 'perldoc -f pack' for the template parameters that describe your data.

This statement will unpack the second and third data lines you have shown, presuming that you have read the lines into the variable $line:

my @fields = unpack('A2 A19 A2 A3 A11 A11 A18 A5 A5 A1',$line);

However, your data as shown has variable data in the first or second column. If that is really the case, then you will have to look at the first twenty columns of your data and determine where column three starts. Then you can use the unpack function to parse the rest of the columns. Maybe something like this:

if( $line =~ /^\s{20}/ ) {
    # no data in first 20 columns, unpack remainder
    $line = substr($line,20);
}else{
    # data in first 20 columns -- remove first two fields
    $line =~ s/\S+\s\S+\s//;
}
my @fields = unpack('A2 A3 A11 A11 A18 A5 A5 A1',$line);

Exactly what you need to do depends upon the exact nature of your data and how much it varies from line to line.

Good luck!
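A minimal sketch of Jim's unpack approach combined with the 'blank' default Noah asked for. The column widths here are assumptions for illustration only (the real widths have to be measured from the actual output), and the sample lines are built with sprintf so that their widths are known.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout: widths 2, 20, 2, 4, 6, 6. Measure the real ones.
# 'A' fields drop trailing spaces from each column.
my $template = 'A2 A20 A2 A4 A6 A6';
my @names    = qw(flag prefix type id col5 col6);

sub parse_line {
    my ($line) = @_;
    my %rec;
    @rec{@names} = unpack $template, $line;
    for my $value (values %rec) {            # values() aliases the hash entries
        $value = '' unless defined $value;   # line shorter than the template
        $value =~ s/^\s+//;                  # trim leading blanks as well
        $value = 'blank' if $value eq '';    # empty column -> 'blank'
    }
    return \%rec;
}

# Two made-up lines built with the widths above.
my $fmt   = '%-2s%-20s%-2s%-4s%-6s%-6s';
my @lines = (
    sprintf($fmt, '*', '10.0.0.0/8', 'A', '803', '', ''),
    sprintf($fmt, 'B', '', 'A', '804', '76', '10'),
);

for my $line (@lines) {
    my $rec = parse_line($line);
    print join(' | ', map { "$_=$rec->{$_}" } @names), "\n";
}
```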
Re: regexp and parsing assistance
On 6/9/13 9:00 AM, Jim Gibson wrote:
> If your data consists of constant-width fields, then the best approach is to use the unpack function.

Thanks Jim,

Here is the regexp I came up with. It works really well:

if ($line =~ /^\*\s(\S+)\s+(\S)\s+(\d+)\s{6}([\s\d]{0,5})\s{6}([\s\d]{0,5})[\s\]+(\S+)(.*)/) {

Then I strip the leading and trailing spaces from the scalars I collect.

Cheers,
Noah
regexp and parsing assistance
Hi there,

I am attempting to parse the following output and am not quite sure how to do it. The text is in columns and spaced out that way regardless of whether there are any numbers in, say, Col5 or Col6. If a column has an entry, I want to save it to a variable; if there is no entry, that variable should be set to 'blank'. The first line is a header and can be ignored.

C Col2 C Col4 Col5 Col6 Col7Col8 new_line
* 123.456.789.101/85 A 803Reject new_line
B 804 76 10 800.99.999.098765 78910 I new_line
O 805 1234 1 800.9.999.1 98765 78910 I new_line

Any assistance is helpful.

Cheers
Re: character setts in a regexp
On Fri, Jan 11, 2013 at 2:01 PM, Christer Palm b...@bredband.net wrote:
> Do you have suggestions on this character issue? Is it possible to determine the character set of a text efficiently? Is it other ways to solve the problem?

As far as other ways to solve the problem, my suggestion would be not to use regexps to parse XML; use an XML parser. For example, something like https://metacpan.org/module/XML::Feed .

chrs,
john.
Re: character setts in a regexp
On Sat, Jan 12, 2013 at 12:56 PM, Charles DeRykus dery...@gmail.com wrote:
> #!/usr/bin/perl
> use strict;
> use warnings;
> binmode(STDOUT, ":utf8");
> my $cosa = "my \x{263a}";
> print "cosa=$cosa\n";
> print "found smiley at \\b\n" if $cosa =~ /\b\x{263a}/;
> print "found smiley (no \\b)\n" if $cosa =~ /\x{263a}/;
>
> The output:
> cosa=my ☺
> found smiley (no \b)

From: http://www.unicode.org/reports/tr18/#Simple_Word_Boundaries
---
Most regular expression engines allow a test for word boundaries (such as by \b in Perl). They generally use a very simple mechanism for determining word boundaries: one example of that would be having word boundaries between any pair of characters where one is a word_character and the other is not, or at the start and end of a string. This is not adequate for Unicode regular expressions.
---

Based on the above, Perl's \b semantics appear not to be adequate for Unicode regular expressions, since they don't address extended code points of Unicode, only values in the alphanumeric range and underscore. So you may want to try a preceding space to delimit the keyword:

print "match\n" if "my \x{263a}" =~ / \x{263a}/;      # matches!
#print "match\n" if "my \x{263a}" =~ /\b\x{263a}/;    # would not match

-- 
Charles DeRykus
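One note on the example above: under Unicode rules Perl's \w does include letters such as å and ö; \b fails before U+263A because that character is a symbol, not a word character, so there is no word/non-word transition between the space and the smiley. A leading space, on the other hand, misses a keyword at the very start of the text or right after punctuation. A minimal sketch of an alternative, assuming the text and keywords are already decoded character strings (keyword_found is a hypothetical name): lookarounds on \w act like \b around word characters but still accept keywords that begin or end with a symbol.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode rules for \w even on non-UTF8-flagged strings
binmode STDOUT, ':encoding(UTF-8)';

# True if $key occurs in $text as a whole "word": (?<!\w) and (?!\w) play
# the role of \b at each end, but also succeed when the keyword itself
# starts or ends with a non-word character such as U+263A.
sub keyword_found {
    my ($text, $key) = @_;
    return $text =~ /(?<!\w)\Q$key\E(?!\w)/i ? 1 : 0;
}

my $text = "Anders \x{c5}ngstr\x{f6}m likes Perl \x{263a}!";
for my $key ("\x{c5}ngstr\x{f6}m", "\x{263a}", 'perl', 'gstr') {
    printf "%s: %s\n", $key, keyword_found($text, $key) ? 'found' : 'not found';
}
```

Here "gstr" is reported as not found because it sits inside a longer word.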
Re: character setts in a regexp
On Fri, Jan 11, 2013 at 2:01 PM, Christer Palm b...@bredband.net wrote:
> Do you have suggestions on this character issue? Is it possible to determine the character set of a text efficiently? Is it other ways to solve the problem?

I'm not sure if this is related, but the docs mention some overlap between character and byte semantics.

*** START perlunicode:
..As discussed elsewhere, Perl has one foot (two hooves?) planted in each of two worlds: the old world of bytes and the new world of characters, upgrading from bytes to characters when necessary. If your legacy code does not explicitly use Unicode, no automatic switch-over to characters should happen. Characters shouldn't get downgraded to bytes, either. It is possible to accidentally mix bytes and characters, however (see perluniintro), in which case \w in regular expressions might start behaving differently (unless the /a modifier is in effect). Review your code. Use warnings and the strict pragma.
*** END perlunicode

<speculate>
Perhaps, although it is not explicit, this downgrading might impact \b as well as \w. Here's an example which appears to support this, since adding \b causes the match to fail. (There may be a workaround via the character properties mentioned in perlunicode.)
</speculate>

#!/usr/bin/perl
use strict;
use warnings;

binmode(STDOUT, ":utf8");

my $cosa = "my \x{263a}";
print "cosa=$cosa\n";
print "found smiley at \\b\n" if $cosa =~ /\b\x{263a}/;
print "found smiley (no \\b)\n" if $cosa =~ /\x{263a}/;

The output:

cosa=my ☺
found smiley (no \b)

-- 
Charles DeRykus
character setts in a regexp
Hi!

I have a Perl script that parses RSS streams from different news sources, and I am experiencing problems with national characters in a regexp used for matching a keyword list against the RSS data.

Everything works fine with a simple regexp for plain English, i.e., words containing the letters A-Z, a-z, 0-9.

if ( $description =~ m/\b$key/i ) {….}

Keywords or RSS data with national characters don’t work at all. I’m not really surprised; this was expected, as the character sets used in the different RSS streams are outside my control.

I have ”use utf8;” activated, but I’m not really sure whether it is needed. I can’t see any difference whether it is used or not.

If I convert all the national characters used in the keyword list to HTML entities like ”aring” and so on, and convert every octal, Unicode, decimal, and hex character reference in the RSS data to the same HTML form in a character parser, everything works fine, but it takes time that I want to avoid.

Do you have suggestions on this character issue? Is it possible to determine the character set of a text efficiently? Are there other ways to solve the problem?

/Christer
Re: character setts in a regexp
On Jan 11, 2013, at 2:01 PM, Christer Palm wrote:
> I am have the ”use utf8;” function activated but I’m not really sure if it is needed. I can’t see any difference used or not.

The 'use utf8;' pragma is necessary if you have UTF-8 characters in your Perl source file that you want interpreted correctly, e.g., in string literals or variable names.

> Do you have suggestions on this character issue? Is it possible to determine the character set of a text efficiently? Is it other ways to solve the problem?

Have you read the following?

perldoc perlunitut
perldoc perlunicode
perldoc perlunifaq
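A minimal sketch of what Jim describes, assuming the script itself is saved as UTF-8: the same literal "å" is one character under use utf8 and two bytes when the pragma is switched off with no utf8 in an inner block. The pragma only affects the source text; data read from files or sockets still has to be decoded separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # declares: this source file is UTF-8
binmode STDOUT, ':encoding(UTF-8)';

# Assumption: this file is saved as UTF-8, so "å" is stored as bytes C3 A5.
my $with_utf8 = length "å";        # parsed as one character, U+00E5

my $without_utf8;
{
    no utf8;                       # in this block, literals are raw bytes
    $without_utf8 = length "å";    # the two bytes C3 A5
}

print "use utf8: $with_utf8 char(s); no utf8: $without_utf8 byte(s)\n";
```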
Re: character setts in a regexp
On Fri, Jan 11, 2013 at 11:01:45PM +0100, Christer Palm wrote:
> Hi!

Hello,

> I have a perl script that parses RSS streams from different news sources and experience problems with national characters in a regexp function used for matching a keyword list with the RSS data.
>
> Everything works fine with a simple regexp for plain english i.e. words containing the letters A-Z, a-z, 0-9.
>
> if ( $description =~ m/\b$key/i ) {….}
>
> Keywords or RSS data with national characters don’t work at all. I’m not really surprised this was expected as character sets used in the different RSS streams are outside my control.

The XML standard provides a way to specify the character set in the XML document:

<?xml version="1.0" encoding="utf-8"?>

Are you parsing the XML unintelligently (e.g., with regexes) or are you using an XML parser to do it? I have done limited XML parsing in Perl, but I would seek an API that supports the XML standards for encodings and ideally just does the Right Thing(tm). In theory, it should Just Work(tm) if you can find an appropriate family of modules.

> I am have the ”use utf8;” function activated but I’m not really sure if it is needed. I can’t see any difference used or not.

As mentioned, the utf8 pragma basically just tells perl that the source file is UTF-8 encoded (so literal strings should be considered UTF-8 text, for example).

The Encode module can be used to manually decode and encode strings between various encodings. E.g., if you know the text is UTF-16LE, then you can do this:

use Encode;

my $input = getRssStream();
my $text = Encode::decode('UTF-16LE', $input);

Encodings are also supported at the IO layer, so depending on where you're getting the data from, you might be able to just inform said layers of the encoding and have the rest happen automatically. E.g.,

# Something like this:
binmode $socket, ':encoding(UTF-16LE)';

> Do you have suggestions on this character issue? Is it possible to determine the character set of a text efficiently? Is it other ways to solve the problem?

There are some modules to guess encodings (e.g., File::BOM). Of course, it's impossible to be certain. It's best to use the standards in the transport protocol or data format to define the encoding, so that you know for sure what is expected and don't have to guess (because it isn't always possible to detect it correctly).

Regards,

-- 
Brandon McCaig bamcc...@gmail.com bamcc...@castopulence.org
Castopulence Software https://www.castopulence.org/
Blog http://www.bamccaig.com/
perl -E '$_=q{V zrna gur orfg jvgu jung V fnl. }. q{Vg qbrfa'\''g nyjnlf fbhaq gung jnl.}; tr/A-Ma-mN-Zn-z/N-Zn-zA-Ma-m/;say'
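A minimal sketch of Brandon's Encode suggestion applied to the keyword match. The feed bytes are simulated with encode() here, since getRssStream() is only a placeholder; in practice the encoding name would come from the XML declaration or the HTTP headers. Decoding once at the input boundary turns the bytes into characters, after which a case-insensitive keyword match works on national characters, while the same match against the raw bytes fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'unicode_strings';
use Encode qw(encode decode);

# Stand-in for feed data: real bytes would come from the network or a file.
my $encoding = 'UTF-16LE';
my $bytes    = encode($encoding, "Nyheter fr\x{e5}n G\x{f6}teborg");

# Decode once, at the input boundary, into a Perl character string.
my $text = decode($encoding, $bytes);

my $key = "g\x{f6}teborg";    # keywords must be character strings too
print $text =~ /\b\Q$key\E\b/i ? "matched\n" : "no match\n";
```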
Re: question of regexp or (another solution)
On 2012-12-15 06:13, timothy adigun wrote:
> Using Dr., Ruud's data. This is another way of doing it:
> [solution using a hash]

Realize that with keys(), the input order is not preserved.

Another difference is that when a key comes back later, the hash solution will collide those, which is either wanted or unwanted. So it all depends on what the *real* specifications are.

A combined approach is to use the hash, and also push new keys onto a side array. Then you can use that array to restore the order later.

The hash solution is not good with huge files. The code pattern I showed is mostly used in map-reduce, where the input file is ordered (at least) on key, so you don't have to check that anymore.

-- 
Ruud
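Ruud's combined approach (a hash for grouping plus a side array recording first-seen key order) might look like this minimal sketch; the input lines are made up. Unlike the streaming version, it still merges non-adjacent repeats of a key and keeps everything in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ('zz 1', '=995 a', 'zz 2', '1 x', '=995 b');   # made-up input

my (%values_for, @key_order);
for my $line (@lines) {
    my ($key, $value) = split ' ', $line, 2;
    # Remember each key the first time it is seen.
    push @key_order, $key unless exists $values_for{$key};
    push @{ $values_for{$key} }, $value;
}

# Walk the side array, not keys(), to get the input order back.
my @merged = map { my $key = $_; "$key " . join('', @{ $values_for{$key} }) } @key_order;
print "$_\n" for @merged;
```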
question of regexp or (another solution)
Hi!

I work in a library and i need to have several fields in one line.

Example: I have this

=995 \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995 \\$n
=995 \\$oDisponible
=995 \\$kG1 42171

and i want in one line

=995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre
Re: question of regexp or (another solution)
I'm completing my email:

Hi!

I work in a library and I need to have several fields in one line.

Example: I have this

=995 \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995 \\$n
=995 \\$oDisponible
=995 \\$kG1 42171

and I want in one line

=995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre

How could I do this with a Perl script?

Many thanks
samuel
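In the example, the wanted line appears to be the =995 chunks sorted by their first subfield code ($b, $k, $n, $o, $x), with the leading \\ (assumed here to be the two blank indicators) written once. A minimal sketch under that assumption; merge_fields is a hypothetical name, and the input uses a single-quoted heredoc so that the backslashes and dollar signs stay literal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = <<'END_RECORDS';
=995 \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995 \\$n
=995 \\$oDisponible
=995 \\$kG1 42171
END_RECORDS

# Assumption: each line is TAG, space, two indicator characters, subfields.
sub merge_fields {
    my ($text) = @_;
    my (%chunks, %indicators, @tags);
    for my $line (split /\n/, $text) {
        my ($tag, $ind, $subfields) = $line =~ /^(\S+)\s+(\S\S)(\$.*)$/
            or next;                          # skip lines of another shape
        push @tags, $tag unless exists $chunks{$tag};
        $indicators{$tag} //= $ind;           # keep the first indicators seen
        push @{ $chunks{$tag} }, $subfields;
    }
    # Sorting the chunks orders them by their leading subfield code.
    return map {
        my $tag = $_;
        "$tag $indicators{$tag}" . join('', sort @{ $chunks{$tag} });
    } @tags;
}

print "$_\n" for merge_fields($input);
```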
Re: question of regexp or (another solution)
On 2012-12-14 14:54, samuel desseaux wrote:
> =995 \\$xPR$wLivre
> =995 \\$bECAM$cECAM
> =995 \\$n
> =995 \\$oDisponible
> =995 \\$kG1 42171
> and i want in one line
> =995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre

echo -n '1 a
1 b
1 c
2 x
=995 \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995 \\$n
=995 \\$oDisponible
=995 \\$kG1 42171
zz 1
zz 2
zz 3
' |perl -Mstrict -wle '
  my ($key, $value);
  while ( my $line = <> ) {
    chomp $line;
    my ($k, $v) = split " ", $line, 2;
    if ( defined $key and $key eq $k ) {
      $value .= $v;
    }
    else {
      print "$key\t$value" if defined $key;
      ($key, $value) = ($k, $v);
    }
  }
  print "$key\t$value" if defined $key;
'
1	abc
2	x
=995	\\$xPR$wLivre\\$bECAM$cECAM\\$n\\$oDisponible\\$kG1 42171
zz	123

-- 
Ruud
Re: question of regexp or (another solution)
Hi,

On Fri, Dec 14, 2012 at 2:53 PM, samuel desseaux sdesse...@gmail.com wrote:
> Hi!
> I work in a library and i need to have several fields in one line
> Example
> I have this
> =995 \\$xPR$wLivre
> =995 \\$bECAM$cECAM
> =995 \\$n
> =995 \\$oDisponible
> =995 \\$kG1 42171
> and i want in one line
> =995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre

Using Dr. Ruud's data, this is another way of doing it:

use warnings;
use strict;

my %data_collection;
while (<DATA>) {
    chomp;
    my ( $key, $value ) = split /\s+/, $_, 2;
    push @{ $data_collection{$key} }, $value;
}

print $_, " ", @{ $data_collection{$_} }, $/ for keys %data_collection;

__DATA__
=995 \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995 \\$n
=995 \\$oDisponible
=995 \\$kG1 42171
zz 1
zz 2
zz 3

-- 
Tim
Re: question of regexp or (another solution)
Hi,

On Fri, Dec 14, 2012 at 2:53 PM, samuel desseaux <sdesse...@gmail.com> wrote:
> Hi! I work in a library and i need to have several fields in one line [...]

Adding to Tim's wisdom, here is another way of doing it:

use warnings;
use strict;

my %data_collection;
while (<DATA>) {
    chomp;
    my ( $key, $value ) = split /\s+/, $_, 2;
    $data_collection{$key} .= $value;
}
print $_, " ", $data_collection{$_}, $/ for sort keys %data_collection;

__DATA__
=995 \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995 \\$n
=995 \\$oDisponible
=995 \\$kG1 42171
zz 1
zz 2
zz 3
rms 0xcafebabe
rms 0xfed

best,
Shaji
---
Your talent is God's gift to you. What you do with it is your gift back to God.
---

From: timothy adigun <2teezp...@gmail.com>
To: samuel desseaux <sdesse...@gmail.com>
Cc: beginners@perl.org
Sent: Saturday, 15 December 2012 10:43 AM
Subject: Re: question of regexp or (another solution)
[...]
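For the exact line samuel asked for, the subfields also need reordering and the indicators (\\) should appear only once. A minimal sketch, assuming every line has the shape "tag, whitespace, two indicator characters, subfields" and that a plain string sort of each line's subfield text gives the wanted order (it does for this sample: $b, $k, $n, $o, $x). merge_fields is a hypothetical name, not from any module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: merge repeated MARC-style lines "TAG IIsubfields"
# into one line per tag. Assumes the first two characters after the tag
# and its whitespace are the indicators (here "\\").
sub merge_fields {
    my ( %indicators, %fragments, @tags );
    for my $line (@_) {
        my ( $tag, $rest ) = split /\s+/, $line, 2;
        next unless defined $rest && length $rest > 2;
        push @tags, $tag unless exists $fragments{$tag};    # first-seen tag order
        $indicators{$tag} //= substr $rest, 0, 2;           # first line's indicators
        push @{ $fragments{$tag} }, substr $rest, 2;        # e.g. '$xPR$wLivre'
    }
    # Sorting the fragments orders them by their leading subfield code.
    return map {
        my $tag = $_;
        "$tag $indicators{$tag}" . join '', sort @{ $fragments{$tag} };
    } @tags;
}

# A <<'END' here-doc keeps the backslashes exactly as written.
my @input = split /\n/, <<'END';
=995 \\$xPR$wLivre
=995 \\$bECAM$cECAM
=995 \\$n
=995 \\$oDisponible
=995 \\$kG1 42171
END

print "$_\n" for merge_fields(@input);
```

With the sample input this prints the single line =995 \\$bECAM$cECAM$kG1 42171$n$oDisponible$xPR$wLivre. If real records need an order other than alphabetical by subfield code, the sort would need a custom comparison.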
Re: Scalar::Util::blessed() considers Regexp references to be blessed?
On Mon, Jan 23, 2012 at 2:12 AM, David Christensen dpchr...@holgerdanske.com wrote: beginners@perl.org: While coding some tests tonight, I discovered that Scalar::Util::blessed() considers Regexp references to be blessed. Is this a bug or a feature? Implementation detail. Internally, regexen have their own data type, REGEXP (which you can see with Scalar::Util::reftype), which is then 'blessed' into the Regexp class. This actually allows some very cool tricks, but nothing that should ever see the light of a production server, or show up in a beginners mailing list : )
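A small sketch of the distinction described above, assuming perl 5.12 or later (older perls such as the 5.10.1 in the original post may report a different reftype for qr// objects). blessed() answers "which package?", reftype() answers "which data type?", and re::is_regexp() asks directly whether something is a compiled pattern. My::Pattern is a made-up package name used only for the demonstration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype );
use re qw( is_regexp );

my $re = qr/ab+c/;

# blessed() returns the package the referent is blessed into ('Regexp').
print 'blessed:   ', blessed($re), "\n";

# reftype() returns the underlying data type rather than the package.
print 'reftype:   ', reftype($re), "\n";

# re::is_regexp() checks for a compiled pattern regardless of the package.
print 'is_regexp: ', ( is_regexp($re) ? 'yes' : 'no' ), "\n";

# Re-blessing into a (hypothetical) package changes blessed() only.
my $obj = bless qr/x/, 'My::Pattern';
print join( ' / ', blessed($obj), reftype($obj), is_regexp($obj) ? 'yes' : 'no' ), "\n";
```

So a test for "is this a regex?" is better written with is_regexp() than with !blessed() or ref() eq 'Regexp'.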
Scalar::Util::blessed() considers Regexp references to be blessed?
beginners@perl.org: While coding some tests tonight, I discovered that Scalar::Util::blessed() considers Regexp references to be blessed. Is this a bug or a feature? TIA, David

2012-01-22 21:07:57 dpchrist@p43400e ~/sandbox/perl
$ cat blessed
#! /usr/bin/perl
# $Id: blessed,v 1.1 2012-01-23 05:06:51 dpchrist Exp $
use strict;
use warnings;
use Test::More tests => 11;
use Scalar::Util qw( blessed );
our $foo;
ok( !blessed(undef),  'undefined value'  );  # 1
ok( !blessed(''),     'empty string'     );  # 2
ok( !blessed(0),      'zero'             );  # 3
ok( !blessed(1),      'one'              );  # 4
ok( !blessed(\0),     'scalar reference' );  # 5
ok( !blessed([]),     'array reference'  );  # 6
ok( !blessed({}),     'hash reference'   );  # 7
ok( !blessed(sub {}), 'code reference'   );  # 8
ok( !blessed(*foo),   'glob reference'   );  # 9
ok( ref(qr//) eq 'Regexp', 'qr// creates Regexp reference' );  # 10
ok( !blessed(qr//),   'Regexp reference' );  # 11

2012-01-22 21:08:27 dpchrist@p43400e ~/sandbox/perl
$ perl blessed
1..11
ok 1 - undefined value
ok 2 - empty string
ok 3 - zero
ok 4 - one
ok 5 - scalar reference
ok 6 - array reference
ok 7 - hash reference
ok 8 - code reference
ok 9 - glob reference
ok 10 - qr// creates Regexp reference
not ok 11 - Regexp reference
#   Failed test 'Regexp reference'
#   at blessed line 18.
# Looks like you failed 1 test of 11.

2012-01-22 21:08:30 dpchrist@p43400e ~/sandbox/perl
$ perl -MScalar::Util -e 'print $Scalar::Util::VERSION, "\n"'
1.23

2012-01-22 21:08:34 dpchrist@p43400e ~/sandbox/perl
$ perl -v

This is perl, v5.10.1 (*) built for i486-linux-gnu-thread-multi
(with 53 registered patches, see perl -V for more detail)

Copyright 1987-2009, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on this system using "man perl" or "perldoc perl". If you have access to the Internet, point your browser at http://www.perl.org/, the Perl Home Page.

2012-01-22 21:08:37 dpchrist@p43400e ~/sandbox/perl
$ cat /etc/debian_version
6.0.3
how to use regexp to match symbols
Hi, I have a list of mp3 files on my computer, and some of the file names contain a bracket, like this: "darling I love [you.mp3". I wish to check them for duplicates using the script below, but there's an error message like this:

Unmatched [ in regex; marked by <-- HERE in m/only one brace here[ <-- HERE anything.mp3/ at Untitled1 line 13.

So how do I rewrite the regexp? Thanks.

## script ###
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
my @datas = ("test.mp3" , "only one brace here[anything.mp3" , "whatever.mp3");
while (@datas){
    my $ref = splice @datas,0,1;
    foreach (@datas){
        if ($ref =~/$_/){
            print "$ref is a duplicate\n";
        }else{
            print "$ref is not a duplicate\n";
        }
    }
}
Re: how to use regexp to match symbols
On Mon, Jun 13, 2011 at 2:05 PM, eventual <eventualde...@yahoo.com> wrote:
> Hi, I have a list of mp3 files in my computer and some of the file names consists of a bracket like this darling I love [you.mp3 [...] Unmatched [ in regex; marked by <-- HERE in m/only one brace here[ <-- HERE anything.mp3/ at Untitled1 line 13. So how do I rewrite the regexp. Thanks.
> [...]

Escape the special character with a \, so in your case you would say:

only one brace here\[anything.mp3

which the regular expression engine will read as the literal text:

only one brace here[anything.mp3

instead of:

only one brace here<open a character class>anything.mp3

which would mean you never close the class, and thus the regular expression is invalid and throws an error.

Regards, Rob
Re: how to use regexp to match symbols
eventual wrote:
> Hi,

Hello,

> I have a list of mp3 files in my computer and some of the file names consists of a bracket like this darling I love [you.mp3 I wish to check them for duplicates using the script below, but theres error msg like this Unmatched [ in regex; marked by <-- HERE in m/only one brace here[ <-- HERE anything.mp3/ at Untitled1 line 13. So how do I rewrite the regexp. Thanks.
> ## script ###
> #!/usr/bin/perl
> use strict;
> use warnings;
> use File::Find;
> my @datas = ("test.mp3" , "only one brace here[anything.mp3" , "whatever.mp3");
> while (@datas){
>     my $ref = splice @datas,0,1;

That is usually written as:

    my $ref = shift @datas;

>     foreach (@datas){
>         if ($ref =~/$_/){

That doesn't test whether $ref is a duplicate; it tests whether $_, used as a pattern, matches somewhere in $ref, so this would print "This is a test mp3 file.html is a duplicate\n":

    if ( "This is a test mp3 file.html" =~ /test.mp3/ )

because . will match any character and the pattern is not anchored. If you want to see if the two strings are exactly the same then:

    if ( $ref eq $_ ) {

Or you could use a hash instead of an array, so you would know that there are no duplicates.

But as to your question about the '[' character causing an error message, you have to use quotemeta to escape regular expression special characters:

    if ( $ref =~ /\Q$_/ ) {

>             print "$ref is a duplicate\n";
>         }else{
>             print "$ref is not a duplicate\n";
>         }
>     }
> }

John
-- 
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
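John's hash suggestion can be sketched like this, assuming the goal is simply to list names that occur more than once. Because whole strings are compared as hash keys, '[' and '.' in a file name have no special meaning. find_duplicates is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report each name that occurs more than once, comparing whole strings
# through a hash instead of a regex.
sub find_duplicates {
    my %seen;
    # $seen{$_}++ returns the old count, which is 1 exactly on the
    # second sighting, so each duplicated name is reported once.
    return grep { $seen{$_}++ == 1 } @_;
}

my @datas = ( 'test.mp3', 'only one brace here[anything.mp3',
              'whatever.mp3', 'only one brace here[anything.mp3' );

print "$_ is a duplicate\n" for find_duplicates(@datas);
```

With this sample list only the bracketed name is reported, and no "Unmatched [" error can occur because no pattern is ever compiled from the data.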
Re: how to use regexp to match symbols
On 2011-06-13 14:05, eventual wrote:
> I have a list of mp3 files in my computer and some of the file names consists of a bracket like this darling I love [you.mp3 I wish to check them for duplicates using the script below, but theres error msg like this Unmatched [ in regex; marked by <-- HERE in m/only one brace here[ <-- HERE anything.mp3/ at Untitled1 line 13.

Why would you want to use a regex for this? Use 'eq', or see 'perldoc -f index'. In case of real regex need, see 'perldoc -f quotemeta'.

-- Ruud
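A short sketch of the two alternatives Ruud points to, using a file name from the original question: index() does a plain substring search with no metacharacters, and \Q...\E (the interpolation form of quotemeta) escapes them when a regex really is needed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name   = 'darling I love [you.mp3';
my $needle = '[you.mp3';

# index() is a plain substring search: '[' and '.' are ordinary characters.
print "index found it at position ", index( $name, $needle ), "\n";

# If a regex is really needed, \Q...\E escapes the metacharacters.
print "regex matched\n" if $name =~ /\Q$needle\E\z/;

# The escaped pattern text the regex engine actually receives.
print quotemeta($needle), "\n";

# Without escaping, '.' matches any character, so this unrelated name
# "contains" test.mp3; with \Q it does not.
print "unescaped pattern matched testXmp3\n" if 'testXmp3' =~ /test.mp3/;
print "escaped pattern did not\n" unless 'testXmp3' =~ /\Qtest.mp3\E/;
```

For exact duplicate checks, eq (or a hash) is still the simplest choice; index() and \Q only matter when partial matches are wanted.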
Re: how to use regexp to match symbols
> I have a list of mp3 files in my computer and some of the file names consists of a bracket like this darling I love [you.mp3 I wish to check them for duplicates using the script below, but theres error msg like this Unmatched [ in regex; marked by <-- HERE in m/only one brace here[ <-- HERE anything.mp3/ at

Searching Google, I found that several scripts use the File::Find::Duplicates module. http://search.cpan.org/~tmtm/File-Find-Duplicates-1.00/lib/File/Find/Duplicates.pm

Sayth
RE: regexp validation (arbitrary code execution) (regexp injection)
From: Stanislaw Findeisen
> Suppose you have a collection of books, and want to provide your users with the ability to search the book title, author or content using regular expressions. But you don't want to let them execute any code. How would you validate/compile/evaluate the user provided regex so as to provide maximum flexibility and prevent code execution?

You want them to run an application without having to run an application? That doesn't make any sense. You have several options available to give your users access to a database.

1. Write a client application or applet they can copy or install on their workstation to access the database directly.
2. Write a simpler application or applet that accesses a non-DB server, which in turn accesses the database.
3. Create a site on a web server they can access with a browser, which then accesses the database.

There are any number of variations on these themes, but in each case, they have to run some application code somewhere in order to access the data.

Bob McConnell
Re: regexp validation (arbitrary code execution) (regexp injection)
On 2011-06-02 14:27, Bob McConnell wrote:
> From: Stanislaw Findeisen
>> Suppose you have a collection of books [...] How would you validate/compile/evaluate the user provided regex so as to provide maximum flexibility and prevent code execution?
> You want them to run an application without having to run an application? That doesn't make any sense.

This is a complete misunderstanding. Sorry, perhaps I wasn't clear enough. I was talking about users injecting *their* code via the regex. See for instance: http://perldoc.perl.org/perlretut.html#A-bit-of-magic:-executing-Perl-code-in-a-regular-expression or the /e modifier for the built-in function s (search and replace). When doing:

$string =~ $regex

where $regex is a user-provided, arbitrary regular expression, anything can happen.

-- Eisenbits - proven software solutions: http://www.eisenbits.com/ OpenPGP: E3D9 C030 88F5 D254 434C 6683 17DD 22A0 8A3B 5CC0
Re: regexp validation (arbitrary code execution) (regexp injection)
2011/6/1 Stanisław Findeisen <s...@eisenbits.com>
> Suppose you have a collection of books, and want to provide your users with the ability to search the book title, author or content using regular expressions. But you don't want to let them execute any code. How would you validate/compile/evaluate the user provided regex so as to provide maximum flexibility and prevent code execution?

Hi Stanisław,

From what you are saying, I think you are looking for a way to take a string and check it for any potentially bad characters that would cause the system to execute unwanted code. So a bit like this: In.*?forest$ is a safe string to feed into your regular expression, but:

.*/; open my $fh, ">", $0; close $fh; $_ = ~/

is an evil string causing you a lot of grief. At least that is how I understand your question...

To be honest, I am not sure this is an issue, as I doubt that the following construction:

if ( $title =~ m/$userinput/ ) { do stuff... }

will give you any issues. As far as I can remember, the variable that you are feeding in here will not be treated as code by the interpreter but simply as matching instructions, which would mean that whatever your user throws at it, perl will in the worst case return a failure to match. But please don't take my word for it: try it in a very simple test and see what happens.

If you do have to ensure that a user cannot execute any code, you could simply prevent the user from entering the ; or, smarter yet, filter it out of the user input, to prevent a smart user from feeding it to your code via a method other than the front-end you provided. Without a means to close the previous regular expression, the user cannot really insert executable code into your regular expression.

At least that's what I would try, but I am by no means an expert in the area, and I suspect there might be some people reading this and wondering why I didn't think of A, B or C; if so, please do speak up, people ;-) Regards, Rob
Re: regexp validation (arbitrary code execution) (regexp injection)
Stanisław == Stanisław Findeisen s...@eisenbits.com writes: Stanisław But you don't want to let them execute any code. Unless use re 'eval' is in scope, /$a/ is safe even if $a came from an untrusted source, as long as you limit the run-time to a few seconds or so with an alarm. (Some regex can take nearly forever to fail.) See perldoc perlre for more details. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 mer...@stonehenge.com URL:http://www.stonehenge.com/merlyn/ Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc. See http://methodsandmessages.posterous.com/ for Smalltalk discussion
Re: regexp validation (arbitrary code execution) (regexp injection)
On Wed, Jun 01, 2011 at 11:25:39PM +0200, Stanisław Findeisen wrote:
> Suppose you have a collection of books, and want to provide your users with the ability to search the book title, author or content using regular expressions. But you don't want to let them execute any code. How would you validate/compile/evaluate the user provided regex so as to provide maximum flexibility and prevent code execution?

In general this shouldn't be a problem provided you don't turn on use re "eval";

$ perl -e '/$ARGV[0]/' '(?{ print "hello" })'
Eval-group not allowed at runtime, use re 'eval' in regex m/(?{ print "hello" })/ at -e line 1.
$ perl -Mre=eval -e '/$ARGV[0]/' '(?{ print "hello" })'
hello

Of course, you're not going to be too worried about people saying hello, but once you can execute arbitrary code all bets are off:

$ perl -e '/$ARGV[0]/' '(?{ system "sudo mailx -s ha baddie\@example.com < /etc/shadow" })'

Make sure you don't do the whole match as part of a string eval, and since you're only matching, you shouldn't have to worry about s///e. If you prefer a more paranoid approach you might want to restrict the characters you allow in the user input, but this doesn't provide maximum flexibility.

-- Paul Johnson - p...@pjcj.net http://www.pjcj.net
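Building on Paul's demonstration, a minimal sketch of compiling an untrusted pattern inside eval { } with no use re 'eval' in scope, so that both invalid syntax (an unmatched "[") and runtime-interpolated (?{ ... }) blocks are rejected with an error message instead of killing the program. compile_user_regex and the sample titles are hypothetical. It does not address the run-time limit Randal mentions; note that perlipc documents that deferred ("safe") signals are only checked between opcodes, so an alarm may not interrupt a single long-running match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compile an untrusted pattern (no "use re 'eval'" in
# scope). Returns ($regex, '') on success or (undef, $error) on failure.
sub compile_user_regex {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return defined $re ? ( $re, '' ) : ( undef, $@ );
}

my @titles = ( 'In the forest', 'Programming Perl' );

for my $input ( 'In.*?forest$', '[', '(?{ print "hello" })' ) {
    my ( $re, $err ) = compile_user_regex($input);
    if ( !defined $re ) {
        print "rejected $input: $err";
        next;
    }
    print "$input matches: ", join( ', ', grep { $_ =~ $re } @titles ), "\n";
}
```

The first input matches "In the forest"; the other two are rejected with perl's own "Unmatched [" and "Eval-group not allowed at runtime" messages.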
regexp validation (arbitrary code execution) (regexp injection)
Suppose you have a collection of books, and want to provide your users with the ability to search the book title, author or content using regular expressions. But you don't want to let them execute any code. How would you validate/compile/evaluate the user provided regex so as to provide maximum flexibility and prevent code execution? -- Eisenbits - proven software solutions: http://www.eisenbits.com/ OpenPGP: E3D9 C030 88F5 D254 434C 6683 17DD 22A0 8A3B 5CC0
Re: regexp matching nummeric ranges
Kammen == Kammen van, Marco, Springer SBM NL marco.vankam...@springer.com writes: Kammen What am I doing wrong?? Using a regex when something else would be much better. Stop trying to pound a nail in with a wrench handle. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 mer...@stonehenge.com URL:http://www.stonehenge.com/merlyn/ Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc. See http://methodsandmessages.posterous.com/ for Smalltalk discussion
Re: Regexp delimiters
On Dec 7, 9:38 am, p...@utilika.org (Jonathan Pool) wrote:
> Well, I have no idea why it does what it does, but I can tell you how to make it work:
> s¶3(456)7¶¶$1¶x;
> s§3(456)7§§$1§x;

Hm, what platform and perl version? No errors here:

c:\>perl -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
v5.12.1MSWin32
1245689

[...]

-- Charles DeRykus
Re: Regexp delimiters
On Dec 7, 9:38 am, p...@utilika.org (Jonathan Pool) wrote:
> Well, I have no idea why it does what it does, but I can tell you how to make it work:
> s¶3(456)7¶¶$1¶x;
> s§3(456)7§§$1§x;

Oops, yes, there is:

c:\>perl -Mutf8 -wE "say $^V,$^O;$_='123456789'; s§3(456)7§$1§;say"
Malformed UTF-8 character (unexpected continuation byte 0xa7, with no preceding start byte) at -e line 1.
Malformed UTF-8 character (unexpected continuation byte 0xa7, with no preceding start byte) at -e line 1.
v5.12.1MSWin32
1245689

-- Charles DeRykus
Re: Regexp delimiters
> Hm, what platform and perl version?

5.8.8 and 5.12.2 on RHEL, and 5.10.0 on OS X 10.6.

> c:\>perl -Mutf8 -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
> Malformed UTF-8 character (unexpected continuation byte 0xa7, with no preceding start byte) at -e line 1.

Not the same error as I got. This one looks to me like submitting text in an 8-bit encoding such as Latin-1, where the section sign is A7, after telling the host you would be submitting UTF-8, where the section sign is C2A7. (Sorry if my terminology is wrong.)
Re: Regexp delimiters
> c:\>perl -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
> v5.12.1MSWin32
> 1245689

My equivalent that works is:

perl -wE "use utf8;my \$_='123456789';s§3(456)7§§\$1§;say;"
1245689

If I stop treating this section-sign delimiter as a bracketing delimiter, it fails:

perl -wE "use utf8;my \$_='123456789';s§3(456)7§\$1§;say;"
Substitution replacement not terminated at -e line 1.
Re: Regexp delimiters
> Well, I have no idea why it does what it does, but I can tell you how to make it work:
> s¶3(456)7¶¶$1¶x;
> s§3(456)7§§$1§x;

Amazing. Thanks very much.

This seems to contradict the documentation. The perlop man page clearly says that there are exactly 4 bracketing delimiters: (), [], {}, and <>. Everything else should be non-bracketing. But, in fact, several characters that I have tried behave as bracketing delimiters.

An exception seems to be combining characters. The two that I tried don't work as either regular or bracketing delimiters.

I have tested this on Perl 5.8.8, 5.10.0, and 5.12.2. See the code below for results. The combining characters appear in the last 2 test pairs.

#!/usr/bin/perl -w
use warnings 'FATAL', 'all'; # Make every warning fatal.
use strict; # Require strict checking of variable references, etc.
use utf8; # Make Perl interpret the script as UTF-8.
my $string = '123456789'; # Initialize a scalar.
print "The original string is $string\n";
# $string =~ s%3(456)7%$1%; # Succeeds
# $string =~ s%3(456)7%%$1%; # Fails
# $string =~ s§3(456)7§$1§; # Fails
# $string =~ s§3(456)7§§$1§; # Succeeds
# $string =~ s–3(456)7–$1–; # Fails
# $string =~ s–3(456)7––$1–; # Succeeds
# $string =~ s“3(456)7“$1“; # Fails
# $string =~ s“3(456)7““$1“; # Succeeds
# $string =~ s‱3(456)7‱$1‱; # Fails
# $string =~ s‱3(456)7‱‱$1‱; # Succeeds
# $string =~ s⇧3(456)7⇧$1⇧; # Fails
# $string =~ s⇧3(456)7⇧⇧$1⇧; # Succeeds
# $string =~ s⃠3(456)7⃠$1⃠; # Fails (single U+20e0)
# $string =~ s⃠3(456)7⃠⃠$1⃠; # Fails (double U+20e0)
# $string =~ s̸3(456)7̸$1̸; # Fails (single U+0338)
# $string =~ s̸3(456)7̸̸$1̸; # Fails (double U+0338)
# Modify it (uncomment any one line above.)
print "The amended string is $string\n";
Regexp delimiters
The perlop document under s/PATTERN/REPLACEMENT/msixpogce says "Any non-whitespace delimiter may replace the slashes." I take this to mean that any non-whitespace character may be used instead of a slash. However, I am finding that some non-whitespace characters cause errors. For example, using a ¶ or § character instead of a slash causes an error, such as "Bareword found where operator expected" or "Number found where operator expected". When I use /, #, or , (comma), I get no error. Here is a script that demonstrates this problem:

#!/usr/bin/perl -w
use warnings 'FATAL', 'all';
use strict;
use utf8;
my $string = '123456789';
print "The original string is $string\n";
$string =~ s§3(456)7§$1§;
print "The amended string is $string\n";
Re: Regexp delimiters
Well, I have no idea why it does what it does, but I can tell you how to make it work:

s¶3(456)7¶¶$1¶x;
s§3(456)7§§$1§x;

For whatever reason, Perl is treating those characters as 'opening' delimiters[0], so that when you write s¶3(456)7¶$1¶;, you are telling Perl that the regex part is delimited by '¶'s, but the substitution part is delimited by '$'s (think of something like s{}//;). Hopefully someone here will be able to enlighten us both further.

Brian.

[0] http://perldoc.perl.org/perlop.html#Gory-details-of-parsing-quoted-constructs

On Sun, Dec 5, 2010 at 6:33 PM, Jonathan Pool <p...@utilika.org> wrote:
> [...]
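For comparison with the ¶ and § behaviour discussed in this thread, here is a sketch of the two forms perlop documents for ASCII delimiters: a non-bracketing character that separates all three parts, and bracketing pairs where the pattern and the replacement each get their own pair (which need not match each other):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = '123456789';

# Non-bracketing delimiter: one character separates all three parts.
( my $hash_delim = $string ) =~ s#3(456)7#$1#;

# Bracketing delimiters: pattern and replacement each get their own pair,
# and the two pairs may differ.
( my $braces = $string ) =~ s{3(456)7}{$1};
( my $mixed  = $string ) =~ s{3(456)7}<$1>;

print "$hash_delim $braces $mixed\n";    # 1245689 1245689 1245689
```

The s§...§§...§ workaround above behaves like the second form, which is why the doubled delimiter is needed there.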
Re: Regexp delimiters
On 10-12-05 05:58 PM, Brian Fraser wrote:
> Well, I have no idea why it does what it does, but I can tell you how to make it work:
> s¶3(456)7¶¶$1¶x;
> s§3(456)7§§$1§x;
> [...]

$ perl -e's¶3(456)7¶¶$1¶x;'
Unrecognized character \xB6 in column 14 at -e line 1.
$ perl -Mutf8 -e's¶3(456)7¶¶$1¶x;'

You have to tell perl to use UTF-8. Add this line to the top of your script(s):

use utf8;

See `perldoc utf8` for more details.

-- Just my 0.0002 million dollars worth, Shawn

Confusion is the first step of understanding. Programming is as much about organization and communication as it is about coding. The secret to great software: Fail early often. Eliminate software piracy: use only FLOSS.
Re: Regexp delimiters
You have to tell perl to use UTF-8. Add this line to the top of your script(s): use utf8; See `perldoc utf8` for more details. Hm, I don't mean to step on your toes or anything, but he is already using utf8. The problem is with some utf8 characters being interpreted as a paired delimiter, I think. Brian.
Re: Regexp delimiters
On 10-12-05 07:38 PM, Brian Fraser wrote: You have to tell perl to use UTF-8. Add this line to the top of your script(s): use utf8; See `perldoc utf8` for more details. Hm, I don't mean to step on your toes or anything, but he is already using utf8. The problem is with some utf8 characters being interpreted as a paired delimiter, I think. Brian. It works for me. What version of Perl is he running? 5.6 does not work well with UTF-8. -- Just my 0.0002 million dollars worth, Shawn Confusion is the first step of understanding. Programming is as much about organization and communication as it is about coding. The secret to great software: Fail early often. Eliminate software piracy: use only FLOSS.
Re: Regexp delimiters
That's probably because you are using what I sent, rather than what the OP did:

C:\>perl -E "s§3(456)7§$1§;"
Unrecognized character \x98 in column 16 at -e line 1.

C:\>perl -Mutf8 -E "s§3(456)7§$1§;"
Substitution replacement not terminated at -e line 1.

C:\>perl -E "s§3(456)7§§$1§; say"
Unrecognized character \x98 in column 14 at -e line 1.

C:\>perl -Mutf8 -E "s§3(456)7§§$1§; say"

C:\>

Brian.
Re: regexp matching nummeric ranges
On 30/11/2010 06:39, Uri Guttman wrote:
> "GK" == Guruprasad Kulkarni <guruprasa...@gmail.com> writes:
> GK> Here is another way to do it:
> GK> /^127\.0\.0\.([\d]|[1-9][\d]|[1][\d][\d]|[2]([0-4][\d]|[5][0-4]))$/) {
> why are you putting single chars inside a char class? [\d] is the same as \d and [1] is just 1.

Also this is another solution that wrongly accepts 127.0.0.0. It also unnecessarily makes use of captures instead of grouping, and puts single values into character classes ([1], [\d] etc.). Perhaps it is better written:

/^127\.0\.0\.(?:
    [1-9]\d? |   # 1 .. 99
    1\d\d    |   # 100 .. 199
    2[0-4]\d |   # 200 .. 249
    25[0-4]      # 250 .. 254
)$/x;

But my feeling is that these long-winded pure regex solutions are more of a response to a challenge than a practical solution. At the very least they need commenting to explain what they are doing. Capturing the value of the last byte field, as I suggested, seems to describe the purpose of the code far better, with no significant penalty that I can think of.

- Rob
regexp matching nummeric ranges
Dear List, I've been struggling with the following:

#!/usr/bin/perl
use strict;
use warnings;
my $ip = ("127.0.0.255");
if ($ip =~ /127\.0\.0\.[2..254]/) {
    print "IP Matched!\n";
} else {
    print "No Match!\n";
}

For a reason I don't understand: 127.0.0.1 doesn't match, as expected... Everything between 127.0.0.2 and 127.0.0.299 matches... 127.0.0.230 doesn't match... What am I doing wrong?? Thanks!

Marco van Kammen
Springer Science+Business Media
System Manager, Postmaster
van Godewijckstraat 30 | 3311 GX Office Number: 05E21 Dordrecht | The Netherlands
tel +31(78)6576446 fax +31(78)6576302
www.springeronline.com www.springer.com
Re: regexp matching nummeric ranges
On 29/11/2010 14:22, Kammen van, Marco, Springer SBM NL wrote:
> [...]
> if ($ip =~ /127\.0\.0\.[2..254]/) {
> [...]
> What am I doing wrong??

Hello Marco

Regular expressions can't match a decimal string by value. The regex /[2..254]/ uses a character class which matches a SINGLE character, which may be '2', '5', '4' or '.'. It is the same as /[254.]/, as characters that appear a second time have no effect. To verify the value of a decimal substring you need to add an extra test:

if ($ip =~ /^127\.0\.0\.([0-9]+)$/ and 2 <= $1 and $1 <= 254) {
  :
}

In the case of a successful match, this captures the fourth sequence of digits, leaving it in $1. This value can then be tested separately to make sure it is in the desired range. Note that I have added the beginning and end of line anchors ^ and $, which ensure that the string doesn't just contain a valid IP address; otherwise anything like XXX.127.0.0.200.300.400 would pass the test.

HTH,

Rob
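Rob's capture-then-compare idea, wrapped in a small function so it can be checked against the addresses from Marco's post. This is a sketch, not Rob's exact code: in_range is a hypothetical name, and the capture [1-9][0-9]{0,2} is an assumption added here so that leading zeros such as 127.0.0.02 are also rejected:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Let the regex check the shape and capture the last octet; let ordinary
# numeric comparison check the range 2 .. 254.
sub in_range {
    my ($ip) = @_;
    return $ip =~ /\A127\.0\.0\.([1-9][0-9]{0,2})\z/ && 2 <= $1 && $1 <= 254;
}

for my $ip (qw( 127.0.0.1 127.0.0.2 127.0.0.230 127.0.0.255
                127.0.0.299 127.0.0.02 765127.0.0.273646 )) {
    printf "%-18s %s\n", $ip, in_range($ip) ? 'match' : 'no match';
}
```

Because the && chain short-circuits, $1 is only read after a successful match, so a stale capture from an earlier match can never leak into the range test.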
Re: regexp matching nummeric ranges
Kammen van, Marco, Springer SBM NL wrote:
> Dear List,

Hello,

> I've been struggeling with the following:
> [...]
> What am I doing wrong??

As Rob said, [2..254] is a character class that matches one character (so 127.0.0.230 should match also). You also don't anchor the pattern, so something like '765127.0.0.273646' would match as well. What you need is something like this:

#!/usr/bin/perl
use strict;
use warnings;

my $ip = '127.0.0.255';

my $IP_match = qr{
    \A              # anchor at beginning of string
    127\.0\.0\.     # match the literal characters
    (?:
        [2-9]           # match one digit numbers 2 - 9
        |               # OR
        [0-9][0-9]      # match any two digit number
        |               # OR
        1[0-9][0-9]     # match 100 - 199
        |               # OR
        2[0-4][0-9]     # match 200 - 249
        |               # OR
        25[0-4]         # match 250 - 254
    )
    \z              # anchor at end of string
}x;

if ( $ip =~ $IP_match ) {
    print "IP Matched!\n";
} else {
    print "No Match!\n";
}

Or, another way to do it:

#!/usr/bin/perl
use strict;
use warnings;
use Socket;

my $ip    = inet_aton '127.0.0.255';
my $start = inet_aton '127.0.0.2';
my $end   = inet_aton '127.0.0.254';

if ( $ip ge $start && $ip le $end ) {
    print "IP Matched!\n";
} else {
    print "No Match!\n";
}

John
-- 
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
Re: regexp matching nummeric ranges
On 29/11/2010 23:46, John W. Krahn wrote:
>
> What you need is something like this:
>
> my $IP_match = qr{
>     \A             # anchor at beginning of string
>     127\.0\.0\.    # match the literal characters
>     (?:
>         [2-9]          # match one digit numbers 2 - 9
>         |              # OR
>         [0-9][0-9]     # match any two digit number
>         |              # OR
>         1[0-9][0-9]    # match 100 - 199
>         |              # OR
>         2[0-4][0-9]    # match 200 - 249
>         |              # OR
>         25[0-4]        # match 250 - 254
>     )
>     \z             # anchor at end of string
> }x;

This regex solution allows the IP address 127.0.0.01, which is out of range.

- Rob
Re: regexp matching nummeric ranges
Hi Marco,

Here is another way to do it:

#!/usr/bin/perl
use strict;
use warnings;

my $ip = "127.0.0.1";

if ($ip =~ /^127\.0\.0\.([\d]|[1-9][\d]|[1][\d][\d]|[2]([0-4][\d]|[5][0-4]))$/) {
    print "IP Matched!\n";
} else {
    print "No Match!\n";
}

On Tue, Nov 30, 2010 at 11:21 AM, Rob Dixon <rob.di...@gmx.com> wrote:
> On 29/11/2010 23:46, John W. Krahn wrote:
>> Kammen van, Marco, Springer SBM NL wrote:
>>> Dear List,
>>> ...
>
> This regex solution allows the IP address 127.0.0.01, which is out of range.
>
> - Rob
RE: regexp matching nummeric ranges
-----Original Message-----
From: John W. Krahn [mailto:jwkr...@shaw.ca]
Sent: Tuesday, November 30, 2010 12:47 AM
To: Perl Beginners
Subject: Re: regexp matching nummeric ranges

> As Rob said [2..254] is a character class that matches one character
> (so 127.0.0.230 should match also.) You also don't anchor the pattern
> so something like '765127.0.0.273646' would match as well.
>
> What you need is something like this:
> ...

Thanks for all the good pointers... This is something I can work with!

Marco.
Re: regexp matching nummeric ranges
GK == Guruprasad Kulkarni <guruprasa...@gmail.com> writes:

  GK> Here is another way to do it:
  GK> /^127\.0\.0\.([\d]|[1-9][\d]|[1][\d][\d]|[2]([0-4][\d]|[5][0-4]))$/) {

why are you putting single chars inside a char class? [\d] is the same
as \d and [1] is just 1.

also please don't quote entire emails below your post. learn to bottom
post and edit the quoted emails. we read from top to bottom so post
that way too.

uri

-- 
Uri Guttman -- u...@stemsystems.com http://www.sysarch.com --
- Perl Code Review, Architecture, Development, Training, Support --
- Gourmet Hot Cocoa Mix http://bestfriendscocoa.com -
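Uri's point can be checked mechanically. This sketch (core Perl only; the comparison loop is an illustrative addition) removes the one-item character classes from Guruprasad's pattern and confirms that both versions accept exactly the same last octets between 0 and 300. It also reports the accepted range, which shows that both versions accept 0 through 254, so 127.0.0.0 and 127.0.0.1 pass even though the question asked for 2 to 254.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guruprasad's pattern, as posted.
my $verbose = qr/^127\.0\.0\.([\d]|[1-9][\d]|[1][\d][\d]|[2]([0-4][\d]|[5][0-4]))$/;

# The same alternatives with the single-item character classes removed.
my $simple = qr/^127\.0\.0\.(\d|[1-9]\d|1\d\d|2(?:[0-4]\d|5[0-4]))$/;

my @accepted;
for my $n ( 0 .. 300 ) {
    my $ip = "127.0.0.$n";
    my ( $v, $s ) = ( $ip =~ $verbose ? 1 : 0, $ip =~ $simple ? 1 : 0 );
    die "patterns disagree on $ip\n" if $v != $s;
    push @accepted, $n if $s;
}

printf "both accept %d..%d (%d values)\n",
    $accepted[0], $accepted[-1], scalar @accepted;
```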
Re: regexp matching nummeric ranges
Rob Dixon wrote:
> On 29/11/2010 23:46, John W. Krahn wrote:
>>
>> As Rob said [2..254] is a character class that matches one character
>> (so 127.0.0.230 should match also.) You also don't anchor the pattern
>> so something like '765127.0.0.273646' would match as well.
>>
>> What you need is something like this:
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>>
>> my $ip = '127.0.0.255';
>>
>> my $IP_match = qr{
>>     \A             # anchor at beginning of string
>>     127\.0\.0\.    # match the literal characters
>>     (?:
>>         [2-9]          # match one digit numbers 2 - 9
>>         |              # OR
>>         [0-9][0-9]     # match any two digit number
>
> This regex solution allows the IP address 127.0.0.01, which is out of
> range.

Yes, sorry, that should be:

        [1-9][0-9]
        |              # OR
        1[0-9][0-9]    # match 100 - 199
        |              # OR
        2[0-4][0-9]    # match 200 - 249
        |              # OR
        25[0-4]        # match 250 - 254
    )
    \z             # anchor at end of string
}x;

if ( $ip =~ $IP_match ) {
    print "IP Matched!\n";
} else {
    print "No Match!\n";
}

John
-- 
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.                   -- Albert Einstein
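For reference, John's pattern with the corrected two-digit branch, assembled into one piece. The regex branches are John's; the comment on the [1-9][0-9] branch and the sample loop are illustrative additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $IP_match = qr{
    \A                  # anchor at beginning of string
    127\.0\.0\.         # match the literal characters
    (?:
        [2-9]           # match one digit numbers 2 - 9
      | [1-9][0-9]      # match 10 - 99 (no leading zero)
      | 1[0-9][0-9]     # match 100 - 199
      | 2[0-4][0-9]     # match 200 - 249
      | 25[0-4]         # match 250 - 254
    )
    \z                  # anchor at end of string
}x;

for my $ip (qw(127.0.0.1 127.0.0.01 127.0.0.2 127.0.0.99 127.0.0.254 127.0.0.255)) {
    print "$ip: ", ( $ip =~ $IP_match ? 'IP Matched!' : 'No Match!' ), "\n";
}
```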
Re: Parsing file and regexp
(Sorry, I had a problem with my ISP, so I am reposting!)

Uri Guttman wrote:
> how do you know when a keyword section begins or ends? how large is
> this file? could free text have keywords? i see a ; to end a word list
> but that isn't enough to properly parse this if you have 'free text'.
>
> osc> Is it possible to do this with regular expression ?
> osc> Or should I write a small parser ?
>
> yes and yes.
>
> osc> I have tried pattern matching with the 's' and also with the 'm'
> osc> option, but with no good result ...
>
> please show your code. there is no way to help otherwise. s/// is not a
> pattern matcher but a substitution operator. it uses regexes and can be
> used to parse things.
>
> uri

Hi Uri,

Sorry, the code is at my office.

The free text cannot contain keywords, and keywords start at the beginning of a line. The list of words is terminated by a ";".

For the pattern matching I have used the s option, as in m/pattern/s, so that the match can run across the \n characters.

Olivier
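Under the rules Olivier gives (keywords start a line, a word list ends at ";", free text never contains a keyword), a single /mg regex loop is enough. This is a sketch under stated assumptions: the keyword names are known in advance (the thread never lists the real ones, so @KEYWORDS is hypothetical), keywords are matched case-insensitively because the sample contains both keywordA and KeywordA, and words are separated by commas and/or whitespace. The sample text follows the layout from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical keyword set: the real keywords are not given in the thread.
my @KEYWORDS = qw(keywordA keywordB);
my $kw_re    = join '|', map { quotemeta($_) } @KEYWORDS;

my $text = <<'END';
keywordA word1, word2, word3;
Here we can have some free text
...
keywordB word4,
  word5, word6, word7, word8,
  word9, word10;
KeywordA
  word1, word2;
END

# /m lets ^ match at the start of every line, /i accepts "KeywordA", and
# [^;]* may cross newlines, so a list split over several lines is captured whole.
my @entries;
while ( $text =~ /^($kw_re)\b([^;]*);/mgi ) {
    my ( $keyword, $list ) = ( $1, $2 );
    my @words = grep { length } split /[\s,]+/, $list;   # drop empty leading field
    push @entries, [ $keyword, \@words ];
}

print "$_->[0]: (", join( ', ', @{ $_->[1] } ), ")\n" for @entries;
```

The entries stay in file order, matching the expected output in the question; the keyword is reported as written, so the third entry prints as KeywordA.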
Re: Parsing file and regexp
Uri Guttman wrote:
> please show your code. there is no way to help otherwise. s/// is not a
> pattern matcher but a substitution operator. it uses regexes and can be
> used to parse things.
>
> uri

Here it is ...

$ cat test.txt
keyword1 word1, word2 word3;
blabla blabla
keyword2 word4, word5, word6, word7, word8, word9;
bla bla bla bla
keyword1 word10, word11;

$ cat parse.pl
use warnings;

open FILE, "test.txt" or die "Could not open $!";
$/ = undef;
$source = <FILE>;
close(FILE);

if ($source =~ m/keyword1\s*(\w*)(,\w*)*/s) {
    print("Match !\n");
    print("$1\n");
    print("$2\n");
}

$ perl parse.pl
Match !
word1
,

Here I would like to have 2 matches: "word1, word2 word3;" and "word10, word11;".

Thanks for your help!

Olivier
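Olivier's output follows from how Perl treats a quantified capture group: (,\w*)* is a single capture slot, so each repetition overwrites $2 and only the last iteration survives. With his data that is a lone comma, because \w* happily matches the empty string before the space. A small sketch of the effect, and of the usual workaround (a /g match in list context), on a made-up one-line input:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'keyword1 word1,word2,word3;';

# A quantified group is one capture slot: each repetition overwrites $2,
# so the list assignment receives only the last repetition.
my @groups = $line =~ /keyword1\s*(\w*)(,\w*)*/;
print "groups: @groups\n";

# Isolating the list first, then matching with /g in list context,
# returns every capture instead.
my ($list) = $line =~ /keyword1\s+([^;]*);/;
my @words  = $list =~ /(\w+)/g;
print "words: @words\n";
```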
Re: Parsing file and regexp
olivier.scalb...@algosyn.com wrote:
> $ cat test.txt
> keyword1 word1, word2 word3;
> blabla blabla
> keyword2 word4, word5, word6, word7, word8, word9;
> bla bla bla bla
> keyword1 word10, word11;

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Set maximum depth for Data::Dumper, zero means unlimited
$Data::Dumper::Maxdepth = 0;

my $file = shift @ARGV;
my $source;
open my $source_fh, '<', $file or die "could not open $file: $!\n";
{
    local $/;
    $source = <$source_fh>;
}
close $source_fh;

my %keywords;
my @captured = $source =~ m{ ( keyword\d+ ) ( [^;]+ ) \; }gmsx;
while( @captured ){
    my $keyword = shift @captured;
    my $words   = shift @captured;
    $words =~ s{ \A \s+ }{}msx;
    $words =~ s{ \s+ \z }{}msx;
    my @words = split m{ \s* \, \s* }msx, $words;
    push @{ $keywords{$keyword} }, @words;
}

print 'keywords: ', Dumper \%keywords;

__END__

-- 
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication
as it is about coding.

I like Perl; it's the only language where you can bless your
thingy.

Eliminate software piracy: use only FLOSS.
Parsing file and regexp
Hello,

I need to extract info from some text files, and I want to do it with Perl!

The file I need to parse has the following layout:

keywordA word1, word2, word3;
Here we can have some free text
...
...
keywordB word4,
word5, word6, word7, word8,
word9, word10;
KeywordA
word1, word2;
...

I want to extract all the keywords with their associated words. For example, with this file, I would like to have:

keywordA: (word1, word2, word3)
keywordB: (word4, word5, word6, word7, word8, word9, word10)
keywordA: (word1, word2)

Is it possible to do this with a regular expression? Or should I write a small parser?

I have tried pattern matching with the 's' and also with the 'm' option, but with no good result ...

Thanks for your help!

Olivier
Re: Parsing file and regexp
osc == olivier.scalb...@algosyn.com writes:

  osc> keywordA word1, word2, word3;
  osc> Here we can have some free text
  osc> ...
  osc> ...
  osc> keywordB word4,
  osc> word5, word6, word7, word8,
  osc> word9, word10;
  osc> KeywordA
  osc> word1, word2;
  osc> ...

how do you know when a keyword section begins or ends? how large is
this file? could free text have keywords? i see a ; to end a word list
but that isn't enough to properly parse this if you have 'free text'.

  osc> I want to extract all the keywords with their associated words.
  osc> For example, with this file, I would like to have:
  osc> keywordA: (word1, word2, word3)
  osc> keywordB: (word4, word5, word6, word7, word8, word9, word10)
  osc> keywordA: (word1, word2)

  osc> Is it possible to do this with regular expression ?
  osc> Or should I write a small parser ?

yes and yes.

  osc> I have tried pattern matching with the 's' and also with the 'm'
  osc> option, but with no good result ...

please show your code. there is no way to help otherwise. s/// is not a
pattern matcher but a substitution operator. it uses regexes and can be
used to parse things.

uri

-- 
Uri Guttman -- u...@stemsystems.com http://www.sysarch.com --
- Perl Code Review, Architecture, Development, Training, Support --
- Gourmet Hot Cocoa Mix http://bestfriendscocoa.com -
Re: Regexp to remove spaces
sftriman wrote:
> Dr.Ruud:
>> sub trim { ... }#trim
>
> You're missing the tr to squash space down

To trim() is to remove from head and tail only. Just use it as an example to build a trim_and_normalize().

> So I think it can boil down to:
>
> sub fixsp7 {
>     s#\A\s+##, s#\s+\z##, tr/ \t\n\r\f/ /s foreach @_;
>     return;
> }

It is best to remove from the end before removing from the start.

-- 
Ruud
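Following Ruud's suggestion, a sketch of what a trim_and_normalize() might look like: the wantarray dispatch from his trim() sub, with the tail stripped before the head and interior runs collapsed to one space. The body is an illustration built under those assumptions, not code posted in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub trim_and_normalize {
    no warnings 'uninitialized';
    if ( defined wantarray ) {
        # caller wants values back: work on copies
        my @values = @_;
        s/\s+\z//, s/\A\s+//, s/\s+/ /g foreach @values;
        return wantarray ? @values : $values[0];
    }
    # void context: @_ aliases the caller's variables, so edit in place
    s/\s+\z//, s/\A\s+//, s/\s+/ /g foreach @_;
    return;
}

my $x    = "  some \t spaced\n text  ";
my $copy = trim_and_normalize($x);    # $x is left as it was
trim_and_normalize($x);               # now $x itself is cleaned
print "[$copy] [$x]\n";
```

As with Ruud's trim(), calling it in void context on a literal would die, because a constant cannot be modified in place.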
Re: Regexp to remove spaces
On Dec 23, 2:31 am, rvtol+use...@isolution.nl (Dr.Ruud) wrote:
> sftriman wrote:
>> 1ST PLACE - THE WINNER: 5.0s average on 5 runs
>>
>> # Limitation - pointer
>> sub fixsp5 {
>>     ${$_[0]}=~tr/ \t\n\r\f/ /s;
>>     ${$_[0]}=~s/\A //;
>>     ${$_[0]}=~s/ \z//;
>> }
>
> Just decide to change in-place, based on the defined-ness of wantarray.
>
> sub trim {
>     no warnings 'uninitialized';
>
>     if ( defined wantarray ) {
>         # need to return scalar / list
>         my @values= @_;
>         s#^\s+##s, s#\s+$##s foreach @values;
>         return wantarray ? @values : $values[0];
>     }
>
>     # need to change in-place
>     s#^\s+##s, s#\s+$##s foreach @_;
>     return;
> } #trim
>
> --
> Ruud

Hi there,

You're missing the tr to squash space down, but I see what you're doing. I never need to trim an array at this point, but if I did...

So I think it can boil down to:

sub fixsp7 {
    s#\A\s+##, s#\s+\z##, tr/ \t\n\r\f/ /s foreach @_;
    return;
}

This keeps it consistent with my other 6 test cases. I run it against several test strings, including some with line breaks, to make sure the results are always the same. Note I am using \A and \z and not ^ and $. Still, I think this has the flavor of what you intended.

Result: 5 trial runs over the same data set, 1,000,000 times each, average time was 16.30s. All things considered, this puts it in a 4-way tie for 3rd place with the other methods.

IF the times above still stand... And in fact, they don't. Why? CPU usage is high on my box right now. I re-ran the other methods as a baseline, which had been in the 6.0s range, and they are now coming in at 25s! So maybe this one is the fastest! I'll have to do more testing.

To be fair, I had to rewrite the former winner as:

sub fixsp1a {
    ${$_[0]}=~s/\A\s+//;
    ${$_[0]}=~s/\s+\z//;
    ${$_[0]}=~s/\s+/ /g;
}

using \A and \z.

I wonder how expensive that foreach is. Knowing that there is exactly one argument, is there a faster way for this to run, without using foreach? Even so, this may not be the fastest trim method - in place, no pointer, one line, with the foreach @_ as written.
David
Re: Regexp to remove spaces
sftriman wrote:
> So I think it can boil down to:
>
> sub fixsp7 {
>     s#\A\s+##, s#\s+\z##, tr/ \t\n\r\f/ /s foreach @_;
>     return;
> }

sub fixsp7 {
    tr/ \t\n\r\f/ /s, s#\A\s##, s#\s\z## foreach @_;
    return;
}

Placing the tr/// first reduces the number of characters scanned for s#\s\z##, which makes things slightly faster.

-- 
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication
as it is about coding.

I like Perl; it's the only language where you can bless your
thingy.
Re: Regexp to remove spaces
sftriman wrote:
> 1ST PLACE - THE WINNER: 5.0s average on 5 runs
>
> # Limitation - pointer
> sub fixsp5 {
>     ${$_[0]}=~tr/ \t\n\r\f/ /s;
>     ${$_[0]}=~s/\A //;
>     ${$_[0]}=~s/ \z//;
> }

Just decide to change in-place, based on the defined-ness of wantarray.

sub trim {
    no warnings 'uninitialized';

    if ( defined wantarray ) {
        # need to return scalar / list
        my @values= @_;
        s#^\s+##s, s#\s+$##s foreach @values;
        return wantarray ? @values : $values[0];
    }

    # need to change in-place
    s#^\s+##s, s#\s+$##s foreach @_;
    return;
}#trim

-- 
Ruud
Re: Regexp to remove spaces
Thanks to everyone for their input!

So I've tried out many of the methods, first making sure that each works as I intended. That is, I'm not concerned with multi-line text, just single-line data. That said, I have noted that I should use \A and \z in general over ^ and $.

I wrote a 176-byte string for testing, and ran each method 1,000,000 times to time the speed.

The winner is: 3 regexps, using tr for intra-string spaces. I found I could make this even faster by passing a reference to the variable instead of passing in the variable as a local input parameter, modifying it, then returning it.

(In all cases, my goal is to write a sub for general use anywhere I want it, so I wrote each possibility as a sub. There ARE cases where I need to compare the original string with the cleaned string, but I can deal with that as need be with local variables.)

1ST PLACE - THE WINNER: 5.0s average on 5 runs

# Limitation - pointer
sub fixsp5 {
    ${$_[0]}=~tr/ \t\n\r\f/ /s;
    ${$_[0]}=~s/\A //;
    ${$_[0]}=~s/ \z//;
}

2ND PLACE - same as above, but with local variables - 6.0s average on 5 runs

sub fixsp4 {
    my ($x)=@_;
    $x=~tr/ \t\n\r\f/ /s;
    $x=~s/\A //;
    $x=~s/ \z//;
    return $x;
}

[ QUESTION - any difference using my $x=shift; ??? ]

3RD PLACE - 3-way tie, my method, either as variable in, change in place, or pointer - 17.0s average

sub fixsp0 {
    my ($x)=@_;
    $x=~s/^\s+//;
    $x=~s/\s+$//;
    $x=~s/\s+/ /g;
    return $x;
}

# Limitation: pointer
sub fixsp1 {
    ${$_[0]}=~s/^\s+//;
    ${$_[0]}=~s/\s+$//;
    ${$_[0]}=~s/\s+/ /g;
}

# Limitation: change in place
sub fixsp2 {
    $_[0]=~s/^\s+//;
    $_[0]=~s/\s+$//;
    $_[0]=~s/\s+/ /g;
}

4TH PLACE - 20.0s average on 5 runs (did not try change in place or as pointer)

sub fixsp6 {
    my ($x)=@_;
    s/\s+\z//, s/\A\s+//, s/\s+/ /g, for $x;
    return $x;
}

5TH PLACE - DEAD LAST! (or DFL in some parlance) - 62.0s average on 3 runs

sub fixsp3 {
    my ($x)=@_;
    $x=~s/^(\s+)|(\s+)$//g;
    $x=~s/\s+/ /g;
    return $x;
}

Any and all comments welcome.
David
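The post gives timings but not the harness. A sketch of how such a comparison could be set up with the core Benchmark module's cmpthese, using three of the subs above (fixsp5 gets an explicit return); the sample string is made up, and results will vary with machine and load. One design point worth checking in any harness: if fixsp5 is handed a reference to the same variable on every iteration, only the first call sees untrimmed text. This sketch therefore copies the sample before each in-place call, which adds a little overhead to that entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub fixsp0 {    # copy in, three substitutions, copy out
    my ($x) = @_;
    $x =~ s/^\s+//;
    $x =~ s/\s+$//;
    $x =~ s/\s+/ /g;
    return $x;
}

sub fixsp4 {    # copy in, tr/// squeeze, then trim one space per end
    my ($x) = @_;
    $x =~ tr/ \t\n\r\f/ /s;
    $x =~ s/\A //;
    $x =~ s/ \z//;
    return $x;
}

sub fixsp5 {    # same as fixsp4, but edits through a reference
    ${ $_[0] } =~ tr/ \t\n\r\f/ /s;
    ${ $_[0] } =~ s/\A //;
    ${ $_[0] } =~ s/ \z//;
    return;
}

my $sample = "  lorem \t ipsum   dolor\n\n sit  amet  " x 4;

# Sanity check first: timings mean nothing unless all variants agree.
my $ref_copy = $sample;
fixsp5( \$ref_copy );
die "variants disagree\n"
    unless fixsp0($sample) eq fixsp4($sample) && fixsp4($sample) eq $ref_copy;

# A negative count means: run each entry for at least that many CPU seconds.
cmpthese( -1, {
    fixsp0 => sub { my $r = fixsp0($sample) },
    fixsp4 => sub { my $r = fixsp4($sample) },
    fixsp5 => sub { my $c = $sample; fixsp5( \$c ) },
} );
```

Note that the tr/// variants only squeeze the five characters listed, while \s also covers other whitespace, so the two families can differ on unusual input.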
Re: Regexp to remove spaces
sftriman wrote:
> I use this series of regexp all over the place to clean up lines of
> text:
>
> $x=~s/^\s+//g;
> $x=~s/\s+$//g;
> $x=~s/\s+/ /g;
>
> in that order, and note the final one replace \s+ with a single space.

The g-modifier on the first 2 is bogus (unless you would add an m-modifier).

I currently tend to write it like this:

    s/\s+\z//, s/\A\s+//, s/\s+/ /g, for $x;

So first remove tail spaces (less to lshift next).
Then remove head spaces.
Then normalize.

For a multi-line buffer you can do it like this:

perl -wle ' my $x = EOT; 123456 \t abc def \t\t\t\t\t\t\t\t *** *** *** \t EOT s/^\s+//mg, s/\s+$//mg, s/[^\S\n]+/ /g for $x; $x =~ s/\n/\n/g; print $x, ; ' 123 456 abc def *** *** ***

-- 
Ruud
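Ruud's multi-line clean-up, wrapped in a sub so it can be run and tested. The three substitutions are his; the sub name normalize_lines and the sample buffer are illustrative. Under /m, ^ and $ match at every line boundary, and [^\S\n] means "whitespace other than newline", so the last step keeps the line breaks. Because \s also matches \n, the first two steps additionally remove blank lines and the final newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub normalize_lines {
    my ($text) = @_;
    s/^\s+//mg,            # leading whitespace on each line
    s/\s+$//mg,            # trailing whitespace on each line
    s/[^\S\n]+/ /g         # remaining runs (not newlines) -> one space
        for $text;
    return $text;
}

my $buffer = "  123 \t 456  \n\t abc   def \n   *** ***  ***\t\n";
print normalize_lines($buffer), "\n";
```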
Re: Regexp to remove spaces
Shawn H Corey wrote:
> $text =~ tr{\t}{ };
> $text =~ tr{\n}{ };
> $text =~ tr{\r}{ };
> $text =~ tr{\f}{ };
> $text =~ tr{ }{ }s;

That can be written as:

    tr/\t\n\r\f/ /, tr/ / /s for $text;

But it doesn't fully remove leading or trailing spaces: a single space can remain at either end.

-- 
Ruud
Re: Regexp to remove spaces
2009/12/20 Dr.Ruud <rvtol+use...@isolution.nl>

> sftriman wrote:
>
>> I use this series of regexp all over the place to clean up lines of
>> text:
>>
>> $x=~s/^\s+//g;
>> $x=~s/\s+$//g;
>> $x=~s/\s+/ /g;
>
> ...
>
> For a multi-line buffer you can do it like this:
> ...
> s/^\s+//mg, s/\s+$//mg, s/[^\S\n]+/ /g for $x;

I know what it does, but I haven't seen this form of *for* before. Where can I find the description of this syntax in perldoc?

Thanks.

-- 
missing the days we spend together
Re: Regexp to remove spaces
At 6:11 PM +0800 12/21/09, Albert Q wrote:
>2009/12/20 Dr.Ruud <rvtol+use...@isolution.nl>
>
>>  For a multi-line buffer you can do it like this:
>>  ...
>>  s/^\s+//mg, s/\s+$//mg, s/[^\S\n]+/ /g for $x;
>
>I know what it does, but I haven't seen this form of *for* before.
>Where can I find the description of this syntax in perldoc?

That is a question about Perl syntax, so look in 'perldoc perlsyn'. Search for the section on "Statement Modifiers", and realize that for and foreach are synonyms.
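As a concrete illustration of the Statement Modifiers section of perlsyn: in EXPR for LIST, $_ is aliased to each element of LIST in turn, so substitutions that act on $_ change the listed variable itself. The sample strings are made up; the block-loop form is shown alongside for comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "  several   words  ";

# Statement-modifier form: the comma-separated substitutions run once,
# with $_ aliased to $x, so $x itself is modified.
s/\A\s+//, s/\s+\z//, s/\s+/ /g for $x;

# Equivalent block form.
my $y = "  several   words  ";
for ($y) {
    s/\A\s+//;
    s/\s+\z//;
    s/\s+/ /g;
}

print "[$x] [$y]\n";
```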
Faster way to do a regexp using a hash
I've been wondering for a long time... is there a slick (and hopefully fast!) way to do this?

foreach (keys %fixhash) {
    $x=~s/\b$_\b/$fixhash{$_}/gi;
}

So if

$x="this could be so cool";

and

$fixhash{could}="would";
$fixhash{COOL}="awesome";
$fixhash{beso}="nope";
$fixhash{his}="impossible";

then it would end up

"this would be so awesome"

Thanks!

David
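One common way to avoid looping over the keys is to build a single alternation from them and look each match up in the hash. A sketch under two assumptions: no two keys differ only in case (they are folded to lower case for the lookup), and keys are meant literally (hence quotemeta). Unlike the original loop, this makes one pass, so a replacement is never rewritten again by a later key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %fixhash = ( could => 'would', COOL => 'awesome', beso => 'nope', his => 'impossible' );

# Lower-case the keys once so the lookup can ignore case, like the /i loop.
my %fix;
$fix{ lc $_ } = $fixhash{$_} for keys %fixhash;

# One alternation of all keys, matched as whole words, case-insensitively.
my $alt = join '|', map { quotemeta($_) } keys %fix;
my $re  = qr/\b($alt)\b/i;

my $x = "this could be so cool";
$x =~ s/$re/$fix{lc $1}/ge;    # /e: the replacement is a hash lookup
print "$x\n";
```

The \b anchors keep "his" from matching inside "this" and "beso" from matching "be so", exactly as in the loop version.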
Regexp to remove spaces
I use this series of regexps all over the place to clean up lines of text:

$x=~s/^\s+//g;
$x=~s/\s+$//g;
$x=~s/\s+/ /g;

in that order, and note the final one replaces \s+ with a single space. Basically, it's (1) remove all leading space, (2) remove all trailing space, and (3) replace each run of multiple spaces with a single space [which, at this point, should only occur on interior characters].

Is there a handy way to do this in one regexp? And, fast? I've been using Devel::NYTProf to study code timing and see that some regexps, especially mine, can be CPU-intensive.

Thanks!

David
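A single substitution can do all three jobs: a whitespace run touching either end of the string is dropped, and any other run is captured in $1 and becomes one space. This is a sketch (squash_spaces is a hypothetical name) and is not necessarily faster: the benchmark post in this thread reports a combined alternation (fixsp3) as the slowest variant, so it is worth timing against the three-step version before adopting it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub squash_spaces {
    my ($x) = @_;
    # Alternatives are tried left to right at each position: leading run,
    # trailing run, otherwise an interior run captured in $1.
    $x =~ s/\A\s+|\s+\z|(\s+)/defined $1 ? ' ' : ''/ge;
    return $x;
}

print '[', squash_spaces("  lots \t of\n\n  space  "), "]\n";
```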