Re: Regexp under PERL
On 8 July 2015 at 19:12, Nagy Tamas (TVI-GmbH) tamas.n...@tvi-gmbh.de wrote:

> This is the code:
>
>     } elsif (defined($row) && ($row =~ m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/)) {
>     # PATH first version: \(\*[ ]+@PATH[ ]+:=[ ]+'(\\/)?([\*A-Za-z_ ]*(\\/)?)+'[ ]\*\)?
>     my @path = split(':=', $row, 2);
>     $temppath = $path[1];
>     my trimmedpath = split(''', $temppath, 3);
>     $currentpath = trimmedpath[1];
>
> The last )) is the closing of the elsif. Sorry. Still no idea.
>
> Tamas Nagy

Again, you're just bolting stuff together in the email client and thinking it's the code. There's no way that can work.

The most obvious problem is that you have three quote marks in split(), meaning everything after that is nonsense. Then you use variables without sigils (which is also nonsense under strict). And you entirely forget to declare variables (again, nonsense under strict).

When you eliminate all those superficial defects, the code has no bugs and executes silently without so much as a squeak.

Attached is what I have, and it doesn't replicate the problem.

-- 
Kent
KENTNL - https://metacpan.org/author/KENTNL

Attachment: x.pl (Perl program)

-- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
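For readers following along, here is a minimal sketch of what the snippet could look like once the three defects Kent lists are removed. The attachment x.pl is not part of this archive, so this is a reconstruction and not his file; the sample row and the assumption that the @ was meant as a literal (escaped) @ are ours.

```perl
use strict;
use warnings;

# Hypothetical input row; the real data was never posted to the list.
my $row = q{(* @PATH := '/Main/Sub_Folder/' *)};
my $currentpath;

if (defined($row)
    && $row =~ m{\(\*[ ]+\@PATH[ ]+:=[ ]+'(/)?([*A-Za-z_ ]*(/)?)+'[ ]\*\)?})
{
    # Everything after ":=" ...
    my @path = split /:=/, $row, 2;
    my $temppath = $path[1];

    # ... and then the text between the first pair of single quotes.
    my @trimmedpath = split /'/, $temppath, 3;
    $currentpath = $trimmedpath[1];
}

print defined $currentpath ? "path=$currentpath\n" : "no match\n";
```

Every variable is declared with my, arrays carry their @ sigil, and the quote character is given to split as a pattern, so nothing is left unterminated.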
Re: Regexp under PERL
On 8 July 2015 at 04:40, Nagy Tamas (TVI-GmbH) tamas.n...@tvi-gmbh.de wrote:

> m/\(\*[ ]+\\@PATH\[ ]+:=[ ]+'(\/)?([\*A-Za-z_ ]*(\/)?)+'[ ]\*\)?/))

This is obviously not the exact code you're using, because the last two ) marks are actually outside the regex. Removing those ))'s makes the regex compile just fine.

So we need the code, not just the regex. Ideally, reduce your code until you have the least possible amount of Perl that replicates your problem exactly; that would be very helpful to us in helping you.

Additional notes: the values in @PATH are not relevant to your expression, because you explicitly escape the @ to mean a literal @. If you had not escaped it, it would have interpolated. But even then, I'd still have no idea what you are doing :)

-- 
Kent
KENTNL - https://metacpan.org/author/KENTNL
Re: RegExp
On Sat, 8 Mar 2014 18:20:48 +0530 rakesh sharma rakeshsharm...@hotmail.com wrote:

> Hi all, how do you get all words starting with letter 'r' in a string.
> thanks, rakesh

    /\br/

-- 
Don't stop where the ink does.
Shawn
Re: RegExp
Hello Rakesh,

On Sat, 8 Mar 2014 18:20:48 +0530 rakesh sharma rakeshsharm...@hotmail.com wrote:

> Hi all, how do you get all words starting with letter 'r' in a string.
> thanks, rakesh

1. Find all words in the sentence. Your idea of what is a word will need to be specified.
2. Put them in an array - let's say @words.
3. Use « grep { /\Ar/i } @words » .

See:

* http://perldoc.perl.org/functions/grep.html
* https://metacpan.org/pod/List::MoreUtils
* https://metacpan.org/pod/List::Util

Regards,
Shlomi Fish
http://www.shlomifish.org/
Re: RegExp
On 08.03.2014 at 13:50, rakesh sharma wrote:

> how do you get all words starting with letter 'r' in a string.

What have you tried so far?

Greetings,
Janek
Re: RegExp
On Mar 8, 2014, at 4:50 AM, rakesh sharma rakeshsharm...@hotmail.com wrote:

> Hi all, how do you get all words starting with letter 'r' in a string.

Try

    my @rwords = $string =~ /\br\w*?\b/g;
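The two approaches suggested in this thread (a global match anchored on a word boundary, and splitting into words followed by grep) can be compared side by side. This is only a sketch: the sample sentence is made up, "word" is assumed to mean a run of \w characters separated by whitespace, and matching is case-insensitive as in Shlomi's grep.

```perl
use strict;
use warnings;

my $string = 'Rakesh ran rapidly around the broken barrier';

# Approach 1: one global match; \b keeps the r at the start of a word.
my @rwords = $string =~ /\b(r\w*)/gi;

# Approach 2: split into words first, then keep those that begin with r.
my @words       = split ' ', $string;
my @rwords_grep = grep { /\Ar/i } @words;

print "match: @rwords\n";
print "grep:  @rwords_grep\n";
```

Words such as "around" or "barrier" contain an r but do not start with one, so neither approach returns them.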
Re: regexp puzzle
On 3/8/2014 12:05 AM, Bill McCormick wrote:

> I have the following string I want to extract from:
>
>     my $str = "foo (3 bar): baz";
>
> and I want to end up with
>
>     $p1 = "foo"; $p2 = 3; $p3 = "baz";
>
> The complication is that the \s(\d\s.+) is optional, so in that case $p2 may not be set.
>
> Getting close was
>
>     my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;
>
> How can I make the (3 bar) optional?

Here's what I came up with:

    ($key, $lines, $value) = $_ =~ /^(.+?)(?:\s\((\d)\s.+\))?:\s(.*)$/;
Re: regexp puzzle
([^]+) \(([0-9]+).*\) ([a-z]+)

On Mar 8, 2014 1:07 AM, Bill McCormick wpmccorm...@gmail.com wrote:

> I have the following string I want to extract from:
>
>     my $str = "foo (3 bar): baz";
>
> and I want to end up with
>
>     $p1 = "foo"; $p2 = 3; $p3 = "baz";
>
> The complication is that the \s(\d\s.+) is optional, so in that case $p2 may not be set.
>
> Getting close was
>
>     my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;
>
> How can I make the (3 bar) optional?
>
> Thanks!
Re: regexp puzzle
On 3/8/2014 12:41 AM, shawn wilson wrote:

    my $str = "foo (3 bar): baz";
    my $test = "foo (3 bar): baz";
    my ($p1, $p2, $p3) = $test =~ /([^]+) \(([0-9]+).*\) ([a-z]+)/;
    print "p1=[$p1] p2=[$p2] p3=[$p3]\n";

    Use of uninitialized value $p1 in concatenation (.) or string at ./lock_report.pl line 11.
    Use of uninitialized value $p2 in concatenation (.) or string at ./lock_report.pl line 11.
    Use of uninitialized value $p3 in concatenation (.) or string at ./lock_report.pl line 11.
    p1=[] p2=[] p3=[]
Re: regexp puzzle
On Mar 8, 2014 1:41 AM, shawn wilson ag4ve...@gmail.com wrote:

Oh and per optional, just do (?:\([0-9]+).*\)?

You should probably do

    my @match = $str =~ / ([^]+) (?:\([0-9]+).*\)? ([a-z]+)/;
    my ($a, $b, $c) = (scalar(@match) == 3 ? @match : $match[0], undef, $match[1]);

> ([^]+) \(([0-9]+).*\) ([a-z]+)
>
> On Mar 8, 2014 1:07 AM, Bill McCormick wpmccorm...@gmail.com wrote:
>
>> I have the following string I want to extract from:
>>
>>     my $str = "foo (3 bar): baz";
>>
>> and I want to end up with
>>
>>     $p1 = "foo"; $p2 = 3; $p3 = "baz";
>>
>> The complication is that the \s(\d\s.+) is optional, so in that case $p2 may not be set.
>>
>> Getting close was
>>
>>     my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;
>>
>> How can I make the (3 bar) optional?
>>
>> Thanks!
Re: regexp puzzle
On Mar 7, 2014, at 10:05 PM, Bill McCormick wpmccorm...@gmail.com wrote:

> I have the following string I want to extract from:
>
>     my $str = "foo (3 bar): baz";
>
> and I want to end up with
>
>     $p1 = "foo"; $p2 = 3; $p3 = "baz";
>
> The complication is that the \s(\d\s.+) is optional, so in that case $p2 may not be set.
>
> Getting close was
>
>     my ($p1, $p3) = $str =~ /^(.+):\s(.*)$/;

You can make a substring optional by following it with the ? quantifier. If your substring is more than one character, you can group it with capturing parentheses or a non-capturing grouping construct (?: ).

Here is a sample, using the extended regular expression syntax with the x option:

    my( $p1, $p2, $p3 ) = $str =~ m{ \A (\w+) \s+ (?: \( (\d+) \s+ \w+ \) )? : \s (\w+) }x;

    if( $p1 && $p3 ) {
        print "p1=$p1, p2=$p2, p3=$p3\n";
    }else{
        print "No match\n";
    }

Always test the returned values to see if the match succeeded.

So if '(3 bar)' is not present, does the colon still remain? That will determine whether the colon should be inside or outside the optional substring part.
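The open question about the colon can be made concrete with a small sketch. It assumes, without confirmation from the original poster, that the colon is always present and that the space before "(3 bar)" disappears together with it, so the \s+ is moved inside the optional group. The second test string is hypothetical.

```perl
use strict;
use warnings;

# Note: comments inside an /x pattern are still subject to interpolation,
# so they must not contain $ or @ characters.
my $re = qr{
    \A (\w+)                         # first word
    (?: \s+ \( (\d+) \s+ \w+ \) )?   # optional parenthesised part
    : \s+ (\w+)                      # colon, then the value
}x;

my @results;
for my $str ('foo (3 bar): baz', 'foo: baz') {
    # A list assignment in scalar context is true only if the match succeeded.
    if (my ($p1, $p2, $p3) = $str =~ $re) {
        $p2 //= 'undef';             # the optional capture may be undefined
        push @results, "p1=$p1 p2=$p2 p3=$p3";
    }
    else {
        push @results, "no match: $str";
    }
}
print "$_\n" for @results;
```

Testing the assignment itself, and not the truth of individual captures, also avoids misreading a legitimately false capture such as "0" as a failed match.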
Re: regexp as hash value?
On Sat, Jan 25, 2014 at 03:51:53PM +0100, Luca Ferrari wrote:

> Hi, I'm just wondering if it is possible to place a regexp as a value
> in a hash, so as to use it later as something like:
>
>     my $string =~ $hash_ref->{ $key };
>
> Is it possible? Should I take into account something special?

Yes, this is possible. You need to use qr// to construct your RE:

    $ perl -E '$h = { a => qr/y/ }; say $_ =~ $h->{a} for qw(x y z)'

    1

    $

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net
Re: regexp as hash value?
On Sat, Jan 25, 2014 at 4:12 PM, Paul Johnson p...@pjcj.net wrote:

>     $ perl -E '$h = { a => qr/y/ }; say $_ =~ $h->{a} for qw(x y z)'

Thanks, but then another doubt: having a look at http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators I don't understand how I can use the regexp for substitution, that is, s/// as a hash value. The following is not working:

    my $hash = { q/regexp/ => qr/s,from,to,/ };

Any clue?

Thanks,
Luca
Re: regexp as hash value?
On Sat, Jan 25, 2014 at 06:41:00PM +0100, Luca Ferrari wrote:

> On Sat, Jan 25, 2014 at 4:12 PM, Paul Johnson p...@pjcj.net wrote:
>
>>     $ perl -E '$h = { a => qr/y/ }; say $_ =~ $h->{a} for qw(x y z)'
>
> Thanks, but then another doubt: having a look at
> http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators I don't
> understand how I can use the regexp for substitution, that is, s/// as
> a hash value. The following is not working:
>
>     my $hash = { q/regexp/ => qr/s,from,to,/ };

You won't be able to do the full substitution this way, but you can use the RE in the substitution:

    $ perl -E '$h = { a => qr/y/ }; say $_ =~ s/$h->{a}/p/r for qw(x y z)'
    x
    p
    z
    $

If the replacement text is different for each substitution then you may be better served storing anonymous subs in your hash:

    $ perl -E '$h = { a => sub { s/y/p/r } }; say $h->{a}->() for qw(x y z)'
    x
    p
    z
    $

Good luck.

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net
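The same two ideas, written out as a script instead of one-liners, might look like the sketch below. The hash names and keys are made up for illustration, and the substitution avoids the /r flag so that it also runs on Perls older than 5.14.

```perl
use strict;
use warnings;

# A compiled pattern stored as a hash value.
my %match_for = ( has_y => qr/y/ );
my @hits = grep { $_ =~ $match_for{has_y} } qw(x y z);

# A whole substitution stored as an anonymous sub, since s/// itself
# cannot be stored with qr//.
my %rewrite = (
    y_to_p => sub {
        my ($text) = @_;
        $text =~ s/y/p/;    # works on a copy, the caller's value is untouched
        return $text;
    },
);
my @out = map { $rewrite{y_to_p}->($_) } qw(x y z);

print "hits: @hits\n";
print "out:  @out\n";
```

Storing subs keeps the pattern and its replacement together, which is what a qr// value alone cannot do.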
Re: regexp and parsing assistance
On Jun 8, 2013, at 8:06 PM, Noah wrote:

> Hi there,
>
> I am attempting to parse the following output and am not quite sure how
> to do it. The text is in columns and spaced out that way regardless of
> whether there are any numbers in, say, Col5 or Col6. If the column has
> an entry then I want to save it to a variable; if there is no entry
> then that variable will be equal to 'blank'.
>
> The first line is a header and can be ignored.
>
> C Col2 C Col4 Col5 Col6 Col7Col8 new_line
> * 123.456.789.101/85 A 803Reject new_line
> B 804 76 10 800.99.999.098765 78910 I new_line
> O 805 1234 1 800.9.999.1 98765 78910 I new_line

If your data consists of constant-width fields, then the best approach is to use the unpack function. See 'perldoc -f unpack' for how to use it and 'perldoc -f pack' for the template parameters that describe your data.

This statement will unpack the second and third data lines you have shown, presuming that you have read the lines into the variable $line:

    my @fields = unpack('A2 A19 A2 A3 A11 A11 A18 A5 A5 A1',$line);

However, your data as shown has variable data in the first or second column. If that is really the case, then you will have to look at the first twenty columns of your data and determine where column three starts. Then you can use the unpack function to parse the rest of the columns. Maybe something like this:

    if( $line =~ /^\s{20}/ ) {
        # no data in first 20 columns, unpack remainder
        $line = substr($line,20);
    }else{
        # data in first 20 columns -- remove first two fields
        $line =~ s/\S+\s\S+\s//;
    }
    my @fields = unpack('A2 A3 A11 A11 A18 A5 A5 A1',$line);

Exactly what you need to do depends upon the exact nature of your data and how much it varies from line to line.

Good luck!
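A reduced, self-contained sketch of the unpack approach follows. The column widths in the archived sample were lost when the whitespace collapsed, so the four-field layout below (widths 2, 19, 2 and 4) is hypothetical, and the test lines are built with sprintf so that the widths are known exactly. Empty fields are mapped to 'blank', as the original poster asked.

```perl
use strict;
use warnings;

# Hypothetical fixed-width layout, not the poster's real one.
my $layout   = '%-2s%-19s%-2s%-4s';
my $template = 'A2 A19 A2 A4';

sub parse_line {
    my ($line) = @_;
    # "A" fields are space padded; unpack strips the trailing padding.
    my @fields = unpack $template, $line;
    return map { length $_ ? $_ : 'blank' } @fields;
}

my @first  = parse_line(sprintf $layout, '*', '123.456.789.101/85', 'A', '803');
my @second = parse_line(sprintf $layout, '',  '',                   'B', '804');

print "@first\n";
print "@second\n";
```

Because every field has a fixed position, a missing value simply comes back as an empty string; no regex is needed to decide where the next column starts.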
Re: regexp and parsing assistance
On 6/9/13 9:00 AM, Jim Gibson wrote:

> On Jun 8, 2013, at 8:06 PM, Noah wrote:
>
>> Hi there,
>>
>> I am attempting to parse the following output and am not quite sure
>> how to do it. The text is in columns and spaced out that way
>> regardless of whether there are any numbers in, say, Col5 or Col6. If
>> the column has an entry then I want to save it to a variable; if there
>> is no entry then that variable will be equal to 'blank'.
>>
>> The first line is a header and can be ignored.
>>
>> C Col2 C Col4 Col5 Col6 Col7Col8 new_line
>> * 123.456.789.101/85 A 803Reject new_line
>> B 804 76 10 800.99.999.098765 78910 I new_line
>> O 805 1234 1 800.9.999.1 98765 78910 I new_line
>
> If your data consists of constant-width fields, then the best approach
> is to use the unpack function. See 'perldoc -f unpack' for how to use
> it and 'perldoc -f pack' for the template parameters that describe your
> data.
>
> This statement will unpack the second and third data lines you have
> shown, presuming that you have read the lines into the variable $line:
>
>     my @fields = unpack('A2 A19 A2 A3 A11 A11 A18 A5 A5 A1',$line);
>
> However, your data as shown has variable data in the first or second
> column. If that is really the case, then you will have to look at the
> first twenty columns of your data and determine where column three
> starts. Then you can use the unpack function to parse the rest of the
> columns. Maybe something like this:
>
>     if( $line =~ /^\s{20}/ ) {
>         # no data in first 20 columns, unpack remainder
>         $line = substr($line,20);
>     }else{
>         # data in first 20 columns -- remove first two fields
>         $line =~ s/\S+\s\S+\s//;
>     }
>     my @fields = unpack('A2 A3 A11 A11 A18 A5 A5 A1',$line);
>
> Exactly what you need to do depends upon the exact nature of your data
> and how much it varies from line to line.
>
> Good luck!

Thanks Jim,

Here is the regexp I came up with. It works really well:

    if ($line =~ /^\*\s(\S+)\s+(\S)\s+(\d+)\s{6}([\s\d]{0,5})\s{6}([\s\d]{0,5})[\s\]+(\S+)(.*)/) {

Then I strip the leading and trailing spaces from the scalars I collect.

Cheers,
Noah
RE: regexp validation (arbitrary code execution) (regexp injection)
From: Stanislaw Findeisen

> Suppose you have a collection of books, and want to provide your users
> with the ability to search the book title, author or content using
> regular expressions. But you don't want to let them execute any code.
> How would you validate/compile/evaluate the user provided regex so as
> to provide maximum flexibility and prevent code execution?

You want them to run an application without having to run an application? That doesn't make any sense.

You have several options available to give your users access to a database.

1. Write a client application or applet they can copy or install on their workstation to access the database directly.
2. Write a simpler application or applet that accesses a non-DB server which in turn accesses the database.
3. Create a site on a web server they can access with a browser, which then accesses the database.

There are any number of variations on these themes, but in each case, they have to run some application code somewhere in order to access the data.

Bob McConnell
Re: regexp validation (arbitrary code execution) (regexp injection)
On 2011-06-02 14:27, Bob McConnell wrote:

> From: Stanislaw Findeisen
>
>> Suppose you have a collection of books, and want to provide your
>> users with the ability to search the book title, author or content
>> using regular expressions. But you don't want to let them execute any
>> code. How would you validate/compile/evaluate the user provided regex
>> so as to provide maximum flexibility and prevent code execution?
>
> You want them to run an application without having to run an
> application? That doesn't make any sense.

This is a complete misunderstanding. Sorry, perhaps I wasn't clear enough.

I was talking about users injecting *their* code via the regex. See for instance: http://perldoc.perl.org/perlretut.html#A-bit-of-magic:-executing-Perl-code-in-a-regular-expression or the /e modifier for the built-in function s (search and replace).

When doing:

    $string =~ $regex

where $regex is a user-provided, arbitrary regular expression, anything can happen.

-- 
Eisenbits - proven software solutions: http://www.eisenbits.com/
OpenPGP: E3D9 C030 88F5 D254 434C 6683 17DD 22A0 8A3B 5CC0
Re: regexp validation (arbitrary code execution) (regexp injection)
2011/6/1 Stanisław Findeisen s...@eisenbits.com

> Suppose you have a collection of books, and want to provide your users
> with the ability to search the book title, author or content using
> regular expressions. But you don't want to let them execute any code.
> How would you validate/compile/evaluate the user provided regex so as
> to provide maximum flexibility and prevent code execution?
>
> -- 
> Eisenbits - proven software solutions: http://www.eisenbits.com/
> OpenPGP: E3D9 C030 88F5 D254 434C 6683 17DD 22A0 8A3B 5CC0

Hi Stanisław,

From what you are saying, I think you are looking for a way to take a string and check it for any potentially bad characters that would cause the system to execute unwanted code. So a bit like this:

    In.*?forest$

is a safe string to feed into your regular expression, but:

    .*/; open my $fh, , $0; close $fh; $_ = ~/

is an evil string causing you a lot of grief. At least that is how I understand your question...

To be honest, I am not sure this is an issue. I doubt that the following construction

    if ( $title =~ m/$userinput/ ) { do stuff... }

will give you any trouble: as far as I can remember, the variable that you are feeding in here will not be treated as code by the interpreter but simply as matching instructions, which would mean that whatever your user throws at it, perl will in the worst case return a failure to match. But please don't take my word for it; try it in a very simple test and see what happens.

If you do have to ensure that a user cannot execute any code, you could simply prevent the user from entering the ; character, or, smarter yet, filter it out of the user input, to stop a smart user from feeding it to your code via a method other than the front-end you provided. Without a means to close the previous regular expression, the user cannot really insert executable code into your regular expression.

At least that's what I would try, but I am by no means an expert in this area, and I suspect there might be some people reading this and wondering why I didn't think of A, B or C. If so, please do speak up, people ;-)

Regards,
Rob
Re: regexp validation (arbitrary code execution) (regexp injection)
>>>>> "Stanisław" == Stanisław Findeisen s...@eisenbits.com writes:

Stanisław> But you don't want to let them execute any code.

Unless "use re 'eval'" is in scope, /$a/ is safe even if $a came from an untrusted source, as long as you limit the run time to a few seconds or so with an alarm. (Some regexes can take nearly forever to fail.)

See "perldoc perlre" for more details.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
mer...@stonehenge.com URL:http://www.stonehenge.com/merlyn/
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion
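Randal's advice can be sketched roughly as below. The helper name safe_match and its return convention are made up for illustration. One caveat to keep in mind: Perl normally defers signal delivery until the current operation finishes (see perlipc), so whether the alarm interrupts a match that is already running depends on the Perl version and signal settings; treat the timeout here as a starting point, not a guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 or 0 for match / no match, and undef if
# the pattern was rejected or the time limit was hit.
sub safe_match {
    my ($text, $pattern, $timeout) = @_;
    my $matched;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "regex timeout\n" };
        alarm $timeout;
        # No "use re 'eval'" here, so an interpolated (?{ ... }) is refused.
        $matched = ($text =~ /$pattern/) ? 1 : 0;
        alarm 0;
        1;
    };
    alarm 0;    # make sure no alarm is left pending after an error
    return $ok ? $matched : undef;
}

my @titles = ('In the forest', 'Out of the woods');
for my $pattern ('In.*?forest$', '(?{ print "hello" })', '(unbalanced') {
    for my $title (@titles) {
        my $r = safe_match($title, $pattern, 2);
        printf "%-22s %-18s %s\n", $pattern, $title,
            defined $r ? ($r ? 'match' : 'no match') : 'rejected';
    }
}
```

Wrapping the match in eval also catches plain syntax errors in the user's pattern, which would otherwise kill the program.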
Re: regexp validation (arbitrary code execution) (regexp injection)
On Wed, Jun 01, 2011 at 11:25:39PM +0200, Stanisław Findeisen wrote:

> Suppose you have a collection of books, and want to provide your users
> with the ability to search the book title, author or content using
> regular expressions. But you don't want to let them execute any code.
> How would you validate/compile/evaluate the user provided regex so as
> to provide maximum flexibility and prevent code execution?

In general this shouldn't be a problem provided you don't turn on use re 'eval';

    $ perl -e '/$ARGV[0]/' '(?{ print "hello" })'
    Eval-group not allowed at runtime, use re 'eval' in regex m/(?{ print "hello" })/ at -e line 1.

    $ perl -Mre=eval -e '/$ARGV[0]/' '(?{ print "hello" })'
    hello

Of course, you're not going to be too worried about people saying hello, but once you can execute arbitrary code all bets are off:

    $ perl -e '/$ARGV[0]/' '(?{ system sudo mailx -s ha baddie\@example.com /etc/shadow ])'

Make sure you don't do the whole match as part of a string eval, and since you're only matching, you shouldn't have to worry about s///e.

If you prefer a more paranoid approach you might want to restrict the characters you allow in the user input, but this doesn't provide maximum flexibility.

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net
Re: regexp matching nummeric ranges
>>>>> "Kammen" == Kammen van, Marco, Springer SBM NL marco.vankam...@springer.com writes:

Kammen> What am I doing wrong??

Using a regex when something else would be much better. Stop trying to pound a nail in with a wrench handle.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc.
Re: Regexp delimiters
On Dec 7, 9:38 am, p...@utilika.org (Jonathan Pool) wrote:

> Well, I have no idea why it does what it does, but I can tell you how
> to make it work:
>
>     s¶3(456)7¶¶$1¶x;
>     s§3(456)7§§$1§x;

Hm, what platform and perl version? No errors here:

    c:\>perl -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
    v5.12.1MSWin32
    1245689

[...]

-- 
Charles DeRykus
Re: Regexp delimiters
On Dec 7, 9:38 am, p...@utilika.org (Jonathan Pool) wrote:

> Well, I have no idea why it does what it does, but I can tell you how
> to make it work:
>
>     s¶3(456)7¶¶$1¶x;
>     s§3(456)7§§$1§x;

Oops, sorry, yes there is:

    c:\>perl -Mutf8 -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
    Malformed UTF-8 character (unexpected continuation byte 0xa7, with no preceding start byte) at -e line 1.
    Malformed UTF-8 character (unexpected continuation byte 0xa7, with no preceding start byte) at -e line 1.
    v5.12.1MSWin32
    1245689

-- 
Charles DeRykus
Re: Regexp delimiters
> Hm, what platform and perl version?

5.8.8 and 5.12.2 on RHEL, and 5.10.0 on OS X 10.6.

>     c:\>perl -Mutf8 -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
>     Malformed UTF-8 character (unexpected continuation byte 0xa7, with
>     no preceding start byte) at -e line 1.

Not the same error as I got. This one looks to me like submitting text in an 8-bit single-byte encoding, where the section sign is A7, after telling the host you would be submitting UTF-8, where the section sign is C2A7. (Sorry if my terminology is wrong.)
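The two byte sequences mentioned here can be checked with the core Encode module. This is a small sketch; ISO-8859-1 is used as a representative single-byte encoding, which is an assumption, since the thread does not say which code page the Windows console was using.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00A7 SECTION SIGN, written as an escape so this file stays pure ASCII.
my $section_sign = "\x{A7}";

# Encode the character both ways and show the resulting bytes as hex.
my $utf8_hex   = uc(unpack('H*', encode('UTF-8',      $section_sign)));
my $latin1_hex = uc(unpack('H*', encode('ISO-8859-1', $section_sign)));

print "UTF-8:      $utf8_hex\n";      # C2A7
print "ISO-8859-1: $latin1_hex\n";    # A7
```

A lone A7 byte is a continuation byte in UTF-8, which is exactly what the "unexpected continuation byte 0xa7" message complains about.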
Re: Regexp delimiters
>     c:\>perl -wE "say $^V,$^O;$_='123456789';s§3(456)7§$1§;say"
>     v5.12.1MSWin32
>     1245689

My equivalent that works is:

    perl -wE "use utf8;my \$_='123456789';s§3(456)7§§\$1§;say;"
    1245689

If I stop treating this section-sign delimiter as a bracketing delimiter, it fails:

    perl -wE "use utf8;my \$_='123456789';s§3(456)7§\$1§;say;"
    Substitution replacement not terminated at -e line 1.
Re: Regexp delimiters
> Well, I have no idea why it does what it does, but I can tell you how
> to make it work:
>
>     s¶3(456)7¶¶$1¶x;
>     s§3(456)7§§$1§x;

Amazing. Thanks very much.

This seems to contradict the documentation. The perlop man page clearly says that there are exactly 4 bracketing delimiters: (), [], {}, and <>. Everything else should be non-bracketing. But, in fact, several characters that I have tried behave as bracketing delimiters.

An exception seems to be combining characters. The two that I tried don't work as either regular or bracketing delimiters.

I have tested this on Perl 5.8.8, 5.10.0, and 5.12.2. See the code below for results. The combining characters appear in the last 2 test pairs.

    #!/usr/bin/perl -w

    use warnings 'FATAL', 'all';
    # Make every warning fatal.

    use strict;
    # Require strict checking of variable references, etc.

    use utf8;
    # Make Perl interpret the script as UTF-8.

    my $string = '123456789';
    # Initialize a scalar.

    print "The original string is $string\n";

    # $string =~ s%3(456)7%$1%;     # Succeeds
    # $string =~ s%3(456)7%%$1%;    # Fails
    # $string =~ s§3(456)7§$1§;     # Fails
    # $string =~ s§3(456)7§§$1§;    # Succeeds
    # $string =~ s–3(456)7–$1–;     # Fails
    # $string =~ s–3(456)7––$1–;    # Succeeds
    # $string =~ s“3(456)7“$1“;     # Fails
    # $string =~ s“3(456)7““$1“;    # Succeeds
    # $string =~ s‱3(456)7‱$1‱;     # Fails
    # $string =~ s‱3(456)7‱‱$1‱;    # Succeeds
    # $string =~ s⇧3(456)7⇧$1⇧;     # Fails
    # $string =~ s⇧3(456)7⇧⇧$1⇧;    # Succeeds
    # $string =~ s⃠3(456)7⃠$1⃠;     # Fails (single U+20e0)
    # $string =~ s⃠3(456)7⃠⃠$1⃠;    # Fails (double U+20e0)
    # $string =~ s̸3(456)7̸$1̸;     # Fails (single U+0338)
    # $string =~ s̸3(456)7̸̸$1̸;    # Fails (double U+0338)
    # Modify it (uncomment any one line above.)

    print "The amended string is $string\n";
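As a baseline for the non-ASCII experiments above, here is a sketch of the delimiter forms that perlop documents, using ASCII characters only. The hash keys are just labels for this example.

```perl
use strict;
use warnings;

my %result;

# Non-bracketing delimiter: the same character appears three times.
($result{percent} = '123456789') =~ s%3(456)7%$1%;
($result{comma}   = '123456789') =~ s,3(456)7,$1,;

# Bracketing delimiter: pattern and replacement each get their own pair.
($result{braces}  = '123456789') =~ s{3(456)7}{$1};

print "$_: $result{$_}\n" for sort keys %result;
```

All three forms turn 123456789 into 1245689, the same result reported in the thread for the forms that succeed.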
Re: Regexp delimiters
Well, I have no idea why it does what it does, but I can tell you how to make it work:

    s¶3(456)7¶¶$1¶x;
    s§3(456)7§§$1§x;

For whatever reason, Perl is treating those characters as 'opening' delimiters[0], so that when you write s¶3(456)7¶$1¶;, you are telling Perl that the regex part is delimited by '¶'s, but the substitution part is delimited by '$'s (think of something like s{}//;).

Hopefully someone here will be able to enlighten us both further.

Brian.

[0] http://perldoc.perl.org/perlop.html#Gory-details-of-parsing-quoted-constructs

On Sun, Dec 5, 2010 at 6:33 PM, Jonathan Pool p...@utilika.org wrote:

> The perlop document under s/PATTERN/REPLACEMENT/msixpogce says "Any
> non-whitespace delimiter may replace the slashes."
>
> I take this to mean that any non-whitespace character may be used
> instead of a slash.
>
> However, I am finding that some non-whitespace characters cause errors.
> For example, using a ¶ or § character instead of a slash causes an
> error, such as "Bareword found where operator expected" or "Number
> found where operator expected". When I use a /, #, or ,, I get no
> error.
>
> Here is a script that demonstrates this problem:
>
>     #!/usr/bin/perl -w
>     use warnings 'FATAL', 'all';
>     use strict;
>     use utf8;
>     my $string = '123456789';
>     print "The original string is $string\n";
>     $string =~ s§3(456)7§$1§;
>     print "The amended string is $string\n";
Re: Regexp delimiters
On 10-12-05 05:58 PM, Brian Fraser wrote:

> Well, I have no idea why it does what it does, but I can tell you how
> to make it work:
>
>     s¶3(456)7¶¶$1¶x;
>     s§3(456)7§§$1§x;
>
> For whatever reason, Perl is treating those characters as 'opening'
> delimiters[0], so that when you write s¶3(456)7¶$1¶;, you are telling
> Perl that the regex part is delimited by '¶'s, but the substitution
> part is delimited by '$'s (think of something like s{}//;).
>
> Hopefully someone here will be able to enlighten us both further.

    $ perl -e's¶3(456)7¶¶$1¶x;'
    Unrecognized character \xB6 in column 14 at -e line 1.
    $ perl -Mutf8 -e's¶3(456)7¶¶$1¶x;'

You have to tell perl to use UTF-8. Add this line to the top of your script(s):

    use utf8;

See `perldoc utf8` for more details.

-- 
Just my 0.0002 million dollars worth,
Shawn
Re: Regexp delimiters
> You have to tell perl to use UTF-8. Add this line to the top of your
> script(s):
>
>     use utf8;
>
> See `perldoc utf8` for more details.

Hm, I don't mean to step on your toes or anything, but he is already using utf8. The problem is with some UTF-8 characters being interpreted as a paired delimiter, I think.

Brian.
Re: Regexp delimiters
On 10-12-05 07:38 PM, Brian Fraser wrote:

>> You have to tell perl to use UTF-8. Add this line to the top of your
>> script(s):
>>
>>     use utf8;
>>
>> See `perldoc utf8` for more details.
>
> Hm, I don't mean to step on your toes or anything, but he is already
> using utf8. The problem is with some UTF-8 characters being
> interpreted as a paired delimiter, I think.
>
> Brian.

It works for me. What version of Perl is he running? 5.6 does not work well with UTF-8.

-- 
Just my 0.0002 million dollars worth,
Shawn
Re: Regexp delimiters
That's probably because you are using what I sent, rather than what the OP did:

    C:\>perl -E "s§3(456)7§$1§;"
    Unrecognized character \x98 in column 16 at -e line 1.

    C:\>perl -Mutf8 -E "s§3(456)7§$1§;"
    Substitution replacement not terminated at -e line 1.

    C:\>perl -E "s§3(456)7§§$1§; say"
    Unrecognized character \x98 in column 14 at -e line 1.

    C:\>perl -Mutf8 -E "s§3(456)7§§$1§; say"

    C:\>

Brian.
Re: regexp matching nummeric ranges
On 30/11/2010 06:39, Uri Guttman wrote:
> GK == Guruprasad Kulkarni guruprasa...@gmail.com writes:
> GK> Here is another way to do it:
> GK> /^127\.0\.0\.([\d]|[1-9][\d]|[1][\d][\d]|[2]([0-4][\d]|[5][0-4]))$/) {
>
> why are you putting single chars inside a char class? [\d] is the same as \d and [1] is just 1.

Also this is another solution that wrongly verifies 127.0.0.0. It also unnecessarily makes use of captures instead of grouping, and puts single values into character classes ([1], [\d] etc.). Perhaps it is better written:

    /^127\.0\.0\.(?:
        [1-9]\d? |   # 1 .. 99
        1\d\d    |   # 100 .. 199
        2[0-4]\d |   # 200 .. 249
        25[0-4]      # 250 .. 254
    )$/x;

But my feeling is that these long-winded pure regex solutions are more of a response to a challenge than a practical solution. At the very least they need commenting to explain what they are doing.

Capturing the value of the last byte field, as I suggested, seems to describe the purpose of the code far better, with no significant penalty that I can think of.

- Rob
Re: regexp matching nummeric ranges
On 29/11/2010 14:22, Kammen van, Marco, Springer SBM NL wrote:
> Dear List,
>
> I've been struggling with the following:
>
>     #!/usr/bin/perl
>     use strict;
>     use warnings;
>     my $ip = ("127.0.0.255");
>     if ($ip =~ /127\.0\.0\.[2..254]/) {
>         print "IP Matched!\n";;
>     } else {
>         print "No Match!\n";
>     }
>
> For a reason I don't understand:
> 127.0.0.1 doesn't match as expected...
> Everything between 127.0.0.2 and 127.0.0.299 matches...
> 127.0.0.230 doesn't match...
> What am I doing wrong??
> Thanks!

Hello Marco

Regular expressions can't match a decimal string by value. The regex /[2..254]/ uses a character class which matches a SINGLE character, which may be '2', '5', '4' or '.'. It is the same as /[254.]/ as characters that appear a second time have no effect.

To verify the value of a decimal substring you need to add an extra test:

    if ($ip =~ /^127\.0\.0\.([0-9]+)$/ and 2 <= $1 and $1 <= 254) {
        :
    }

In the case of a successful match, this captures the fourth sequence of digits, leaving it in $1. This value can then be tested separately to make sure it is in the desired range.

Note that I have added the beginning and end of line anchors ^ and $, which ensure that the string contains nothing but a valid IP address; otherwise anything like XXX.127.0.0.200.300.400 would pass the test.

HTH,
Rob
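Rob's capture-then-compare idea can be sketched as a small helper. This is only an illustration: the sub name in_range and the list of test addresses are made up here; the regex and the 2..254 bounds are the ones from the message above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from the thread):
# returns 1 if $ip is 127.0.0.N with 2 <= N <= 254, otherwise 0.
sub in_range {
    my ($ip) = @_;
    # The regex only checks the shape and captures the last field;
    # the numeric comparison does the actual range test.
    my $ok = ( $ip =~ /^127\.0\.0\.([0-9]+)$/ and 2 <= $1 and $1 <= 254 );
    return $ok ? 1 : 0;
}

for my $ip (qw( 127.0.0.1 127.0.0.2 127.0.0.230 127.0.0.254
                127.0.0.255 XXX.127.0.0.200.300.400 )) {
    printf "%-25s %s\n", $ip, in_range($ip) ? 'match' : 'no match';
}
```

Because the comparison is numeric, a form with leading zeros such as 127.0.0.002 would also pass; whether that matters depends on where the addresses come from.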
Re: regexp matching nummeric ranges
Kammen van, Marco, Springer SBM NL wrote:
> Dear List,

Hello,

> I've been struggling with the following:
>
>     if ($ip =~ /127\.0\.0\.[2..254]/) {
>
> For a reason I don't understand:
> 127.0.0.1 doesn't match as expected...
> Everything between 127.0.0.2 and 127.0.0.299 matches...
> 127.0.0.230 doesn't match...
> What am I doing wrong??

As Rob said [2..254] is a character class that matches one character (so 127.0.0.230 should match also.) You also don't anchor the pattern so something like '765127.0.0.273646' would match as well. What you need is something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $ip = '127.0.0.255';

    my $IP_match = qr{
        \A              # anchor at beginning of string
        127\.0\.0\.     # match the literal characters
        (?:
            [2-9]       # match one digit numbers 2 - 9
            |           # OR
            [0-9][0-9]  # match any two digit number
            |           # OR
            1[0-9][0-9] # match 100 - 199
            |           # OR
            2[0-4][0-9] # match 200 - 249
            |           # OR
            25[0-4]     # match 250 - 254
        )
        \z              # anchor at end of string
    }x;

    if ( $ip =~ $IP_match ) {
        print "IP Matched!\n";;
    } else {
        print "No Match!\n";
    }

Or, another way to do it:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Socket;

    my $ip    = inet_aton '127.0.0.255';
    my $start = inet_aton '127.0.0.2';
    my $end   = inet_aton '127.0.0.254';

    if ( $ip ge $start && $ip le $end ) {
        print "IP Matched!\n";;
    } else {
        print "No Match!\n";
    }

John
-- Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
Re: regexp matching nummeric ranges
On 29/11/2010 23:46, John W. Krahn wrote:
> What you need is something like this:
>
>     my $IP_match = qr{
>         \A              # anchor at beginning of string
>         127\.0\.0\.     # match the literal characters
>         (?:
>             [2-9]       # match one digit numbers 2 - 9
>             |           # OR
>             [0-9][0-9]  # match any two digit number
>             |           # OR
>             1[0-9][0-9] # match 100 - 199
>             |           # OR
>             2[0-4][0-9] # match 200 - 249
>             |           # OR
>             25[0-4]     # match 250 - 254
>         )
>         \z              # anchor at end of string
>     }x;

This regex solution allows the IP address 127.0.0.01, which is out of range.

- Rob
Re: regexp matching nummeric ranges
Hi Marco,

Here is another way to do it:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $ip = "127.0.0.1";
    if ($ip =~ /^127\.0\.0\.([\d]|[1-9][\d]|[1][\d][\d]|[2]([0-4][\d]|[5][0-4]))$/) {
        print "IP Matched!\n";;
    } else {
        print "No Match!\n";
    }

On Tue, Nov 30, 2010 at 11:21 AM, Rob Dixon rob.di...@gmx.com wrote:
> This regex solution allows the IP address 127.0.0.01, which is out of range.
RE: regexp matching nummeric ranges
-----Original Message-----
From: John W. Krahn [mailto:jwkr...@shaw.ca]
Sent: Tuesday, November 30, 2010 12:47 AM
To: Perl Beginners
Subject: Re: regexp matching nummeric ranges

> As Rob said [2..254] is a character class that matches one character (so 127.0.0.230 should match also.) You also don't anchor the pattern so something like '765127.0.0.273646' would match as well. What you need is something like this:
> snip

Thanks for all the good pointers... This is something I can work with!

Marco.
Re: regexp matching nummeric ranges
GK == Guruprasad Kulkarni guruprasa...@gmail.com writes:
GK> Here is another way to do it:
GK> /^127\.0\.0\.([\d]|[1-9][\d]|[1][\d][\d]|[2]([0-4][\d]|[5][0-4]))$/) {

why are you putting single chars inside a char class? [\d] is the same as \d and [1] is just 1.

also please don't quote entire emails below your post. learn to bottom post and edit the quoted emails. we read from top to bottom so post that way too.

uri
-- Uri Guttman -- u...@stemsystems.com http://www.sysarch.com
-- Perl Code Review, Architecture, Development, Training, Support
-- Gourmet Hot Cocoa Mix http://bestfriendscocoa.com
Re: regexp matching nummeric ranges
Rob Dixon wrote:
> On 29/11/2010 23:46, John W. Krahn wrote:
>>     (?:
>>         [2-9]       # match one digit numbers 2 - 9
>>         |           # OR
>>         [0-9][0-9]  # match any two digit number
>
> This regex solution allows the IP address 127.0.0.01, which is out of range.

Yes, sorry, that should be:

            [1-9][0-9]
            |           # OR
            1[0-9][0-9] # match 100 - 199
            |           # OR
            2[0-4][0-9] # match 200 - 249
            |           # OR
            25[0-4]     # match 250 - 254
        )
        \z              # anchor at end of string
    }x;

    if ( $ip =~ $IP_match ) {
        print "IP Matched!\n";;
    } else {
        print "No Match!\n";
    }

John
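Put together, the corrected pattern might look like the sketch below. The wrapper sub ip_ok and the sample addresses are assumptions added for illustration; the alternation itself is John's pattern with the [1-9][0-9] fix applied.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# John's pattern with the two-digit branch corrected to [1-9][0-9],
# so that a leading zero (127.0.0.01) is no longer accepted.
my $IP_match = qr{
    \A                # anchor at beginning of string
    127\.0\.0\.       # match the literal characters
    (?:
        [2-9]         # 2 - 9
      | [1-9][0-9]    # 10 - 99
      | 1[0-9][0-9]   # 100 - 199
      | 2[0-4][0-9]   # 200 - 249
      | 25[0-4]       # 250 - 254
    )
    \z                # anchor at end of string
}x;

# Hypothetical wrapper, named here only for the example.
sub ip_ok {
    my ($ip) = @_;
    return $ip =~ $IP_match ? 1 : 0;
}

for my $ip (qw( 127.0.0.01 127.0.0.1 127.0.0.2 127.0.0.99 127.0.0.254 127.0.0.255 )) {
    printf "%-12s %s\n", $ip, ip_ok($ip) ? 'IP Matched!' : 'No Match!';
}
```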
Re: Regexp to remove spaces
sftriman wrote:
> Dr.Ruud:
>> sub trim { ... } #trim
>
> You're missing the tr to squash space down

To trim() is to remove from head and tail only. Just use it as an example to build a trim_and_normalize().

> So I think it can boil down to:
>
>     sub fixsp7 {
>         s#\A\s+##, s#\s+\z##, tr/ \t\n\r\f/ /s foreach @_;
>         return;
>     }

Best remove from the end before removing from the start.

-- Ruud
Re: Regexp to remove spaces
On Dec 23, 2:31 am, rvtol+use...@isolution.nl (Dr.Ruud) wrote:
> sftriman wrote:
>> 1ST PLACE - THE WINNER: 5.0s average on 5 runs
>>
>>     # Limitation - pointer
>>     sub fixsp5 {
>>         ${$_[0]}=~tr/ \t\n\r\f/ /s;
>>         ${$_[0]}=~s/\A //;
>>         ${$_[0]}=~s/ \z//;
>>     }
>
> Just decide to change in-place, based on the defined-ness of wantarray.
>
>     sub trim { ... } #trim
>
> -- Ruud

Hi there,

You're missing the tr to squash space down, but I see what you're doing. I never need to trim an array at this point, but if I did...

So I think it can boil down to:

    sub fixsp7 {
        s#\A\s+##, s#\s+\z##, tr/ \t\n\r\f/ /s foreach @_;
        return;
    }

This is in keeping consistent with my other 6 test cases. I run it against several test strings including some with line breaks to make sure the results are always the same. Note I am using \A and \z and not ^ and $. Still, I think this has the flavor of what you intended.

Result: 5 trial runs over the same data set, 1,000,000 times, average time was 16.30s. All things considered, this puts it in a 4-way tie for 3rd place with the other methods. IF - the times above still stand...

And in fact, they don't. Why? CPU usage is high on my box right now. So I baselined the other methods in the 6.0s range, and they are now coming in at 25s! So maybe this one is the fastest! I'll have to do more testing.

To be fair, I had to rewrite the former winner as:

    sub fixsp1a {
        ${$_[0]}=~s/\A\s+//;
        ${$_[0]}=~s/\s+\z//;
        ${$_[0]}=~s/\s+/ /g;
    }

using \A and \z.

I wonder how expensive that foreach is. Knowing that it is exactly one argument, is there a faster way for this to run, not using foreach? Even so, this may not be the fastest trim method - in place, no pointer, one line, with the foreach @_ as written.

David
Re: Regexp to remove spaces
sftriman wrote:
> So I think it can boil down to:
>
>     sub fixsp7 {
>         s#\A\s+##, s#\s+\z##, tr/ \t\n\r\f/ /s foreach @_;
>         return;
>     }

    sub fixsp7 {
        tr/ \t\n\r\f/ /s, s#\A\s##, s#\s\z## foreach @_;
        return;
    }

Placing the tr/// first reduces the number of characters scanned for s#\s\z## which makes things slightly faster.

-- Just my 0.0002 million dollars worth, Shawn
Programming is as much about organization and communication as it is about coding. I like Perl; it's the only language where you can bless your thingy.
Re: Regexp to remove spaces
sftriman wrote:
> 1ST PLACE - THE WINNER: 5.0s average on 5 runs
>
>     # Limitation - pointer
>     sub fixsp5 {
>         ${$_[0]}=~tr/ \t\n\r\f/ /s;
>         ${$_[0]}=~s/\A //;
>         ${$_[0]}=~s/ \z//;
>     }

Just decide to change in-place, based on the defined-ness of wantarray.

    sub trim {
        no warnings 'uninitialized';

        if ( defined wantarray ) {
            # need to return scalar / list
            my @values= @_;
            s#^\s+##s, s#\s+$##s foreach @values;
            return wantarray ? @values : $values[0];
        }

        # need to change in-place
        s#^\s+##s, s#\s+$##s foreach @_;
        return;
    } #trim

-- Ruud
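To see how the wantarray test changes behaviour, here is a short usage sketch. The sub is Ruud's trim as posted; the sample strings and variable names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub trim {
    no warnings 'uninitialized';

    if ( defined wantarray ) {
        # scalar or list context: work on copies and return them
        my @values = @_;
        s#^\s+##s, s#\s+$##s foreach @values;
        return wantarray ? @values : $values[0];
    }

    # void context: @_ aliases the caller's variables, so edit in place
    s#^\s+##s, s#\s+$##s foreach @_;
    return;
}

my $orig = "  hello world \n";

my $copy = trim($orig);             # scalar context: $orig is untouched
my @list = trim( "  a ", "b  " );   # list context: trimmed copies

my $in_place = "  hello world \n";
trim($in_place);                    # void context: modified in place

print "[$copy] [@list] [$in_place]\n";
```

In void context the arguments must be modifiable variables; passing a string literal there would die with a "Modification of a read-only value" error.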
Re: Regexp to remove spaces
Thanks to everyone for their input! So I've tried out many of the methods, first making sure that each works as I intended it. Which is, I'm not concerned with multi-line text, just single line data. That said, I have noted that I should use \A and \z in general over ^ and $.

I wrote a 176 byte string for testing, and ran each method 1,000,000 times to time the speed.

The winner is: 3 regexp, using tr for intra-string spaces. I found I could make this even faster using a pointer to the variable versus passing in the variable as a local input parameter, modifying, then returning it. (In all cases, my goal is to write a sub for general use anywhere I want it, so I wrote each possibility as a sub. There ARE cases where I need to compare the original string with the cleaned string, but I can deal with that as need be with local variables.)

1ST PLACE - THE WINNER: 5.0s average on 5 runs

    # Limitation - pointer
    sub fixsp5 {
        ${$_[0]}=~tr/ \t\n\r\f/ /s;
        ${$_[0]}=~s/\A //;
        ${$_[0]}=~s/ \z//;
    }

2nd PLACE - same as above, but with local variables - 6.0s average on 5 runs

    sub fixsp4 {
        my ($x)=@_;
        $x=~tr/ \t\n\r\f/ /s;
        $x=~s/\A //;
        $x=~s/ \z//;
        return $x;
    }

[ QUESTION - any difference using my $x=shift;??? ]

3rd PLACE - 3 way tie, my method, either as variable in, change in place, or pointer - 17.0s average

    sub fixsp0 {
        my ($x)=@_;
        $x=~s/^\s+//;
        $x=~s/\s+$//;
        $x=~s/\s+/ /g;
        return $x;
    }

    # Limitation: pointer
    sub fixsp1 {
        ${$_[0]}=~s/^\s+//;
        ${$_[0]}=~s/\s+$//;
        ${$_[0]}=~s/\s+/ /g;
    }

    # Limitation: change in place
    sub fixsp2 {
        $_[0]=~s/^\s+//;
        $_[0]=~s/\s+$//;
        $_[0]=~s/\s+/ /g;
    }

4TH PLACE - 20.0s average on 5 runs (did not try change in place or as pointer)

    sub fixsp6 {
        my ($x)=@_;
        s/\s+\z//, s/\A\s+//, s/\s+/ /g, for $x;
        return $x;
    }

5TH PLACE - DEAD LAST! (or DFL in some parlance) - 62.0s average on 3 runs

    sub fixsp3 {
        my ($x)=@_;
        $x=~s/^(\s+)|(\s+)$//g;
        $x=~s/\s+/ /g;
        return $x;
    }

Any and all comments welcome.

David
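For anyone wanting to repeat this kind of timing, the core Benchmark module can do the looping and the comparison table. The following is only a sketch under assumptions: the sample string is made up (it is not David's 176 byte test string), only two of the candidates are included, and the numbers it prints will differ from the ones reported above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Two of the candidates from the message above.
sub fixsp4 {
    my ($x) = @_;
    $x =~ tr/ \t\n\r\f/ /s;
    $x =~ s/\A //;
    $x =~ s/ \z//;
    return $x;
}

sub fixsp0 {
    my ($x) = @_;
    $x =~ s/^\s+//;
    $x =~ s/\s+$//;
    $x =~ s/\s+/ /g;
    return $x;
}

# Made-up sample data, not the string used in the thread.
my $sample = "  some   text\twith \t irregular    spacing  " x 4;

# Check that both subs agree before timing them.
die "results differ\n" unless fixsp4($sample) eq fixsp0($sample);

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese( -1, {
    tr_then_s => sub { my $r = fixsp4($sample) },
    three_s   => sub { my $r = fixsp0($sample) },
} );
```

Running the candidates in one process on the same data helps with the "CPU usage is high on my box" problem mentioned later in the thread, since both are affected by the load equally.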
Re: Regexp to remove spaces
sftriman wrote:
> I use this series of regexp all over the place to clean up lines of text:
>
>     $x=~s/^\s+//g;
>     $x=~s/\s+$//g;
>     $x=~s/\s+/ /g;
>
> in that order, and note the final one replace \s+ with a single space.

The g-modifier on the first 2 is bogus (unless you would add an m-modifier).

I currently tend to write it like this:

    s/\s+\z//, s/\A\s+//, s/\s+/ /g, for $x;

So first remove tail spaces (less to lshift next). Then remove head spaces. Then normalize.

For a multi-line buffer you can do it like this:

    perl -wle ' my $x = EOT; 123456 \t abc def \t\t\t\t\t\t\t\t *** *** *** \t EOT s/^\s+//mg, s/\s+$//mg, s/[^\S\n]+/ /g for $x; $x =~ s/\n/\n/g; print $x, ; '
    123 456 abc def *** *** ***

-- Ruud
Re: Regexp to remove spaces
Shawn H Corey wrote:
>     $text =~ tr{\t}{ };
>     $text =~ tr{\n}{ };
>     $text =~ tr{\r}{ };
>     $text =~ tr{\f}{ };
>     $text =~ tr{ }{ }s;

That can be written as:

    tr/\t\n\r\f/ /, tr/ / /s for $text;

But it doesn't remove all leading nor all trailing spaces.

-- Ruud
Re: Regexp to remove spaces
2009/12/20 Dr.Ruud rvtol+use...@isolution.nl
> For a multi-line buffer you can do it like this:
>
>     s/^\s+//mg, s/\s+$//mg, s/[^\S\n]+/ /g for $x;

I know what it does, but I haven't seen this form of *for* before. Where can I find the description of this syntax in perldoc?

Thanks.

-- missing the days we spend together
Re: Regexp to remove spaces
At 6:11 PM +0800 12/21/09, Albert Q wrote:
> 2009/12/20 Dr.Ruud rvtol+use...@isolution.nl
>> For a multi-line buffer you can do it like this:
>>
>>     s/^\s+//mg, s/\s+$//mg, s/[^\S\n]+/ /g for $x;
>
> I know what it does, but I haven't seen this form of *for* before. Where can I find the description of this syntax in perldoc?

That is a question about Perl syntax, so look in perldoc perlsyn. Search for the section on Statement Modifiers, and realize that for and foreach are synonyms.
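A minimal sketch of the statement-modifier form, with sample strings invented for the example: the trailing for aliases $_ to each element in turn, so every s/// in the comma-separated list operates on that element directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "  hello    world  ";

# "EXPR for LIST": $_ is an alias for $x while the three substitutions run.
s/\s+\z//, s/\A\s+//, s/\s+/ /g for $x;
print "[$x]\n";    # prints [hello world]

# The same modifier works over a whole array, one element at a time.
my @lines = ( "  a  b ", "c   d" );
s/\s+\z//, s/\A\s+//, s/\s+/ /g for @lines;
print "[$_]\n" for @lines;
```

Writing foreach instead of for gives exactly the same behaviour, as the reply above notes.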
Re: Regexp to remove spaces
2009/12/20 sftriman dal...@gmail.com:
> I use this series of regexp all over the place to clean up lines of text:
>
>     $x=~s/^\s+//g;
>     $x=~s/\s+$//g;
>     $x=~s/\s+/ /g;

You can probably use

    $x=~s/^(\s+)|(\s+)$//g;

But I don't think it will use any less CPU than the 3 regex option, the nature of Perl's regex engine being what it is.

-- Erez
The government forgets that George Orwell's 1984 was a warning, and not a blueprint
http://www.nonviolent-conflict.org/ -- http://www.whyweprotest.org/
Re: Regexp to remove spaces
sftriman wrote:
> I use this series of regexp all over the place to clean up lines of text:
>
>     $x=~s/^\s+//g;
>     $x=~s/\s+$//g;
>     $x=~s/\s+/ /g;
>
> in that order, and note the final one replace \s+ with a single space.
>
> Basically, it's (1) remove all leading space, (2) remove all trailing space, and (3) replace all multi-space with a single space [which, at this point, should only occur on interior characters].
>
> Is there a handy way to do this in one regexp? And, fast? I've been using Devel::NYTProf to study code timing and see that some regexp, especially mine, can be CPU expensive/intensive.
>
> Thanks!
> David

tr/// is generally faster than s///

    $text =~ tr{\t}{ };
    $text =~ tr{\n}{ };
    $text =~ tr{\r}{ };
    $text =~ tr{\f}{ };
    $text =~ tr{ }{ }s;

-- Shawn
Re: Regexp to remove spaces
Shawn H Corey wrote:
> sftriman wrote:
>> Is there a handy way to do this in one regexp? And, fast? I've been using Devel::NYTProf to study code timing and see that some regexp, especially mine, can be CPU expensive/intensive.
>
> tr/// is generally faster than s///
>
>     $text =~ tr{\t}{ };
>     $text =~ tr{\n}{ };
>     $text =~ tr{\r}{ };
>     $text =~ tr{\f}{ };
>     $text =~ tr{ }{ }s;

That can be reduced to:

    $text =~ tr/ \t\n\r\f/ /s;

But that still doesn't remove leading and trailing whitespace so add two more lines:

    $text =~ tr/ \t\n\r\f/ /s;
    $text =~ s/\A //;
    $text =~ s/ \z//;

John
-- The programmer is fighting against the two most destructive forces in the universe: entropy and human stupidity. -- Damian Conway
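The two steps can be watched separately in a small sketch (the sample string and variable names are invented here). After the tr with the s flag, every run of whitespace has collapsed to one space, so at most a single space can remain at each end, which is why the two follow-up substitutions need no quantifier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "\t  foo \n\n bar  \r\n";

# Step 1: map each listed whitespace character to a space and squeeze runs.
( my $squeezed = $text ) =~ tr/ \t\n\r\f/ /s;
print "[$squeezed]\n";    # prints [ foo bar ]

# Step 2: drop the single space that may be left at either end.
( my $clean = $squeezed ) =~ s/\A //;
$clean =~ s/ \z//;
print "[$clean]\n";       # prints [foo bar]
```

Note that the tr list covers fewer characters than \s does (for instance, it has no vertical tab), so the two approaches are only equivalent for input limited to these five characters.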
Re: Regexp to remove spaces
John W. Krahn wrote:
> That can be reduced to:
>
>     $text =~ tr/ \t\n\r\f/ /s;
>
> But that still doesn't remove leading and trailing whitespace so add two more lines:
>
>     $text =~ tr/ \t\n\r\f/ /s;
>     $text =~ s/\A //;
>     $text =~ s/ \z//;

That was left as an exercise to the reader. Come now, you don't expect the bestest of code early Sunday morning...before I finish my first cup of coffee? If so, I must say that your optimism is only overshadowed by your hope. :)

-- Shawn
Re: Regexp to remove spaces
On Sat, Dec 19, 2009 at 9:13 PM, sftriman dal...@gmail.com wrote: I use this series of regexp all over the place to clean up lines of text: $x=~s/^\s+//g; $x=~s/\s+$//g; $x=~s/\s+/ /g; in that order, and note the final one replace \s+ with a single space. Basically, it's (1) remove all leading space, (2) remove all trailing space, and (3) replace all multi-space with a single space [which, at this point, should only occur on interior characters]. Take a look at the String::Util module. The crunch function, for example, also removes leading/trailing/multiple spaces. -- Robert Wohlfarth
Re: regexp question
Noah Garrett Wallach wrote:
> Okay I am having troubles finding this in the perldoc modules. Is there a slicker way to write the following?
>
>     if ($line =~ /(Blah1)(.*)/) {

At this point we know that the pattern matched so $1 will contain the string "Blah1" and $2 will contain a string of zero or more non-newline characters.

>         $blah = $1 if $1;

The test is redundant because we already know that "Blah1" is true.

>         $blah2 = $2 if $2;

This only assigns the string in $2 to $blah2 if it is not "" or "0".

>     }

John
-- Those people who think they know everything are a great annoyance to those of us who do. -- Isaac Asimov
Re: regexp question
On Fri, Sep 4, 2009 at 04:08, Noah Garrett Wallach noah-l...@enabled.com wrote:
> Hi there,
>
> is there any way to search for the following text? In some cases the text that I am searching for could be "one-two-three-" or sometimes the text could be "one-two-"
>
> what is a nice easy way to parse the above - quotes not included.
snip

What you are looking for is quantifiers. Quantifiers state how many times the preceding pattern must match:

    if (/^one-two-(?:three-){0,1}$/) {
        print "matched\n"
    }

In that example the quantifier {0,1} says that the pattern (?:three-) should match between zero and one times. This is a common enough quantifier that it has gotten a shortcut:

    if (/^one-two-(?:three-)?$/) {
        print "matched\n"
    }

Other shortcuts are + (for {1,}, i.e. one or more) and * (for {0,}, i.e. zero or more).

-- Chas. Owens wonkden.net
The most important skill a programmer can have is the ability to read.
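A quick way to check the quantifier is to run it over a few inputs. In this sketch the sub name and the sample strings are made up; the two patterns are the ones from the message above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when the string is "one-two-" with an
# optional single "three-" after it.
sub is_one_two {
    my ($s) = @_;
    return $s =~ /^one-two-(?:three-)?$/ ? 1 : 0;
}

for my $s ( 'one-two-', 'one-two-three-', 'one-two-three-three-', 'one-' ) {
    # {0,1} and ? are the same quantifier, so both patterns must agree.
    my $long_form = $s =~ /^one-two-(?:three-){0,1}$/ ? 1 : 0;
    printf "%-22s %s (long form agrees: %s)\n",
        $s,
        is_one_two($s) ? 'matched' : 'no match',
        $long_form == is_one_two($s) ? 'yes' : 'no';
}
```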
Re: regexp question
2009/9/4 Noah Garrett Wallach noah-l...@enabled.com:
> is there any way to search for the following text? In some cases the text that I am searching for could be "one-two-three-" or sometimes the text could be "one-two-"

If you're looking for this specific text then a good answer was already given, but if that's an example, and what you want is "someword-otherword-anotherone-" then you may need something more generic like

    /(?:[a-z]+-){2,3}/

which looks for a series of lowercase letters followed by a - that recurs 2 or 3 times.

-- Erez
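One thing to keep in mind with the generic pattern is that it is not anchored, so a longer chain of words still contains a run of two or three that matches. The sketch below compares it with an anchored variant; the anchoring with \A and \z is an addition made here for illustration, not something proposed in the thread, and the sample strings are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $loose  = qr/(?:[a-z]+-){2,3}/;       # pattern as posted
my $strict = qr/\A(?:[a-z]+-){2,3}\z/;   # assumed variant: whole string only

for my $s ( 'one-', 'one-two-', 'one-two-three-', 'a-b-c-d-' ) {
    printf "%-16s loose: %-3s strict: %s\n",
        $s,
        $s =~ $loose  ? 'yes' : 'no',
        $s =~ $strict ? 'yes' : 'no';
}
```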
Re: regexp question
On Fri, Sep 4, 2009 at 23:06, Noah Garrett Wallach noah-l...@enabled.com wrote:
> Okay I am having troubles finding this in the perldoc modules. Is there a slicker way to write the following?
>
>     if ($line =~ /(Blah1)(.*)/) {
>         $blah = $1 if $1;
>         $blah2 = $2 if $2;
>     }
snip

In list context a regex will return its captures:

    my ($blah, $blah2) = $line =~ /(Blah1)(.*)/;

-- Chas. Owens
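A short sketch of the list-context form, with invented sample data: on success the captures land in the variables, and on failure the match returns an empty list, so both variables stay undefined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'Blah1 and the rest';

# List context: the match returns ($1, $2).
my ( $blah, $blah2 ) = $line =~ /(Blah1)(.*)/;
print "blah=[$blah] blah2=[$blah2]\n";

# No match: the list is empty, so both variables are undef.
my ( $none1, $none2 ) = 'nothing here' =~ /(Blah1)(.*)/;
print "no match leaves both undefined\n"
    if !defined $none1 && !defined $none2;

# The assignment can double as the condition: in scalar context a list
# assignment yields the number of elements on its right-hand side.
if ( my ( $first, $rest ) = $line =~ /(Blah1)(.*)/ ) {
    print "matched, rest is [$rest]\n";
}
```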
Re: Regexp-Question: What is backtracking?
2009/3/12 Deviloper devilo...@slived.net
> Can somebody explain what Backtracking is? thanx, B.

In a nutshell, consider the following regex:

    /foo((b+)ar)/

A regex engine will check every character of the string it is matching against until it reaches the first 'f'. When it does, it marks the place and checks the next character; if that is not an 'o', the mark is removed. If it is, the next character is checked, and so on. When the end of the string is reached, the engine reports whether a match was found (and where it was found).

Now, in Perl, we would like to use that ((b+)ar) as $1, $2 and so on. This means that the engine needs to mark not only the location of the 'f' but also that of the 'b', when reached. Assuming the string 'snafoobbbar', the engine will need to backtrack over to the first 'b', and each time save the 'b+' string for future reference. When the first 'b' is reached, $2 is now (assuming a match) 'b' and $1 is (potentially) 'bar'. The next character is also a 'b', and with that $2 is now 'bb' and $1 (potentially) 'bbar'. The engine has to make those adjustments accordingly, with each iteration forcing it to readjust all the variables it saves.

I do recommend J. Friedl's Mastering Regular Expressions for a better understanding of how those engines function, at least theoretically (as the actual algorithmic implementation might differ much from my basic explanation, or from the descriptions in the book).

-- Erez
It's time to grow out of the browser; we have nothing to lose but our metaphors.
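The end result of that match can be checked directly. Capture groups are numbered by their opening parenthesis, so the outer group is $1 and the inner (b+) is $2. The second pattern below is not from the thread; it is an extra example, added on the assumption that it shows backtracking more plainly: the greedy b+ first grabs all three b's and then has to give one back so that the final literal b can match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'snafoobbbar';

# Groups are numbered by opening parenthesis: outer first, then inner.
my ( $outer, $inner ) = $str =~ /foo((b+)ar)/;
print "outer (\$1) = $outer\n";    # bbbar
print "inner (\$2) = $inner\n";    # bbb

# Extra example (not from the thread): b+ is greedy and initially takes
# 'bbb', but the trailing literal b then has nothing left to match, so
# the engine backtracks and b+ settles for 'bb'.
my ($greedy) = 'foobbb' =~ /foo(b+)b/;
print "after backtracking, b+ kept: $greedy\n";    # bb
```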
RE: RegExp Problem using Substitutions.
Try this:

    $s=~s/b|ab*a//g;

-----Original Message-----
From: Deviloper [mailto:devilo...@slived.net]
Sent: Tuesday, February 24, 2009 4:03 PM
To: beginners@perl.org
Subject: RegExp Problem using Substitutions.

> Hi there! I have a string "bbbababbaaassass". I want to get a string without any double a ('aa') and without the 'b's. But if I do:
>
>     my $s = "bbbababbaaassass";
>     $s=~ s/aa|b//g;
>
> as a result I will get a string "aaassass". (I understand WHY I get this result. But I don't know how to avoid this, using only one substitution.)
>
> Thx, B.
Re: RegExp Problem using Substitutions.
Deviloper wrote:
> Hi there!

Hello,

> I have a string "bbbababbaaassass". I want to get a string without any double a ('aa') and without the 'b's. But if I do:
>
>     my $s = "bbbababbaaassass";
>     $s=~ s/aa|b//g;
>
> as a result I will get a string "aaassass". (I understand WHY I get this result. But I don't know how to avoid this, using only one substitution.)

I think you want something like this:

    $s =~ s/[ab]*(?=a)|b+//g;

John
Re: RegExp Problem using Substitutions.
John W. Krahn wrote:
> Deviloper wrote:
>> Hi there!
>
> Hello,
>
>> I have a string "bbbababbaaassass". I want to get a string without any double a ('aa') and without the 'b's. But if I do:
>>
>>     my $s = "bbbababbaaassass";
>>     $s =~ s/aa|b//g;
>>
>> as a result I get the string "aaassass". (I understand WHY I get this result, but I don't know how to avoid it using only one substitution.)
>
> I think you want something like this:
>
>     $s =~ s/[ab]*(?=a)|b+//g;

Or perhaps this:

    $s =~ tr/ab/a/sd;

John
--
Those people who think they know everything are a great annoyance to those of us who do. -- Isaac Asimov
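For anyone who wants to compare the single-statement suggestions from this thread side by side, here is a small sketch. Only the three patterns come from the posts; the sub names and the extra test string "aaaa" are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrappers: each one works on a copy of its argument
# and returns the modified copy.
sub by_alternation { my ($s) = @_; $s =~ s/b|ab*a//g;        return $s }
sub by_lookahead   { my ($s) = @_; $s =~ s/[ab]*(?=a)|b+//g; return $s }
sub by_tr          { my ($s) = @_; $s =~ tr/ab/a/sd;         return $s }

# Print the result of all three approaches for each input.
for my $input ( "bbbababbaaassass", "aaaa" ) {
    printf "%-18s alternation=%-8s lookahead=%-8s tr=%s\n",
        $input, by_alternation($input), by_lookahead($input), by_tr($input);
}
```

On the sample string all three should give "assass". They are not interchangeable in general, though: for "aaaa" the first pattern removes both pairs and leaves an empty string, while the other two leave a single "a". Which one is right depends on what "without any double a" is meant to say.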
Re: RegExp Problem using Substitutions.
Deviloper wrote:
> Hi there! I have a string "bbbababbaaassass". I want to get a string without any double a ('aa') and without the 'b's. But if I do:
>
>     my $s = "bbbababbaaassass";
>     $s =~ s/aa|b//g;
>
> as a result I get the string "aaassass". (I understand WHY I get this result, but I don't know how to avoid it using only one substitution.)

Why do you insist on using a single substitution? Unless you are entering a competition you should be looking for functionality and clarity - not adherence to some arbitrary challenge.

Rob
Re: RegExp Searching - part deux
That was easier: #!/usr/bin/perl use strict; use warnings; #my $line = 1elem21elema2a 1 bad13elema2eone 1 bad 1elemb2bone 2 bad1elemc2c13elemc2btwo13elemb2etwo13elem2; my $line = elem1elemaa bad/elemae1 bad elembb1 badelemcc/elemcb2/elembe2/elem1; my $cnt = 0; my @insides = $line =~ m{ elem1(.*?)\/elem1 }gmsx; for my $inside ( @insides ){ print $inside; while( $inside =~ m{ ([^\s\/]*) }gmsx ){ my $element = $1; $cnt = $cnt +1; # unless( $element =~ m{ \A \/ }msx ){ print \n$cnt=$1\n; } #} } 4 any one to use :) Let me know if you see any problems... Thanks again everyone! --- On Thu, 1/8/09, Paul M pjm...@yahoo.com wrote: From: Paul M pjm...@yahoo.com Subject: RegExp Searching - part deux To: beginners@perl.org Date: Thursday, January 8, 2009, 8:18 AM What happens if I have a simple string: my $line = 1elem21elema2a 1 bad13elema2 1 bad elemb2 bad 2 z 1elemc2c13elemc2b13elemb2e13elem2; That must follow simply rules: Find every alpha character string between the numbers one and two. The string may not include the number one two or three. SO: 1 bad13elema2 = no good, contains 1 and 3 1 bad 1elemb2 = no good, contains space 13elema2 = no good, contains 3 ???
Re: RegExp Searching within
    $_ = "<elem1><elema></elema><elemb><elemc></elemc></elemb></elem1>";
    while (/<(.*?)>(.*?)<\/\1>/g) {
        print "tag $1 which has $2 inside\n";
    }

Paul M wrote:
> Hi: Given the following list:
>
>     <elem1>
>     <elema></elema>
>     <elemb>
>     <elemc></elemc>
>     </elemb>
>     </elem1>
>
> I want to know all the elements within elem1. (Note: It is seriously MALFORMED XML, that is why I am attempting to use regexp.) Any ideas? I can get $1 equal to the contents of elem1, but after that I am somewhat lost.
>
>     perl -e '$line = "<elem1><elema></elema><elemb><elemc></elemc></elemb></elem1>"; $line =~ /<elem1>(.*)<\/elem1>$/; print $1'
>
> My regex to retrieve inner tags is likely to be something simple like <([^>]*)> ???
Re: RegExp Searching within
From: mer...@stonehenge.com (Randal L. Schwartz) Paul == Paul M pjm...@yahoo.com writes: Paul Note: It is seriously Paul MALFORMED XML That's a nonsense phrase, like somewhat pregnant. It's either XML, or it isn't. And if it isn't, get after the vendor for spewing angle-bracketish stuff at you. Yeah, just like there is no malformed HTML, just angle-bracketish stuff that resembles HTML, no malformed Perl, just line-noisy stuff that resembles Perl, no misspellled English, just strings of characters ... Sometimes bendnig backwards to accommodate the vendor's notion of XML (or some other format) is the only thing you can do. Jenda = je...@krynicky.cz === http://Jenda.Krynicky.cz = When it comes to wine, women and song, wizards are allowed to get drunk and croon as much as they like. -- Terry Pratchett in Sourcery -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: RegExp Searching within
If it were true XML, I would say all children's Node Names. so: elema elemb elemc --- On Mon, 1/5/09, Mr. Shawn H. Corey shawnhco...@magma.ca wrote: From: Mr. Shawn H. Corey shawnhco...@magma.ca Subject: Re: RegExp Searching within To: pjm...@yahoo.com Cc: beginners@perl.org Date: Monday, January 5, 2009, 8:10 AM On Mon, 2009-01-05 at 08:02 -0800, Paul M wrote: I want to know all the elements within elem1. (Note: It is seriously MALFORMED XML, that is why I am attempting to use regexp). Do you want to know all the children or all the descendants? -- Just my 0.0002 million dollars worth, Shawn Programming is as much about organization and communication as it is about coding.
Re: RegExp Searching within
Indeed. #!/usr/bin/perl use strict; use warnings; my $line = elem1elemaa bad/elemae1 bad elembb1 badelemcc/elemcb2/elembe2/elem1; my @insides = $line =~ m{ \elem1\ (.*?) \\/elem1\ }gmsx; for my $inside ( @insides ){ while( $inside =~ m{ \G \([^^\s]*)\[^]* }gmsx ){ my $element = $1; # unless( $element =~ m{ \A \/ }msx ){ print $1\n; # } } } Any ideas... The [^]* works to strip out descendants text nodes but not when and are present. --- On Mon, 1/5/09, Mr. Shawn H. Corey shawnhco...@magma.ca wrote: From: Mr. Shawn H. Corey shawnhco...@magma.ca Subject: Re: RegExp Searching within To: pjm...@yahoo.com Cc: beginners@perl.org Date: Monday, January 5, 2009, 8:55 AM On Mon, 2009-01-05 at 08:17 -0800, Paul M wrote: If it were true XML, I would say all children's Node Names. so: elema elemb elemc You mean all the descendants. The children of elem1 are elema and elemb. The descendants of elem1 are elema, elemb, and elemc. #!/usr/bin/perl use strict; use warnings; my $line = elem1elema/elemaelembelemc/elemc/elemb/elem1; my @insides = $line =~ m{ \elem1\ (.*?) \\/elem1\ }gmsx; for my $inside ( @insides ){ while( $inside =~ m{ \G \([^]*)\ }gmsx ){ my $element = $1; unless( $element =~ m{ \A \/ }msx ){ print $1\n; } } } -- Just my 0.0002 million dollars worth, Shawn Programming is as much about organization and communication as it is about coding.
Re: RegExp Searching within
On Mon, 2009-01-05 at 08:02 -0800, Paul M wrote: I want to know all the elements within elem1. (Note: It is seriously MALFORMED XML, that is why I am attempting to use regexp). Do you want to know all the children or all the descendants? -- Just my 0.0002 million dollars worth, Shawn Programming is as much about organization and communication as it is about coding. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: RegExp Searching within
From: Paul M pjm...@yahoo.com
> Hi: Given the following list:
>
>     <elem1>
>     <elema></elema>
>     <elemb>
>     <elemc></elemc>
>     </elemb>
>     </elem1>
>
> I want to know all the elements within elem1. (Note: It is seriously MALFORMED XML, that is why I am attempting to use regexp.)

It's hard to say, but it might be easier to fix the XML and then use normal XML tools. Besides, if we do not know how it is malformed, we can't be sure that any solution we give you will work, or which XML features we can safely ignore. E.g. are there any <![CDATA[ ... ]]> sections?

Jenda
===== je...@krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed to get drunk and croon as much as they like. -- Terry Pratchett in Sourcery
Re: RegExp Searching within
On Mon, 2009-01-05 at 08:17 -0800, Paul M wrote:
> If it were true XML, I would say all children's Node Names. so: elema elemb elemc

You mean all the descendants. The children of elem1 are elema and elemb. The descendants of elem1 are elema, elemb, and elemc.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $line = "<elem1><elema></elema><elemb><elemc></elemc></elemb></elem1>";
    my @insides = $line =~ m{ \<elem1\> (.*?) \<\/elem1\> }gmsx;
    for my $inside ( @insides ){
      while( $inside =~ m{ \G \<([^>]*)\> }gmsx ){
        my $element = $1;
        unless( $element =~ m{ \A \/ }msx ){
          print "$1\n";
        }
      }
    }

--
Just my 0.0002 million dollars worth, Shawn

Programming is as much about organization and communication as it is about coding.
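To make the children/descendants distinction concrete, here is a minimal sketch that tracks nesting depth while scanning the tags. It assumes the tags are properly nested, carry no attributes, and that the root element occurs once; list_elements is a hypothetical helper, not something posted in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns two array refs, the direct children and
# all descendants of the first <$root> element found in $line.
# Assumes properly nested tags without attributes.
sub list_elements {
    my ($line, $root) = @_;
    my ($inside) = $line =~ m{ <\Q$root\E> (.*?) </\Q$root\E> }msx
        or return ( [], [] );

    my ( @children, @descendants );
    my $depth = 0;
    while ( $inside =~ m{ < (/?) ([^<>/\s]+) > }gmsx ) {
        my ( $closing, $name ) = ( $1, $2 );
        if ($closing) {          # a closing tag takes us one level up
            $depth--;
            next;
        }
        push @descendants, $name;                 # every opening tag is a descendant
        push @children, $name if $depth == 0;     # only depth 0 is a direct child
        $depth++;
    }
    return ( \@children, \@descendants );
}

my ( $children, $descendants ) = list_elements(
    "<elem1><elema></elema><elemb><elemc></elemc></elemb></elem1>", "elem1" );
print "children:    @$children\n";
print "descendants: @$descendants\n";
```

For the sample line this should report elema and elemb as children, and elema, elemb and elemc as descendants. Input that is not well nested will throw the depth counter off, so treat it as a starting point only.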
Re: RegExp Searching within
>>>>> "Paul" == Paul M pjm...@yahoo.com writes:

Paul> Note: It is seriously
Paul> MALFORMED XML

That's a nonsense phrase, like "somewhat pregnant". It's either XML, or it isn't. And if it isn't, get after the vendor for spewing angle-bracketish stuff at you.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
mer...@stonehenge.com URL:http://www.stonehenge.com/merlyn/
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
Re: regexp - end of line question
sftriman wrote:
> I have data such as:
>
>     A|B|C|44
>     X|Y|Z|33,44
>     C|R|E|44,55,66
>     T|Q|I|88,33,44
>
> I want to find all lines with 44 in the last field. I was trying:
>
>     /[,\|]44[,\$]/
>
> which logically is perfect - but the end of line \$ doesn't seem right. How do I write: comma or pipe, followed by 44, followed by comma or end of line?

    /[,|]44(?:,|$)/

John
--
Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order. -- Larry Wall
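A short sketch of why the original attempt fails and what the reply changes: inside a character class, $ is just a literal dollar sign, so [,\$] means "comma or dollar", while (?:,|$) means "comma or end of line". The sub names below are hypothetical, and last_field_has_44 is an extra, stricter variant that is not from the thread; it is there because the pattern alone does not check which field the 44 sits in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the reply: comma or pipe, then 44, then comma or end of line.
sub matches_pattern {
    my ($line) = @_;
    return $line =~ /[,|]44(?:,|$)/ ? 1 : 0;
}

# Hypothetical stricter variant: split off the last pipe-separated field
# and look for an exact 44 among its comma-separated values.
sub last_field_has_44 {
    my ($line) = @_;
    chomp $line;
    my $last = ( split /\|/, $line, -1 )[-1];
    return 0 unless defined $last;
    return ( grep { $_ eq '44' } split /,/, $last ) ? 1 : 0;
}

for my $line ( "A|B|C|44", "X|Y|Z|33,44", "C|R|E|44,55,66", "T|Q|I|88,33,44", "A|B|C|144" ) {
    printf "%-16s pattern=%d last_field=%d\n",
        $line, matches_pattern($line), last_field_has_44($line);
}
```

With the posted data both checks agree, because only the last field contains commas. If an earlier field could hold a comma-separated list (for example "A|44,55|C|1"), the bare pattern would match there as well, which is where the split-based variant might be the safer choice.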
Re: Regexp: Correct braces
Mr. Shawn H. Corey wrote: On Fri, 2008-10-03 at 12:38 +0100, Rob Dixon wrote: Mr. Shawn H. Corey wrote: Note that if these structures can be nested, you will have to use a FSA with a push-down stack. That will match a line like [wrong) and (wrong] Rob Note that if these structures can be nested, you will have to use a FSA with a push-down stack. To rework the adage, When your only tool is an FSA, every problem looks like it's nested. My example wasn't a nested one. It was just an example of two incorrect brace matches as the OP described them. Rob -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regexp: Correct braces
On Mon, 2008-10-06 at 12:12 +0100, Rob Dixon wrote: Mr. Shawn H. Corey wrote: On Fri, 2008-10-03 at 12:38 +0100, Rob Dixon wrote: Mr. Shawn H. Corey wrote: Note that if these structures can be nested, you will have to use a FSA with a push-down stack. That will match a line like [wrong) and (wrong] Rob Note that if these structures can be nested, you will have to use a FSA with a push-down stack. To rework the adage, When your only tool is an FSA, every problem looks like it's nested. My example wasn't a nested one. It was just an example of two incorrect brace matches as the OP described them. Rob You make it sound like I just discovered them and haven't been using them for the past 30 years. On Fri, 2008-10-03 at 12:24 +0100, Rob Dixon wrote: Vyacheslav Karamov wrote: Hi All! I need to capture something in braces using regular expressions. But I don't need to capture wrong data: [Some text] - correct (Some text) - also correct [Some text) - wrong (Some text] - also wrong HTH, Rob use strict; use warnings; while (DATA) { while ( / ( \[[^])]+\] | \([^])]+\) ) /xg ) { print $1, \n; } } __DATA__ [correct] - correct (also correct) - also correct [wrong) - wrong (also wrong] - also wrong Let's add some more data: __DATA__ [correct] - correct (also correct) - also correct [wrong) - wrong (also wrong] - also wrong (correct (and) nested) - correct, matches: (correct (and) (wrong [and] nested) - wrong, matches: (wrong [and] (correct [and) nested] - correct, matches: (correct [and) The problem is that the OP has specified a data format with nested contexts, even though they may not realize it. If there are nested context or the meaning of the symbols changes, it cannot be parsed with just regular expressions; you need a FSA to parse it. If it has unbounded recursion, you need a FSA with a push-down stack. The sad thing is that no-one teaches how to recognize the different formats so the correct code can be written. 
--
Just my 0.0002 million dollars worth, Shawn

Linux is obsolete. -- Andrew Tanenbaum
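For readers wondering what "a FSA with a push-down stack" might look like in practice, here is a minimal sketch. It is not code from the thread: balanced_groups is a hypothetical helper, and it assumes the strictest reading, namely that every closing bracket must pair with the most recently opened one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %closer_for = ( '(' => ')', '[' => ']' );

# Hypothetical helper. Call it in scalar context: it returns a reference to
# the list of outermost bracketed groups, or undef as soon as a bracket is
# closed by the wrong partner or left open.
sub balanced_groups {
    my ($text) = @_;
    my ( @stack, @groups );
    while ( $text =~ /([()\[\]])/g ) {
        my $ch  = $1;
        my $pos = pos($text) - 1;            # offset of the bracket just matched
        if ( exists $closer_for{$ch} ) {     # opening bracket: push it
            push @stack, [ $ch, $pos ];
            next;
        }
        my $top = pop @stack;                # closing bracket: pop and compare
        return unless $top && $closer_for{ $top->[0] } eq $ch;
        push @groups, substr( $text, $top->[1], $pos - $top->[1] + 1 )
            unless @stack;                   # record only outermost groups
    }
    return if @stack;                        # something was never closed
    return \@groups;
}

for my $line ( "[correct] - (also correct)", "[wrong) - wrong", "(correct (and) nested)" ) {
    my $groups = balanced_groups($line);
    print "$line => ", ( $groups ? join( " | ", @$groups ) : "rejected" ), "\n";
}
```

Whether nesting needs to be handled at all is exactly what the thread disagrees about; if the data never nests, the character-class regex posted earlier is enough.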
Re: Regexp: Correct braces
Mr. Shawn H. Corey wrote: Rob Dixon wrote: My example wasn't a nested one. It was just an example of two incorrect brace matches as the OP described them. You make it sound like I just discovered them and haven't been using them for the past 30 years. If you have been using state machine algorithms for thirty years then I am very surprised that you need to put it about in the way that a teenager would say that he was driving a BMW. I presumed you were in your early twenties and fresh out of a computer science degree. Let's add some more data: This is what the OP posted. __DATA__ [correct] - correct (also correct) - also correct [wrong) - wrong (also wrong] - also wrong And without knowing any criteria you have added these (correct (and) nested) - correct, matches: (correct (and) (wrong [and] nested) - wrong, matches: (wrong [and] (correct [and) nested] - correct, matches: (correct [and) My example was simply [wrong) and (wrong] And then you say this The problem is that the OP has specified a data format with nested contexts, even though they may not realize it. If there are nested context or the meaning of the symbols changes, it cannot be parsed with just regular expressions; you need a FSA to parse it. If it has unbounded recursion, you need a FSA with a push-down stack. The problem is that you have imagined the possibility of nesting. I wonder if you would have said the same thing if the symbols had been quotation marks? Also, 'push-down' is the only sort of stack that I know about, and the only distinction with unbounded recursion is that you need an unbounded stack. The sad thing is that no-one teaches how to recognize the different formats so the correct code can be written. I think the sad thing is that you see nails everywhere. Rob -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regexp: Correct braces
Rob Coops wrote:
> Try this: (?:Some text not captured)
>
> The ?: at the beginning tells perl that even though you want it to see this whole group, you would not like perl to capture the string. Look up perlre (http://perldoc.perl.org/perlre.html) for some more information on this particular topic; it will lead you to the other pages that hold more information about more fun things you can do with regexes.
>
> Regards, Rob
>
> On Fri, Oct 3, 2008 at 10:52 AM, Vyacheslav Karamov wrote:
>> Hi All! I need to capture something in braces using regular expressions. But I don't need to capture wrong data:
>>
>>     [Some text] - correct
>>     (Some text) - also correct
>>     [Some text) - wrong
>>     (Some text] - also wrong

I was misunderstood. I need to capture something in braces (with the braces or not, it's not important), but I need to capture only if the opening brace corresponds to the closing one.
Re: Regexp: Correct braces
On Fri, Oct 3, 2008 at 11:15 AM, Vyacheslav Karamov wrote:
> Rob Coops wrote:
>> Try this: (?:Some text not captured)
>>
>> The ?: at the beginning tells perl that even though you want it to see this whole group, you would not like perl to capture the string. Look up perlre (http://perldoc.perl.org/perlre.html) for some more information on this particular topic; it will lead you to the other pages that hold more information about more fun things you can do with regexes.
>>
>> Regards, Rob
>>
>> On Fri, Oct 3, 2008 at 10:52 AM, Vyacheslav Karamov wrote:
>>> Hi All! I need to capture something in braces using regular expressions. But I don't need to capture wrong data:
>>>
>>>     [Some text] - correct
>>>     (Some text) - also correct
>>>     [Some text) - wrong
>>>     (Some text] - also wrong
>
> I was misunderstood. I need to capture something in braces (with the braces or not, it's not important), but I need to capture only if the opening brace corresponds to the closing one.

Ah, OK, that's a different story. In that case I would go for something along these lines:

    m/(\(.*?\)|\[.*?\])/

That should get you a match of (as little stuff as possible in the middle) or [as little stuff as possible in the middle], so only your first two lines should match; the other two should not. The only thing that might trip you up is a line with multiple braces: something like [)](] will still be seen as matching. If you want to exclude those you could do something along the lines of:

    m/((^\(.*?\])|(^\[.*?\)))/

which should also match the two top lines but will not match [)](]. At least I believe it should not; I have not tested it, as I am being kept quite busy today (delivery deadline today).
Re: Regexp: Correct braces
Vyacheslav Karamov wrote:
> Hi All! I need to capture something in braces using regular expressions. But I don't need to capture wrong data:
>
>     [Some text] - correct
>     (Some text) - also correct
>     [Some text) - wrong
>     (Some text] - also wrong

HTH,

Rob

    use strict;
    use warnings;

    while (<DATA>) {
      while ( / ( \[[^])]+\] | \([^])]+\) ) /xg ) {
        print $1, "\n";
      }
    }

    __DATA__
    [correct] - correct
    (also correct) - also correct
    [wrong) - wrong
    (also wrong] - also wrong
Re: Regexp: Correct braces
On Fri, 2008-10-03 at 11:52 +0300, Vyacheslav Karamov wrote:
> Hi All! I need to capture something in braces using regular expressions. But I don't need to capture wrong data:
>
>     [Some text] - correct
>     (Some text) - also correct
>     [Some text) - wrong
>     (Some text] - also wrong

    #!/usr/bin/perl
    use strict;
    use warnings;

    while( <> ){
      chomp;
      while( /\((.*?)\)|\[(.*?)\]/g ){
        my $result = $1;
        $result = $2 unless defined $result;
        print "$result\n";
      }
    }
    __END__

Note that if these structures can be nested, you will have to use a FSA with a push-down stack.

--
Just my 0.0002 million dollars worth, Shawn

Linux is obsolete. -- Andrew Tanenbaum
Re: Regexp: Correct braces
Mr. Shawn H. Corey wrote:
> On Fri, 2008-10-03 at 11:52 +0300, Vyacheslav Karamov wrote:
>> Hi All! I need to capture something in braces using regular expressions. But I don't need to capture wrong data:
>>
>>     [Some text] - correct
>>     (Some text) - also correct
>>     [Some text) - wrong
>>     (Some text] - also wrong
>
>     #!/usr/bin/perl
>     use strict;
>     use warnings;
>
>     while( <> ){
>       chomp;
>       while( /\((.*?)\)|\[(.*?)\]/g ){
>         my $result = $1;
>         $result = $2 unless defined $result;
>         print "$result\n";
>       }
>     }
>     __END__
>
> Note that if these structures can be nested, you will have to use a FSA with a push-down stack.

That will match a line like

    [wrong) and (wrong]

Rob
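The difference Rob points out can be checked with a small sketch that runs both patterns from this thread over the same line. The two sub names are made up for illustration; the patterns themselves are the ones posted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the non-greedy pattern: .*? happily runs
# past a mismatched closer until it finds the one it is looking for.
sub lazy_matches {
    my ($line) = @_;
    my @found;
    while ( $line =~ /\((.*?)\)|\[(.*?)\]/g ) {
        push @found, defined $1 ? $1 : $2;
    }
    return @found;
}

# Hypothetical wrapper around the character-class pattern: [^])]+ stops
# at either kind of closer, so a mismatched pair cannot match.
sub strict_matches {
    my ($line) = @_;
    return $line =~ / ( \[[^])]+\] | \([^])]+\) ) /xg;
}

for my $line ( "[correct] - (also correct)", "[wrong) and (wrong]" ) {
    print "$line\n";
    print "  lazy:   $_\n" for lazy_matches($line);
    print "  strict: $_\n" for strict_matches($line);
}
```

For "[wrong) and (wrong]" the non-greedy version should report one match running from the first bracket to the last, while the character-class version reports none.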
Re: Regexp: Correct braces
On Fri, 2008-10-03 at 12:38 +0100, Rob Dixon wrote: Mr. Shawn H. Corey wrote: Note that if these structures can be nested, you will have to use a FSA with a push-down stack. That will match a line like [wrong) and (wrong] Rob Note that if these structures can be nested, you will have to use a FSA with a push-down stack. -- Just my 0.0002 million dollars worth, Shawn Linux is obsolete. -- Andrew Tanenbaum -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regexp: Correct braces
Vyacheslav Karamov wrote:
> Hi All! I need to capture something in braces using regular expressions. But I don't need to capture wrong data:
>
>     [Some text] - correct
>     (Some text) - also correct
>     [Some text) - wrong
>     (Some text] - also wrong

Go to http://search.cpan.org and look for Regexp::Common, more specifically Regexp::Common::balanced.

--
Affijn, Ruud

"Gewoon is een tijger."
Re: Regexp in PERL
$string = 234234; $string =~ s/(\s*)(?=\d+)//g; print ($string); anders wrote: Hi i have som text with and space and then a numbers eg. 234234 I tested to write $line =~ s/\ [0-9]/[0-9]/g; I like it to change 234 to 234 But it made [0-9] Anyone how should i have write to tell it to find and convert corrent. Best regards Anders -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regexp in PERL
> $line =~ s/ ([0-9])/$1/g;

Thanks, this solved the problem for me.

Anders
Re: Regexp in PERL
2008/9/10 anders [EMAIL PROTECTED]: Hi i have som text with and space and then a numbers eg. 234234 I tested to write $line =~ s/\ [0-9]/[0-9]/g; I like it to change 234 to 234 If you just want to remove the space between and numbers, try: $line =~ s/\s+//; -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regexp in PERL
anders wrote: Hi i have som text with and space and then a numbers eg. 234234 I tested to write $line =~ s/\ [0-9]/[0-9]/g; I like it to change 234 to 234 But it made [0-9] Anyone how should i have write to tell it to find and convert corrent. $line =~ s/(?=) +(?=[0-9])//g; Or: $line =~ s/ ([0-9])/$1/g; John -- Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order.-- Larry Wall -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regexp
In this case my code was actually good; the problem was with the input file, which was, I am embarrassed to say, empty.

> It would be good if you could explain your solution. This is a list for teaching Perl, not just for finding solutions to individuals' problems. If you publish your working code it may well help someone else, and you would also get the chance to have your working code critiqued by highly knowledgeable Perl programmers.
Re: Regexp
Jim wrote:
> Given a wordlist @WordList and a file $content, how do I construct a regexp to search for every word in @WordList in $content? I have tried the following, which does not work:
>
>     foreach $i (@WordList) {
>       print "Searching: " . $i . "\n\n";
>       if ($content =~ m/$i/) {
>         print "Found Word " . $i . "\n";
>       }
>     }

What do you mean by "does not work"? Maybe you are looking for m/\b\Q$word\E\b/, or maybe you just forgot to chomp.

--
Affijn, Ruud

"Gewoon is een tijger."
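Ruud's two hints can be combined in a short sketch. It assumes the word list may still carry trailing newlines (for example because it was read from a file) and that whole-word matches are wanted; words_found is a hypothetical helper, not code from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the entries of @$words that occur in
# $content as whole words.
sub words_found {
    my ( $content, $words ) = @_;
    my @found;
    for my $word (@$words) {
        ( my $clean = $word ) =~ s/\s+\z//;   # drop a trailing "\n" or "\r\n", like chomp
        next if $clean eq '';
        # \Q...\E makes regex metacharacters in the word literal;
        # \b requires a word boundary on both sides.
        push @found, $clean if $content =~ /\b\Q$clean\E\b/;
    }
    return @found;
}

my $content  = "The cat sat on a.mat\n";
my @WordList = ( "cat\n", "at", "a.mat", "axmat" );
print "Found word: $_\n" for words_found( $content, \@WordList );
```

Here "at" is skipped because it only appears inside longer words, and "axmat" is skipped because \Q stops the dot in "a.mat" from acting as a wildcard. Note that \b only behaves as expected when the word starts and ends with a word character, so something like "c++" would need a different boundary test.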
Re: Regexp
I have solved the problem! Thank you for your time.
Re: Regexp
Jim wrote:
> I have solved the problem! Thank you for your time.

It would be good if you could explain your solution. This is a list for teaching Perl, not just for finding solutions to individuals' problems. If you publish your working code it may well help someone else, and you would also get the chance to have your working code critiqued by highly knowledgeable Perl programmers.

Rob
Re: regexp behin assertions ?!
Sorry, I hadn't enough time.

    /(?<!\d;)(?<!WORD.;\s{2})\b(\d+)

It is OK: a number not preceded by a digit and a semicolon, nor by the word WORD, a character and 2 spaces.

Thanks for your help.
Best Regards

On 2 Jul, 16:57, Obdulio Santana wrote:
> Hi,
>
> Where is the solution to this thread? It's a little bit weird to ask for help and not offer a possible solution. At least a study case would be enough, without mentioning further details.
>
> Cheers
Re: regexp behin assertions ?!
I can't it is confidential and I have found. Thanks a lot On 2 juil, 00:46, [EMAIL PROTECTED] (Gunnar Hjalmarsson) wrote: epanda wrote: Gunnar Hjalmarsson wrote: [ Please do not top-post! ] snip Maybe I'm dumb, but it's not clear to me what you want to achieve. It might be easier to help you if you showed us a few _examples_, both of strings that should match and strings that should not match. I can show you sample on hotmail if you want. files can't be shown What would Hotmail have to do with it? Can't you post sample strings here? Never mind; since you keep top-posting, I'm not inclined to help. -- Gunnar Hjalmarsson Email:http://www.gunnar.cc/cgi-bin/contact.pl -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: regexp behin assertions ?!
Hi,

Where is the solution to this thread? It's a little bit weird to ask for help and not offer a possible solution. At least a study case would be enough, without mentioning further details.

Cheers
Re: regexp behin assertions ?!
In fact I would like my number is not preceded by a word; or a number; I have tried that but error : s/(?!(\w|\d+;))(\d+)(?!(;\d+|\d+|;\s+\d+;\d+))/: variable length lookbehind not implemented at HashTableItems.pl line 207 On 1 juil, 01:23, [EMAIL PROTECTED] (Gunnar Hjalmarsson) wrote: epanda wrote: I would like to identify in a pattern a number wich is not preceded by another number a word or a ';' followed by ;number ~ ;\d+ I have tried this s/\d+(?!\w)/aWord/g without success. You probably want to reverse the order. /(?!\w)\d+/ -- Gunnar Hjalmarsson Email:http://www.gunnar.cc/cgi-bin/contact.pl -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: regexp behin assertions ?!
[ Please do not top-post! ]

epanda wrote:
> Gunnar Hjalmarsson wrote:
>> epanda wrote:
>>> I would like to identify in a pattern a number which is not preceded by another number, a word or a ';' followed by ;number ~ ;\d+
>>>
>>> I have tried this s/\d+(?<!\w)/aWord/g without success.
>>
>> You probably want to reverse the order.
>>
>>     /(?<!\w)\d+/
>
> In fact I would like my number not to be preceded by "a word;" or "a number;". I have tried that, but got an error:
>
>     s/(?<!(\w|\d+;))(\d+)(?!(;\d+|\d+|;\s+\d+;\d+))/: variable length lookbehind not implemented at HashTableItems.pl line 207

Yes, as is stated in "perldoc perlre", (?<!pattern) works only for fixed-width look-behind. Hence you may not use the + quantifier.

You know that the \w character class includes digits, right?

Maybe I'm dumb, but it's not clear to me what you want to achieve. It might be easier to help you if you showed us a few _examples_, both of strings that should match and strings that should not match.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
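One common way around the fixed-width restriction is to stack several look-behinds, each with its own fixed width, instead of putting a quantifier or alternatives of different lengths inside a single one. The sketch below assumes, as a guess at the requirement, that a number should be skipped when it directly follows "digit;"; free_numbers and the sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the numbers in $text that are not directly
# preceded by "digit;" and that do not start in the middle of a longer number.
#   (?<!\d;)  two characters wide: no "digit;" immediately before
#   (?<!\d)   one character wide:  no digit immediately before
sub free_numbers {
    my ($text) = @_;
    return $text =~ /(?<!\d;)(?<!\d)(\d+)/g;
}

print join( ",", free_numbers("12;34 56 a;78") ), "\n";
```

For the sample string this should print 12,56,78: the 34 is rejected because it follows "2;", while 78 is kept because the character before the semicolon is a letter. Each look-behind is tested at the same position, so together they act like an "and" of the separate conditions.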
Re: regexp behin assertions ?!
I can show you sample on hotmail if you want. files can't be shown On 1 juil, 21:27, [EMAIL PROTECTED] (Gunnar Hjalmarsson) wrote: [ Please do not top-post! ] epanda wrote: Gunnar Hjalmarsson wrote: epanda wrote: I would like to identify in a pattern a number wich is not preceded by another number a word or a ';' followed by ;number ~ ;\d+ I have tried this s/\d+(?!\w)/aWord/g without success. You probably want to reverse the order. /(?!\w)\d+/ In fact I would like my number is not preceded by a word; or a number; I have tried that but error : s/(?!(\w|\d+;))(\d+)(?!(;\d+|\d+|;\s+\d+;\d+))/: variable length lookbehind not implemented at HashTableItems.pl line 207 Yes, as is stated in perldoc perlre, (?!pattern) works only for fixed-width look-behind. Hence you may not use the + quantifier. You know that the \wcharacter class includes digits, right? Maybe I'm dumb, but it's not clear to me what you want to achieve. It might be easier to help you if you showed us a few _examples_, both of strings that should match and strings that should not match. -- Gunnar Hjalmarsson Email:http://www.gunnar.cc/cgi-bin/contact.pl -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: regexp behin assertions ?!
epanda wrote: Gunnar Hjalmarsson wrote: [ Please do not top-post! ] snip Maybe I'm dumb, but it's not clear to me what you want to achieve. It might be easier to help you if you showed us a few _examples_, both of strings that should match and strings that should not match. I can show you sample on hotmail if you want. files can't be shown What would Hotmail have to do with it? Can't you post sample strings here? Never mind; since you keep top-posting, I'm not inclined to help. -- Gunnar Hjalmarsson Email: http://www.gunnar.cc/cgi-bin/contact.pl -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: regexp behin assertions ?!
epanda wrote:
> I would like to identify in a pattern a number which is not preceded by another number, a word or a ';' followed by ;number ~ ;\d+
>
> I have tried this s/\d+(?<!\w)/aWord/g without success.

You probably want to reverse the order.

    /(?<!\w)\d+/

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl