Re: Regex not working correctly
Hi punit,

On Wed, 11 Dec 2013 21:04:39 +0530 punit jain contactpunitj...@gmail.com wrote:

> Hi,
>
> I have a requirement where I need to capture phone number from different
> strings. The strings could be :-
>
> 1. COMP TEL NO 919369721113 for computer science
> 2. For Best Discount reach 092108493, from 6-9
> 3. Your booking Confirmed, 9210833321
> 4. price for free consultation call92504060
> 5. price for free consultation call92504060number
>
> I created a regex as below :-
>
> #!/usr/bin/perl
> my $line= shift @ARGV;
> if($line =~ /(?:(?:\D+|\s+)(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+))))|(?:(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))(?:\D+|\s+))/) {
>     print "one = $1";
> }
>
> It works fine for 1, 2, 3 and prints the number; however, for 4 and 5 I get
> the number in $2 rather than $1, though I have the pipe operator to check it.
> Any clue how to fix this ?

I suggest you use named captures (a feature of perl-5.10.x-and-above) and then you can do something like:

    my $my_capture = ($+{'capture1'} // $+{'capture2'});

I think this is the best way to do it. (You can also do $1 // $2, but please don't.)

Regards,

Shlomi Fish

--
Shlomi Fish http://www.shlomifish.org/
The Case for File Swapping - http://shlom.in/file-swap

Why can’t we ever attempt to solve a problem in this country without having a “War” on it?
    -- Rich Thomson, talk.politics.misc

Please reply to list if it's a mailing list post - http://shlom.in/reply .

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
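As a rough illustration of the named-capture suggestion, here is a minimal sketch. The helper name `extract_number`, the group names `plain` and `prefixed`, and the `call?` alternative (added so that "call" is accepted as well as "cal") are assumptions made for this example and do not come from the thread.

```perl
use strict;
use warnings;
use 5.010;    # named captures and the // operator need perl 5.10 or later

# Hypothetical helper: return the phone number found in $line, or undef.
sub extract_number {
    my ($line) = @_;
    if ( $line =~ m{
            (?<plain> 91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11} )  # bare number
          | (?: ph | call? ) (?<prefixed> \d+ )                    # number after ph, cal or call
        }x )
    {
        # Only the group that took part in the match is defined in %+.
        return $+{plain} // $+{prefixed};
    }
    return;
}

say extract_number('Your booking Confirmed, 9210833321');              # prints 9210833321
say extract_number('price for free consultation call92504060number'); # prints 92504060
```

Because each branch has its own name, the caller never needs to know which numbered group happened to match.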
Re: Regex not working correctly
On Dec 11, 2013, at 7:34 AM, punit jain contactpunitj...@gmail.com wrote:

> Hi,
>
> I have a requirement where I need to capture phone number from different
> strings. The strings could be :-
>
> 1. COMP TEL NO 919369721113 for computer science
> 2. For Best Discount reach 092108493, from 6-9
> 3. Your booking Confirmed, 9210833321
> 4. price for free consultation call92504060
> 5. price for free consultation call92504060number
>
> I created a regex as below :-
>
> #!/usr/bin/perl
> my $line= shift @ARGV;
> if($line =~ /(?:(?:\D+|\s+)(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+))))|(?:(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))(?:\D+|\s+))/) {
>     print "one = $1";
> }
>
> It works fine for 1, 2, 3 and prints the number; however, for 4 and 5 I get
> the number in $2 rather than $1, though I have the pipe operator to check it.
> Any clue how to fix this ?

Your first step is to rewrite the regular expression using the extended syntax (the x modifier) and add some whitespace:

    if($line =~ m{
        (?:
            (?: \D+ | \s+ )
            (?:
                ( 91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11} )
                |
                (?: (?: ph | cal ) (\d+) )
            )
        )
        |
        (?:
            (?:
                ( 91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11} )
                |
                (?: (?: ph | cal ) (\d+) )
            )
            (?: \D+ | \s+ )
        )
    }x ) {

Then maybe you will have some hope of figuring out why it doesn’t work (I certainly can’t).

I suggest you break it up into a series of if-then-else statements:

    if( $line =~ /(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})/ ) {
        $number = $1;
    }elsif( $line =~ /(?:ph|cal)(\d+)/ ) {
        $number = $1;
    }elsif( … ) {
    }else{
        print "No match for $line";
    }

You don’t need to do it all in one regex. Debugging each of those smaller regexes will be easier than debugging the whole thing.
Re: Regex not working correctly
Thanks Shlomi, that's a good idea. However, at the same time I was trying to understand whether something is wrong in my regex. Why would $2 capture the number when I have used :-

    (?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))

In my understanding this would match either a number with the regex 91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11}, or "call" followed by digits. In my case 4 (price for free consultation call92504060), why would $1 store an empty string while $2 actually stores the number part?

Regards,
Punit
RE: Regex not working correctly
Hi,

You can try the pattern below:

    if($line=~/([0-9]{3,})/gs) { print $1; }

Thanks,
Vijaya
Re: Regex not working correctly
On Wed, Dec 11, 2013 at 10:35 AM, punit jain contactpunitj...@gmail.com wrote:

> Thanks Shlomi, that's a good idea. However, at the same time I was trying to
> understand whether something is wrong in my regex. Why would $2 capture the
> number when I have used :-
>
> (?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))
>
> In my understanding this would match either a number with the regex
> 91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11}, or "call" followed by digits. In my
> case 4 (price for free consultation call92504060), why would $1 store an
> empty string while $2 actually stores the number part?

There are two sets of capturing parentheses:

* (91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11}) = $1
* (\d+) = $2

The first set stores its match in $1 and the second set in $2. The pipe (or) does not reset the capture counter back to 1. The counter strictly goes from left to right, counting opening parentheses, whether or not a given group took part in the match.

--
Robert Wohlfarth
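The left-to-right numbering described above can be seen in a small sketch. The helper names `plain_groups` and `branch_reset`, the shortened pattern, and the sample line are assumptions for illustration. The second helper uses the branch reset construct `(?|...)`, which is not mentioned in the thread; it is one way (on perl 5.10 or later) to make every alternative start numbering at $1 again.

```perl
use strict;
use warnings;
use 5.010;    # the (?|...) branch reset construct needs perl 5.10 or later

# Plain alternation: groups are numbered left to right across both branches,
# so the digits after "cal" land in $2 and $1 stays undefined.
sub plain_groups {
    my ($line) = @_;
    return $line =~ /(?:([7-9]\d{9})|(?:ph|cal)(\d+))/ ? ( $1, $2 ) : ();
}

# Branch reset: numbering restarts at 1 inside each alternative,
# so whichever branch matches fills $1.
sub branch_reset {
    my ($line) = @_;
    return $line =~ /(?|([7-9]\d{9})|(?:ph|cal)(\d+))/ ? $1 : undef;
}

my $line = 'price for free consultation cal92504060';    # hypothetical input
my ( $first, $second ) = plain_groups($line);
say 'plain: $1 is ', ( defined $first ? $first : 'undef' ), ', $2 is ', $second;
# prints: plain: $1 is undef, $2 is 92504060
say 'reset: $1 is ', branch_reset($line);
# prints: reset: $1 is 92504060
```

Note that the unused group is undefined rather than an empty string, which is why a test such as `defined $1` is the safer check.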
Re: Regex not working correctly
That answers my question. Thanks Robert
Re: Regex not working as expected
Chris,

I found your solution to work alright, with the exception that you probably don't want to escape | as in

    (?:http\|ftp\|file)

but only

    (?:http|ftp|file)

so that the test script below succeeds:

    use Test::More tests => 3;

    {
        my $url = "<a>http://foo.org/</a>";
        $url =~ s#((?:http|ftp|file)://.{50}).+</a>#$1...</a>#g;
        is($url, "<a>http://foo.org/</a>");
    }

    {
        my $url = "<a>http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_main.html?name=Toles&amp;date=09012006</a>";
        $url =~ s#((?:http|ftp|file)://.{50}).+</a>#$1...</a>#g;
        is($url, "<a>http://www.washingtonpost.com/wp-srv/opinions/cartoonsand...</a>");
    }

    {
        my $url = "<a>http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_main.html?name=Toles</a>";
        $url =~ s#((?:http|ftp|file)://.{50}).+</a>#$1...</a>#g;
        is($url, "<a>http://www.washingtonpost.com/wp-srv/opinions/cartoonsand...</a>");
    }

On 9/4/06, Chris Schults [EMAIL PROTECTED] wrote:

> I have a regular expression that is supposed to truncate long URLs at 50
> characters and add ..., but can't figure out why it is not working with a
> particular URL.
>
> Here is the regex:
>
> $url =~ s#((?:http\|ftp\|file)://.{50}).+</a>#$1...</a>#g;
>
> And here is the problem URL:
>
> <a href="http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_main.html?name=Toles&amp;date=09012006">http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_main.html?name=Toles&amp;date=09012006</a>
>
> Here is what the regex should return:
>
> <a href="http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_main.html?name=Toles&amp;date=09012006">http://www.washingtonpost.com/wp-srv/opinions/cartoonsand...</a>
>
> Interestingly, the regex works fine on this modified version of the URL:
>
> http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_main.html?name=Toles
>
> I think the digits might have something to do with it, but not sure. Any
> thoughts?
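To make the effect of the escaped pipe concrete, here is a small sketch. It assumes that the `a` and `/a` fragments in the thread were HTML anchor tags whose angle brackets got lost, so the patterns below use `</a>`; the helper names `shorten` and `shorten_escaped` are made up for the example.

```perl
use strict;
use warnings;

# Unescaped alternation: matches http, ftp or file.
sub shorten {
    my ($html) = @_;
    $html =~ s#((?:http|ftp|file)://.{50}).+</a>#$1...</a>#g;
    return $html;
}

# Escaped pipes: the group now only matches the literal text "http|ftp|file",
# so an ordinary URL is left untouched.
sub shorten_escaped {
    my ($html) = @_;
    $html =~ s#((?:http\|ftp\|file)://.{50}).+</a>#$1...</a>#g;
    return $html;
}

my $link = '<a>http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/'
         . 'toles_main.html?name=Toles&amp;date=09012006</a>';

print shorten($link), "\n";
# prints: <a>http://www.washingtonpost.com/wp-srv/opinions/cartoonsand...</a>
print shorten_escaped($link) eq $link ? "escaped version: unchanged\n" : "escaped version: changed\n";
# prints: escaped version: unchanged
```

Inside a regular expression a backslash turns `|` from the alternation operator into an ordinary character, which is why the escaped form never matches a plain `http://` link.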
Re: Regex not working as expected when doing multiple checks and updates
Wagner, David --- Senior Programmer Analyst --- WGO wrote:

> Here is a watered down version, but unclear what I am missing. You should be
> able to cut and paste. It is self contained and I am running on XP Pro, using
> AS 5.8.4.
>
> What am I trying to do? Well I have to implement a new setup. So I pull the
> reports I use for the emails from one system and place on my test node. I
> then copy all the output from my first node to holding area. Then I run the
> processes on my test node. I then copy all the output from the test node to a
> holding area. I then bring up to a pc. I have a small script that reads the
> files, then does a count of carriage returns. If carriage returns are equal,
> then I compare the report output. What I have in the output is timestamps and
> run times which I need to remove otherwise will never be equal.
>
> This one seems so simple, yet it is eluding me. If you want to see more of
> the output, then you can uncomment the lines needed.
>
> Thanks for any insight and what I am doing wrong.

What you are doing wrong is improperly using or omitting regular expression options, specifically /g and /m.

> cut starts on next line:
> #!perl
>
> use strict;
> use warnings;
>
> my @MyWrkP = ();
> my @MyWrkT = ();
> my $MyProdFile = 'a.pl026.02.P.txt';
> my $MyTestFile = 'a.pl026.02.T.txt';;
> my $MyPtr = 1;
> my $MyWrkData = '';
> my $MyProdCnt;
> my $MyTestCnt;
>
> while ( <DATA> ) {
>     if ( /^\s*_{1,2}end\d_{1,2}\s*$/ig ) {

You are using the /g option in scalar (boolean) context which is superfluous.

    if ( /^\s*_{1,2}end\d_{1,2}\s*$/i ) {

>         if ( $MyPtr == 1 ) {
>             $MyWrkP[0] = $MyWrkData;
>             $MyPtr++;
>             #$MyWrkData = $MyWrkP[0];
>             #$MyWrkData =~ s/\n/;/igm;

You are using the /i option but there are no alphabetic characters in the regular expression so it is superfluous. You are using the /m option which affects the behaviour of the ^ and $ anchors which you aren't using so it is superfluous. If you want to translate one character to another character globally you should use the tr/// operator instead.

    #$MyWrkData =~ tr/\n/;/;

>             #printf "Data into Prod\n%s\n", $MyWrkData;
>             #printf "Number of ;(c/r): %d\n", ($MyWrkData =~ tr/;//);
>             $MyWrkData = '';
>             next;
>         }else {
>             $MyWrkT[0] = $MyWrkData;
>             $MyPtr++;
>             #$MyWrkData = $MyWrkT[0];
>             #$MyWrkData =~ s/\n/;/igm;

^^^ see above.

>             #printf "Data into Test\n%s\n", $MyWrkData;
>             #printf "Number of ;(c/r): %d\n", ($MyWrkData =~ tr/;//);
>             $MyWrkData = '';
>             last;
>         }
>     }
>     $MyWrkData .= $_;
> }
> #
> # See if works here
> #
> if ( $MyProdFile =~ /\.0[23]\./g ) {

^ see above.

>     $_ = $MyWrkP[0];
>     s/fes.//ig;
>     if ( $MyTestFile =~ /pl026/ig ) {

^ see above.

>         if ( ! s!^Run Date/Time of Report:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*\n!!ig ) {

^ see above.

Your regular expression is anchored at the beginning of the string but it will never match there. Because the string contains multiple lines you need to use the /m option to have it match the correct line in the string.

    if ( ! s!^Run Date/Time of Report:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*\n!!im ) {

>             printf "No valid hit on change of date/time for pl026(P)\n";
>             printf "%s", $MyWrkP[0];
>         }
>     }else {
>         s/Date:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*$//igm;

^ see above.

>     }
>     $MyWrkP[0] = $_;
>     $_ = $MyWrkT[0];
>     s/fes.//ig;
>     if ( $MyProdFile =~ /pl026/ig ) {
>         if ( ! s!^Run Date/Time of Report:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*\n!!ig ) {

^ see above.

>             printf "No valid hit on change of date/time for pl026(T)\n";
>             printf "%s", $MyWrkT[0];
>         }
>     }else {
>         s/Date:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*$//igm;

^ see above.

>     }
>     $MyWrkT[0] = $_;
> }
> $MyProdCnt = ( $MyWrkP[0] =~ tr/\n// );
> $MyTestCnt = ( $MyWrkT[0] =~ tr/\n// );
> if ( $MyProdCnt == $MyTestCnt ) {
>     printf "%6d == %6d-c/r ", $MyProdCnt, $MyTestCnt;
>     if ( $MyWrkP[0] eq $MyWrkT[0]) {
>         printf "%-13s", 'Contents ==';
>     }else {
>         printf "%-13s", 'Contents !=';
>         printf "Prod:\n%s\n", $MyWrkP[0];
>         printf "Test:\n%s\n", $MyWrkT[0];
>     }
>     printf "$MyProdFile\n";
> }else {
>     printf "%6d != %6d%-20s$MyProdFile\n", $MyProdCnt, $MyTestCnt, ' ';
> }
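The point about /m can be checked in isolation with a short sketch. The three-line report string is invented for the example (the thread does not show the real data), and the helper names `strip_without_m`, `strip_with_m` and `count_newlines` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical three-line report; the real data is not shown in the thread.
my $report = "Header line\n"
           . "Run Date/Time of Report: 09/01/2006 10:15:30\n"
           . "Body line\n";

# Without /m, ^ only matches at the very start of the string,
# so a timestamp on the second line is never removed.
sub strip_without_m {
    my ($text) = @_;
    $text =~ s!^Run Date/Time of Report:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*\n!!i;
    return $text;
}

# With /m, ^ also matches right after each embedded newline.
sub strip_with_m {
    my ($text) = @_;
    $text =~ s!^Run Date/Time of Report:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*\n!!im;
    return $text;
}

# tr/// with an empty replacement list counts characters without changing them.
sub count_newlines {
    my ($text) = @_;
    return ( $text =~ tr/\n// );
}

print count_newlines( strip_without_m($report) ), "\n";    # prints 3
print count_newlines( strip_with_m($report) ), "\n";       # prints 2
```

The /g option is left out on purpose: the timestamp line is expected once per report, and a single substitution is enough.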
RE: regex is working , then not?
See inline comments

-----Original Message-----
From: Jerry Preston [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 09, 2002 10:36 AM
To: Beginners Perl
Subject: regex is working , then not?

> Hi!
>
> I do not understand why my regex works , then does not.
>
> regex: my (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;
>
> Works! Process Name = D4_jerry_5LM_1.91_BF

This one has 3 _ (underscores)

> Returns: Process Name DM4 15C035 5LM
>
> Does NOT work: Process Name = d4_jerry_5lm

This one has 2 _ (you are matching for 3 in your regex). Perhaps you should gather the last half and then split on _:

    my (@dat) = /(\w+\s+\w+\s+)=\s+([.\w]+)/;
    push @dat, split /_/, pop @dat;

[untested]

> Is there a better way to write this regex?
>
> Thanks,
>
> Jerry

The views and opinions expressed in this email message are the sender's own, and do not necessarily represent the views and opinions of Summit Systems Inc.
Re: regex is working , then not?
Jerry Preston wrote:

> Hi!
>
> I do not understand why my regex works , then does not.
>
> regex: my (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;

The last underscore at the end of this regex is expected in the input.

> Works! Process Name = D4_jerry_5LM_1.91_BF
>
> Returns: Process Name DM4 15C035 5LM
>
> Does NOT work: Process Name = d4_jerry_5lm

So only

    Process Name = d4_jerry_5lm_

would work.

> Is there a better way to write this regex?

Just remove the unnecessary underscore or add a ? after it.

BTW: I believe I would choose a completely different way:

    my ($key,$name) = split /\s+=\s+/;
    my @name_part = split /_/, $name;

Greetings,
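The two-step split suggested above can be wrapped in a small helper to show that it handles both input lines from the thread. The function name `parse_process` and the `' | '` separator used for printing are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: split "key = a_b_c" into the key and the
# underscore-separated parts of the value, however many there are.
sub parse_process {
    my ($line) = @_;
    my ( $key, $name ) = split /\s+=\s+/, $line, 2;
    return ( $key, split /_/, $name );
}

print join( ' | ', parse_process('Process Name = D4_jerry_5LM_1.91_BF') ), "\n";
# prints: Process Name | D4 | jerry | 5LM | 1.91 | BF
print join( ' | ', parse_process('Process Name = d4_jerry_5lm') ), "\n";
# prints: Process Name | d4 | jerry | 5lm
```

Since split does not care how many underscores the value contains, no fixed count of `(\w+)_` groups has to be written into a pattern.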
RE: regex is working , then not?
OK! I see: 2 underscores as opposed to 3.

Is there a way to make this regex smart enough to handle both strings? Is there a way that (\w+)_ can be changed to match 2 to 10 times?

    my (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;

Thanks,

Jerry

-----Original Message-----
From: Nikola Janceski [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 09, 2002 9:41 AM
To: '[EMAIL PROTECTED]'; Beginners Perl
Subject: RE: regex is working , then not?
Re: regex is working , then not?
> ...
> regex: my (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;
>
> Works! Process Name = D4_jerry_5LM_1.91_BF
>
> Returns: Process Name DM4 15C035 5LM
>
> Does NOT work: Process Name = d4_jerry_5lm
> ...

Hi,

is that 3rd _ intended? If yes, would it work on

    Process Name = d4_jerry_5lm_

?

HTH,
Thorsten
Re: regex is working , then not?
Jerry Preston wrote:

> Is there a way to make this regex smart enough to handle both string? Is
> there a way that (\w+)_ can be changed to 2 to 10?
>
> my (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;

Use the split function instead.

Greetings,
Janek
Re: regex is working , then not?
Jerry Preston wrote:

> Hi!

Hello,

> I do not understand why my regex works , then does not.
>
> regex: my (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;
>
> Works! Process Name = D4_jerry_5LM_1.91_BF
>
> Returns: Process Name DM4 15C035 5LM
>
> Does NOT work: Process Name = d4_jerry_5lm
>
> Is there a better way to write this regex?

    $ perl -le'
    $_ = q/Process Name = D4_jerry_5LM_1.91_BF/;
    @dat = split /\s*[=_]\s*/;
    print for @dat;
    $_ = q/Process Name = d4_jerry_5lm/;
    @dat = split /\s*[=_]\s*/;
    print for @dat;
    '
    Process Name
    D4
    jerry
    5LM
    1.91
    BF
    Process Name
    d4
    jerry
    5lm

John
--
use Perl;
program
fulfillment