Re: Regex not working correctly

2013-12-11 Thread Shlomi Fish
Hi punit,

On Wed, 11 Dec 2013 21:04:39 +0530
punit jain contactpunitj...@gmail.com wrote:

 Hi,
 
 I have a requirement where I need to capture phone number from different
 strings.
 
 The strings could be :-
 
 
 1. COMP TEL NO 919369721113  for computer science
 
 2. For Best Discount reach 092108493, from 6-9
 
 3. Your booking Confirmed, 9210833321
 
 4. price for free consultation call92504060
 
 5. price for free consultation call92504060number
 
 I created a regex as below :-
 
 #!/usr/bin/perl
 
 my $line= shift @ARGV;
 
 if($line =~
 /(?:(?:\D+|\s+)(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+))))|(?:(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))(?:\D+|\s+))/)
 {
 
 print "one = $1";
 
 
 }
 It works fine for 1, 2 and 3 and prints the number; however, for 4 and 5 I get
 the number in $2 rather than $1, though I have a pipe operator to check it.
 
 Any clue how to fix this ?

I suggest you use named captures (a feature of perl-5.10.x-and-above) and then
you can do something like:

my $my_capture = ($+{'capture1'} // $+{'capture2'});

I think this is the best way to do it. (You can also do $1 // $2, but please
don't).
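
A minimal sketch of the named-capture idea, assuming the capture names capture1 and capture2 from the line above. The helper name find_number and the second sample string are made up for illustration; the number patterns are copied from the original regex.

```perl
use strict;
use warnings;
use 5.010;    # named captures and the // operator need perl 5.10 or newer

# Hypothetical helper, not from the original post.
sub find_number {
    my ($line) = @_;
    return undef unless $line =~ m{
          (?<capture1> 91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11} )   # a bare number
        | (?: ph | cal ) (?<capture2> \d+ )                           # digits right after ph or cal
    }x;
    # Only the branch that matched sets its name, so take whichever is defined.
    return $+{capture1} // $+{capture2};
}

print find_number('Your booking Confirmed, 9210833321'), "\n";   # bare number
print find_number('reach us at ph92504060'), "\n";               # made-up "ph" example
```

The calling code never has to know which branch matched, which is the point of preferring names over $1 // $2.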

Regards,

Shlomi Fish


-- 
-
Shlomi Fish   http://www.shlomifish.org/
The Case for File Swapping - http://shlom.in/file-swap

Why can’t we ever attempt to solve a problem in this country without having
a “War” on it? -- Rich Thomson, talk.politics.misc

Please reply to list if it's a mailing list post - http://shlom.in/reply .

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Regex not working correctly

2013-12-11 Thread Jim Gibson

Your first step is to rewrite the regular expression using the extended syntax 
x modifier and add some whitespace:
 
if($line =~
m{
  (?:
    (?: \D+ | \s+ )
    (?:
      (
        91\d{10} |
        0\d{10} |
        [7-9]\d{9} |
        0\d{11}
      ) |
      (?:
        (?:
          ph |
          cal
        )
        (\d+)
      )
    )
  ) |
  (?:
    (?:
      (
        91\d{10} |
        0\d{10} |
        [7-9]\d{9} |
        0\d{11}
      ) |
      (?:
        (?:
          ph |
          cal
        )
        (\d+)
      )
    )
    (?:
      \D+ |
      \s+
    )
  )
}x
) {

Then maybe you will have some hope of figuring out why it doesn’t work (I 
certainly can’t). 

I suggest you break it up into a series of if-then-else statements:

  if( $line =~ /(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})/ ) {
    $number = $1;
  }elsif( $line =~ /(?:ph|cal)(\d+)/ ) {
    $number = $1;
  }elsif( … ) {
  }else{
    print "No match for $line";
  }

You don’t need to do it all in one regex. Debugging each of those smaller 
regexes will be easier than debugging the whole thing.
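
One way to sketch the "series of smaller regexes" idea is a list of patterns tried in order, each with a single capture group. The sub name first_number and the third sample string are assumptions for illustration; the two patterns are the ones from the if/elsif chain above.

```perl
use strict;
use warnings;

# Patterns are tried in order; each one has exactly one capture group,
# so the result is always in $1 no matter which pattern matched.
my @patterns = (
    qr/(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})/,
    qr/(?:ph|cal)(\d+)/,
);

# Hypothetical helper: returns the first number found, or nothing.
sub first_number {
    my ($line) = @_;
    for my $re (@patterns) {
        return $1 if $line =~ $re;
    }
    return;
}

for my $line (
    'COMP TEL NO 919369721113  for computer science',
    'Your booking Confirmed, 9210833321',
    'reach us at ph92504060',    # made-up example for the second pattern
) {
    my $number = first_number($line);
    print defined $number ? "$number\n" : "No match for $line\n";
}
```

Each pattern can be tested on its own, and adding another format means adding one more entry to the list.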







Re: Regex not working correctly

2013-12-11 Thread punit jain
Thanks Shlomi, that's a good idea. However, at the same time I was trying to
understand whether something is wrong in my regex. Why would $2 capture the
number when I have used :-

(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))

This would in my understanding match either number with regex
91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11}
or with call followed by digits.

In my case 4 (price for free consultation call92504060), why would $1 store
an empty string while $2 actually stores the number part?

Regards,
Punit


RE: Regex not working correctly

2013-12-11 Thread vijaya R
Hi,

You can try the below pattern.

if($line=~/([0-9]{3,})/gs) {
print $1;
}

Thanks,
Vijaya



Re: Regex not working correctly

2013-12-11 Thread Robert Wohlfarth
On Wed, Dec 11, 2013 at 10:35 AM, punit jain contactpunitj...@gmail.com wrote:


 Thanks Shlomi, thats a good idea. However at the same time I was trying to
 understand if something is wrong in my regex. Why would $2 capture the
 number as I have used :-

 (?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))

 This would in my understanding match either number with regex 
 91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11}
 or with call followed by digits.

 In my case 4 ( price for free consultation call92504060) why would $1
 store an empty string and $2 actually stores the number part ?


There are two sets of capturing parentheses:
* (91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11}) = $1
* (\d+) = $2

The first set stores its match in $1 and the second set in $2. The pipe
(or) does not reset the capture counter back to 1. The counter strictly
goes from left to right.
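
A small sketch of the numbering rule, using a shortened version of the pattern and a made-up input string. It also shows the branch-reset group (?|...) from perl 5.10, which is an addition here and was not suggested in the thread: inside it, each branch restarts its numbering at 1. The sub names are hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # the branch-reset group (?|...) needs perl 5.10 or newer

# Plain alternation: groups are numbered left to right across both branches.
sub plain_groups {
    my ($line) = @_;
    return $line =~ /(?:([7-9]\d{9})|(?:ph|cal)(\d+))/ ? ( $1, $2 ) : ();
}

# Branch reset: numbering starts again at 1 inside each branch.
sub reset_groups {
    my ($line) = @_;
    return $line =~ /(?|([7-9]\d{9})|(?:ph|cal)(\d+))/ ? ($1) : ();
}

my $line = 'reach us at ph92504060';    # made-up sample input
my ( $one, $two ) = plain_groups($line);
print '$1 is ', ( defined $one ? $one : 'undef' ), ", \$2 is $two\n";
my ($reset_one) = reset_groups($line);
print "with (?|...) \$1 is $reset_one\n";
```

In the plain version the group that did not take part in the match is undefined rather than an empty string.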

-- 
Robert Wohlfarth


Re: Regex not working correctly

2013-12-11 Thread punit jain
That answers my question.

Thanks Robert


Re: Regex not working as expected

2006-09-04 Thread Adriano Ferreira

Chris,

I found your solution to work alright, with the exception that you
probably don't want to escape | as in

   (?:http\|ftp\|file)

but only

   (?:http|ftp|file)

So that the test script below succeeds:

   use Test::More tests => 3;

   {
   my $url = "<a>http://foo.org/</a>";
   $url =~s#((?:http|ftp|file)://.{50}).+</a>#$1...</a>#g;
   is($url, "<a>http://foo.org/</a>");
   }

   {
   my $url = 
"<a>http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_main.html?name=Toles&amp;date=09012006</a>";
   $url =~s#((?:http|ftp|file)://.{50}).+</a>#$1...</a>#g;
   is($url, 
"<a>http://www.washingtonpost.com/wp-srv/opinions/cartoonsand...</a>");
   }

   {
   my $url = 
"<a>http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_main.html?name=Toles</a>";
   $url =~s#((?:http|ftp|file)://.{50}).+</a>#$1...</a>#g;
   is($url, 
"<a>http://www.washingtonpost.com/wp-srv/opinions/cartoonsand...</a>");
   }
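
The difference between the escaped and unescaped pipe can be checked in isolation. This is only a sketch: the variable names and the three sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Unescaped | separates alternatives; escaped \| matches a literal pipe character.
my $alternation  = qr{^(?:http|ftp|file)://};
my $literal_pipe = qr{^(?:http\|ftp\|file)://};

for my $text ( 'http://foo.org/', 'ftp://foo.org/', 'http|ftp|file://foo.org/' ) {
    printf "%-26s alternation: %-3s literal pipe: %s\n",
        $text,
        ( $text =~ $alternation  ? 'yes' : 'no' ),
        ( $text =~ $literal_pipe ? 'yes' : 'no' );
}
```

Only the third string, which contains the pipes as ordinary text, matches the escaped form.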


On 9/4/06, Chris Schults [EMAIL PROTECTED] wrote:

I have a regular expression that is supposed to truncate long URLs at 50
characters and add "...", but I can't figure out why it is not working with a
particular URL.

Here is the regex:

$url =~s#((?:http\|ftp\|file)://.{50}).+</a>#$1...</a>#g;

And here is the problem URL:

<a
href="http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_
main.html?name=Toles&amp;date=09012006">http://www.washingtonpost.com/wp-srv
/opinions/cartoonsandvideos/toles_main.html?name=Toles&amp;date=09012006</a>

Here is what the regex should return:

<a
href="http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_
main.html?name=Toles&amp;date=09012006">http://www.washingtonpost.com/wp-srv
/opinions/cartoonsand...</a>

Interestingly, the regex works fine on this modified version of the URL:

http://www.washingtonpost.com/wp-srv/opinions/cartoonsandvideos/toles_main.h
tml?name=Toles

I think the digits might have something to do with it, but not sure.

Any thoughts?






Re: Regex not working as expected when doing multiple checks and updates

2005-02-11 Thread John W. Krahn
Wagner, David --- Senior Programmer Analyst --- WGO wrote:
Here is a watered down version, but unclear what I am missing. You should
be able to cut and paste. It is self contained and I am running on XP Pro,
using AS 5.8.4.

What am I trying to do? Well I have to implement a new setup. So I pull
the reports I use for the emails from one system and place on my test node.
I then copy all the output from my first node to holding area. Then I run
the processes on my test node. I then copy all the output from the test
node to a holding area.
I then bring up to a pc.

I have a small script that reads the files, then does a count of carriage
returns. If carriage returns are equal, then I compare the report output.
What I have in the output is timestamps and run times which I need to
remove otherwise will never be equal.

This one seems so simple, yet it is eluding me.  If you want to see more of
the output, then you can uncomment the lines needed.

Thanks for any insight and what I am doing wrong.

What you are doing wrong is improperly using or omitting regular expression 
options, specifically /g and /m.


cut starts on next line:
#!perl 

use strict;
use warnings;
my @MyWrkP = ();
my @MyWrkT = ();
my $MyProdFile = 'a.pl026.02.P.txt';
my $MyTestFile = 'a.pl026.02.T.txt';;
my $MyPtr = 1;
my $MyWrkData = '';
my $MyProdCnt;
my $MyTestCnt;
while ( <DATA> ) {
if ( /^\s*_{1,2}end\d_{1,2}\s*$/ig ) {

You are using the /g option in scalar (boolean) context which is 
superfluous.

  if ( /^\s*_{1,2}end\d_{1,2}\s*$/i ) {

if ( $MyPtr == 1 ) {
$MyWrkP[0] = $MyWrkData;
$MyPtr++;
#$MyWrkData = $MyWrkP[0];
#$MyWrkData =~ s/\n/;/igm;

You are using the /i option but there are no alphabetic characters in the 
regular expression so it is superfluous.  You are using the /m option which 
affects the behaviour of the ^ and $ anchors which you aren't using so it is 
superfluous.  If you want to translate one character to another character 
globally you should use the tr/// operator instead.

#$MyWrkData =~ tr/\n/;/;

#printf "Data into Prod\n%s\n", $MyWrkData;
#printf "Number of ;(c/r): %d\n", ($MyWrkData =~ tr/;//);
$MyWrkData = '';
next;
 }else {
$MyWrkT[0] = $MyWrkData;
$MyPtr++;
#$MyWrkData = $MyWrkT[0];
#$MyWrkData =~ s/\n/;/igm;
^^^  see above.
#printf "Data into Test\n%s\n", $MyWrkData;
#printf "Number of ;(c/r): %d\n", ($MyWrkData =~ tr/;//);
$MyWrkData = '';
last;
 }
 }
$MyWrkData .= $_;
 }
 
#
# See if works here
#
if ( $MyProdFile =~ /\.0[23]\./g ) {
 ^  see above.

$_ = $MyWrkP[0];
s/fes.//ig;
if ( $MyTestFile =~ /pl026/ig ) {
  ^  see above.

if ( ! s!^Run Date/Time of Report:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*\n!!ig ) {
  ^  see above.

Your regular expression is anchored at the beginning of the string but it will 
never match there.  Because the string contains multiple lines you need to use 
the /m option to have it match the correct line in the string.

 if ( ! s!^Run Date/Time of Report:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*\n!!im ) {

printf "No valid hit on change of date/time for pl026(P)\n";
printf "%s", $MyWrkP[0];
 }
 }else {
s/Date:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*$//igm;
   ^  see above.

 }
$MyWrkP[0] = $_;
$_ = $MyWrkT[0];
s/fes.//ig;
if ( $MyProdFile =~ /pl026/ig ) {
if ( ! s!^Run Date/Time of Report:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*\n!!ig ) {
  ^  see above.

printf "No valid hit on change of date/time for pl026(T)\n";
printf "%s", $MyWrkT[0];
 }
 }else {
s/Date:\s+\d+\D\d+\D\d+\D\d+\D\d+\D\d+\s*$//igm;
   ^  see above.

 }
$MyWrkT[0] = $_;
 }
$MyProdCnt = ( $MyWrkP[0] =~ tr/\n// );
$MyTestCnt = ( $MyWrkT[0] =~ tr/\n// );
if ( $MyProdCnt == $MyTestCnt ) {
printf "  %6d == %6d-c/r  ",
$MyProdCnt,
$MyTestCnt;
if ( $MyWrkP[0] eq $MyWrkT[0]) {
printf "%-13s",
'Contents ==';
 }else {
printf "%-13s",
'Contents !=';
printf "Prod:\n%s\n", $MyWrkP[0];
printf "Test:\n%s\n", $MyWrkT[0];
 }
printf "$MyProdFile\n";
 }else {
printf "  %6d != %6d%-20s$MyProdFile\n",
$MyProdCnt,
$MyTestCnt,
' ';
 }
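
The two points made in the comments above, that ^ needs /m to match inside a multi-line string and that tr/// can count characters, can be tried on a small string. This is only a sketch: the report text and variable names are made up and the pattern is shortened.

```perl
use strict;
use warnings;

# Made-up report text standing in for the real files.
my $report = "Header line\nRun Date/Time of Report: 02/11/2005 10:15:30\nBody\n";

# Without /m, ^ only matches at the very start of the string, so nothing is removed.
( my $no_m = $report ) =~ s!^Run Date/Time of Report:.*\n!!i;

# With /m, ^ also matches just after each embedded newline.
( my $with_m = $report ) =~ s!^Run Date/Time of Report:.*\n!!im;

# tr/// returns the number of characters it matched, which makes it a cheap counter.
my $lines = ( $with_m =~ tr/\n// );

print "without /m: ", ( $no_m eq $report ? "unchanged" : "changed" ), "\n";
print "with /m: $lines lines left\n";
```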

RE: regex is working , then not?

2002-10-09 Thread Nikola Janceski

See inline comments

 -Original Message-
 From: Jerry Preston [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, October 09, 2002 10:36 AM
 To: Beginners Perl
 Subject: regex is working , then not?
 
 
 Hi!
 
 I do not understand why my regex works , then does not.
 
 regex:
 
 my  (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;
 
 
 Works!
 
 Process Name = D4_jerry_5LM_1.91_BF 
This one has at least 3 _ (underscores)

 
 Returns:
 
Process Name DM4 15C035 5LM
 
 Does NOT work:
 
Process Name = d4_jerry_5lm 
This one has 2 _ (you are matching for 3 in your regex)

Perhaps you should gather the last half and then split on _:
my  (@dat) = /(\w+\s+\w+\s+)=\s+([.\w]+)/;
push @dat, split /_/, pop @dat;
[untested]


 
 Is there a better way to write this regex?
 
 Thanks,
 
 Jerry
 



The views and opinions expressed in this email message are the sender's
own, and do not necessarily represent the views and opinions of Summit
Systems Inc.






Re: regex is working , then not?

2002-10-09 Thread Janek Schleicher

Jerry Preston wrote:

 Hi!
 
 I do not understand why my regex works , then does not.
 
 regex:
 
 my  (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;

^
This last underscore is expected.


 
 Works!
 
 Process Name = D4_jerry_5LM_1.91_BF 
 
 Returns:
 
Process Name DM4 15C035 5LM
 
 Does NOT work:
 
Process Name = d4_jerry_5lm 


So only
Process Name = d4_jerry_5lm_
would work


 Is there a better way to write this regex?

Just remove the unnecessary underscore or add a ? after it.

BTW: I believe I would choose a completely different way:

my ($key,$name) = split /\s+=\s+/;
my @name_part   = split /_/, $name;
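
A runnable sketch of the two split calls above, wrapped in a hypothetical sub (parse_process_line is not from the thread) and fed both sample strings. Splitting also sidesteps the fact that \w matches the underscore itself, which makes underscore-delimited regexes backtrack in ways that are hard to follow.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two split calls.
sub parse_process_line {
    my ($line) = @_;
    my ( $key, $name ) = split /\s+=\s+/, $line, 2;    # "Process Name" and the rest
    my @name_part = split /_/, $name;                   # any number of parts
    return ( $key, @name_part );
}

for my $line ( 'Process Name = D4_jerry_5LM_1.91_BF', 'Process Name = d4_jerry_5lm' ) {
    my ( $key, @parts ) = parse_process_line($line);
    print "$key: @parts\n";
}
```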

Greetings,






RE: regex is working , then not?

2002-10-09 Thread Jerry Preston

OK!

I see, 2 as opposed to 3.

Is there a way to make this regex smart enough to handle both strings? Is
there a way that (\w+)_ can be changed to match 2 to 10 times?

my  (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;

Thanks,

Jerry






Re: regex is working , then not?

2002-10-09 Thread Thorsten Dieckhoff

 ...
 regex:

 my  (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;


 Works!

 Process Name = D4_jerry_5LM_1.91_BF

 Returns:

Process Name DM4 15C035 5LM

 Does NOT work:

Process Name = d4_jerry_5lm
 ...

Hi, is that 3rd _ intended ? If yes, it would work on Process Name =
d4_jerry_5lm_  ? HTH, Thorsten






Re: regex is working , then not?

2002-10-09 Thread Janek Schleicher

Jerry Preston wrote:

 Is there a way to make this regex smart enough to handle both string? Is
 there a way that (\w+)_ can be changed to 2 to 10?
 
 my  (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;

Use the split function instead.


Greetings,
Janek






Re: regex is working , then not?

2002-10-09 Thread John W. Krahn

Jerry Preston wrote:
 
 Hi!

Hello,

 I do not understand why my regex works , then does not.
 
 regex:
 my  (@dat) = /(\w+\s+\w+\s+)=\s+(\w+)_(\w+)_(\w+)_/;
 Works!
 
 Process Name = D4_jerry_5LM_1.91_BF
 Returns:
Process Name DM4 15C035 5LM
 
 Does NOT work:
Process Name = d4_jerry_5lm
 
 Is there a better way to write this regex?


$ perl -le'
$_ = q/Process Name = D4_jerry_5LM_1.91_BF/;
@dat = split /\s*[=_]\s*/;
print for @dat;
$_ = q/Process Name = d4_jerry_5lm/;
@dat = split /\s*[=_]\s*/;
print for @dat;
'
Process Name
D4
jerry
5LM
1.91
BF
Process Name
d4
jerry
5lm



John
-- 
use Perl;
program
fulfillment
