regex help - only one value returned

2020-12-02 Thread Gary Stainburn
I have an array of regex expressions that I apply to text returned from 
tesseract.


Each match that I get then gets stored for future processing. However, 
I'm struggling with one regex.


The problem is that:

1) with brackets round the titles it returns two matches.
2) without brackets, it returns nothing.

Can anyone point me at the correct syntax, please?

Gary

[root@dev dev]# ./t
match1='Miss Jayne Doe' 'Miss'
[root@dev dev]# cat t
#!/usr/bin/perl

use strict;
use warnings;

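my $T=<<EOF;
Customer name and address
Miss Jayne Doe
19 Their Street
Somewehere
In Yorkshire
IN1 3YY
EOF

print "match1='$1' '$2'\n" if ($T=~/^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
print "match2='$1' '$2'\n" if ($T=~/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi);
[root@dev dev]#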

Re: regex help - only one value returned

2020-12-02 Thread Vlado Keselj


Well, it seems that the first one is what you want; you just need to
use $1 and ignore $2.

You do need the parentheses in '(mr|mrs|miss|dr|prof|sir)' to group the
alternation, but if you do not want that group captured in $2, you can use
the non-capturing form '(?:mr|mrs|miss|dr|prof|sir)'.  For example:

print "match3='$1' '$2'\n" if
($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);

would give output:

match3='Miss Jayne Doe' ''
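
To make the difference between the two group types concrete, here is a
minimal sketch that counts the captures each pattern returns. The sample
text is assumed to resemble the tesseract output quoted in this thread, and
the helper name captures() is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text, assumed to resemble the OCR output quoted in the thread.
my $text = <<'EOF';
Customer name and address
Miss Jayne Doe
19 Their Street
EOF

# Hypothetical helper: returns the list of captures a pattern yields
# against $text (an empty list when the match fails).
sub captures {
    my ($re) = @_;
    return $text =~ $re;
}

# Inner group is capturing: the title is returned as a second capture.
my @with_capture    = captures(qr/^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);

# Inner group is (?:...): it still groups the alternation but captures nothing.
my @without_capture = captures(qr/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);

print scalar(@with_capture),    " capture(s): @with_capture\n";
print scalar(@without_capture), " capture(s): @without_capture\n";
```

Taking the captures in list context also sidesteps the "uninitialized
value" warning that interpolating an unset $2 raises under 'use warnings'.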

On Wed, 2 Dec 2020, Gary Stainburn wrote:

> I have an array of regex expressions that I apply to text returned from
> tesseract.
> 
> Each match that I get then gets stored for future processing. However, I'm
> struggling with one regex.
> 
> The problem is that:
> 
> 1) with brackets round the titles it returns two matches.
> 2) without brackets, it returns nothing.
> 
> Can anyone point me at the correct syntax please.
> 
> Gary
> 
> [root@dev dev]# ./t
> match1='Miss Jayne Doe' 'Miss'
> [root@dev dev]# cat t
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> my $T=<<EOF;
> Customer name and address
> Miss Jayne Doe
> 19 Their Street
> Somewehere
> In Yorkshire
> IN1 3YY
> EOF
> 
> print "match1='$1' '$2'\n" if ($T=~/^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
> print "match2='$1' '$2'\n" if ($T=~/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi);
> [root@dev dev]#
> 
> -- 
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/
> 
> 





Re: regex help - only one value returned

2020-12-02 Thread Gary Stainburn

On 02/12/2020 13:56, Vlado Keselj wrote:

> Well, it seems that the first one is what you want, but you just need to
> use $1 and ignore $2.
>
> You do need parentheses in '(mr|mrs|miss|dr|prof|sir)' but if you do not
> want for them to be captured in $2, you can use:
> '(?:mr|mrs|miss|dr|prof|sir)'.  For example:
>
> print "match3='$1' '$2'\n" if
> ($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
>
> would give output:
>
> match3='Miss Jayne Doe' ''

Perfect, thank you.

I can't ignore $2, as this runs in a loop with other regexes that genuinely
return multiple matches.  The amendment to the regex worked perfectly.
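
For anyone with a similar loop, here is a rough sketch of how an array of
patterns might be applied so that each one contributes only the captures it
actually defines. The pattern table, the postcode regex and the name
collect_matches() are all invented for illustration; they are not the real
code from my script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text, assumed to resemble the OCR output quoted in the thread.
my $text = <<'EOF';
Customer name and address
Miss Jayne Doe
19 Their Street
In Yorkshire
IN1 3YY
EOF

# Hypothetical pattern table: one pattern with a single capture and one
# that genuinely returns two.
my @patterns = (
    qr/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi,    # full name
    qr/^([A-Z]{1,2}\d{1,2}) (\d[A-Z]{2})$/m,            # postcode halves
);

# Applies every pattern to the text and returns one array reference of
# captures per pattern that matched.
sub collect_matches {
    my ($text, @patterns) = @_;
    my @results;
    for my $re (@patterns) {
        # In list context a match returns its captures, or an empty list
        # on failure, so $1 and $2 are never read directly.
        my @captures = $text =~ $re;
        push @results, [@captures] if @captures;
    }
    return @results;
}

for my $captures (collect_matches($text, @patterns)) {
    print join(' | ', map { "'$_'" } @$captures), "\n";
}
```

With the non-capturing group in place, the name pattern contributes exactly
one value and the two-capture pattern contributes two.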


Gary





Re: regex help - only one value returned

2020-12-02 Thread Jim Gibson
In your original example:

print "match1='$1' '$2'\n" if ($T=~/^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
print "match2='$1' '$2'\n" if ($T=~/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi);

In example one, the interior parentheses delimit the alternation, so its
last alternative is 'sir'.

In example two, the alternation is not terminated until the closing ')', so
its last alternative is 'sir .{5,}?'. Every alternative must then be followed
directly by the "\n" in the regular expression. Since 'Miss' in $T is
followed by a space rather than by "\n", the match fails. Vlado has explained
how to group and terminate the alternation without capturing the match
result.
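
The scope of the alternation can be checked with a small sketch like the
one below. The test strings and the helper name first_capture() are made up
for illustration, and the '//' operator assumes Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Example two from the thread: "|" splits everything inside the single pair
# of parentheses, so the final alternative is "sir .{5,}?".
my $ungrouped = qr/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi;

# Vlado's fix: (?:...) ends the alternation after "sir" without capturing.
my $grouped = qr/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi;

# Hypothetical helper: returns the first capture, or undef on no match.
sub first_capture {
    my ($re, $string) = @_;
    my ($capture) = $string =~ $re;
    return $capture;
}

for my $line ("Miss Jayne Doe\n", "Miss\n", "Sir Walter Scott\n") {
    (my $shown = $line) =~ s/\n/\\n/;    # make the newline visible
    printf "%-22s ungrouped=%-18s grouped=%s\n",
        "'$shown'",
        first_capture($ungrouped, $line) // '(no match)',
        first_capture($grouped,   $line) // '(no match)';
}
```

The ungrouped pattern accepts a bare "Miss\n" and a full "Sir ..." line but
rejects "Miss Jayne Doe\n", which is the failure seen in example two. The
grouped pattern accepts both full names and rejects the bare title.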


> On Dec 2, 2020, at 6:08 AM, Gary Stainburn wrote:
> 
> On 02/12/2020 13:56, Vlado Keselj wrote:
>> Well, it seems that the first one is what you want, but you just need to
>> use $1 and ignore $2.
>> 
>> You do need parentheses in '(mr|mrs|miss|dr|prof|sir)' but if you do not
>> want for them to be captured in $2, you can use:
>> '(?:mr|mrs|miss|dr|prof|sir)'.  For example:
>> 
>> print "match3='$1' '$2'\n" if
>> ($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
>> 
>> would give output:
>> 
>> match3='Miss Jayne Doe' ''
> Perfect, thank you.
> 
> I can't ignore $2 as it's in a loop with other regex that genuinely returns 
> multiple matches.  The amendment to the REGEX worked perfectly.

It is always best to copy the captures from a match into variables of your
own, and to do so only after checking that the match succeeded. The capture
variables $1, $2, etc. are not reset when a match fails, so if you read them
after a failed match, they still hold the values left over from the previous
successful match. So do this:

my $salutation = $1;
my $name = $2;

If you don’t want a possible undefined value, do this instead:

my $name = $2 || '';
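
As a sketch of the pitfall and one way around it, the script below first
shows a stale $1 surviving a failed match, then takes the capture in list
context so a failed match can never hand back an old value. The sub names
and test strings are made up for illustration. It uses '//' (defined-or,
Perl 5.10 or later) rather than '||', on the assumption that a captured
string such as '0' should be kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical demo: the second match fails, yet $1 still holds the value
# captured by the first, successful match.
sub stale_capture_demo {
    my $first_ok  = "Miss Jayne Doe"  =~ /^(\w+) /;       # succeeds
    my $second_ok = "19 Their Street" =~ /^([a-z]+) /i;   # fails
    my $leftover  = $1;
    return ($second_ok ? 1 : 0, $leftover);
}

# Hypothetical helper: list-context assignment leaves $title undefined when
# the match fails, and '//' then substitutes an empty string.
sub title_of {
    my ($line) = @_;
    my ($title) = $line =~ /^(mr|mrs|miss|dr|prof|sir)\b/i;
    return $title // '';
}

my ($matched, $leftover) = stale_capture_demo();
print "second match succeeded: $matched, \$1 is still '$leftover'\n";

for my $line ("Miss Jayne Doe", "19 Their Street") {
    printf "%-16s => '%s'\n", $line, title_of($line);
}
```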


Jim Gibson
j...@gibson.org
