Brilliant - a first-class "teach-in".  Many thanks for the description - it
all works.  Brilliant.


-----Original Message-----
From: Rob Dixon [mailto:rob.di...@gmx.com] 
Sent: 26 May 2012 2:13 PM
To: beginners@perl.org
Cc: Christopher Gray
Subject: Re: Help required to extract multiple text fields from a text string

On 25/05/2012 21:51, Christopher Gray wrote:
> Good day,
>
> I have a text file containing records.  While I can extract single 
> sub-strings, I cannot extract multiple sub-strings.
>
> The records are of multiple types - only about a third of which have 
> the data I need.
>
> An example of a "good" record is
>
>   Abc1234 STATUS   open  DESCRIPTION "A basket of melons" :: { fruittype 1}
>
> I'm trying to extract the first (Abc1234), second (open), third (A basket of
> melons) and fourth (1) strings.
>
> I can extract each of them separately - but not together.
>
> So - for example:
>
>        while (<FILE>) {
>            chomp;
>              next if !/\{\s+fruittype\s+(\d+)\s+}/;
>              my $Temp =$1;
>        }
>
> Extracts the fruittype.  However, when I try and have multiple extracts:
>
>    ...
>             next if !/\STATUS\s+(\w+)\s+\{\s+fruittype\s+(\d+)\s+}/;
>    ...
> It fails.
>
> What have I done wrong?

Hi Chris

Let's look at your regex

   /\STATUS\s+(\w+)\s+\{\s+fruittype\s+(\d+)\s+}/

First of all, you start with \S, which matches any non-whitespace character,
not the literal S that you intended. That alone wouldn't break your regex,
though, because the S of STATUS is itself a non-whitespace character.
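A small sketch makes this concrete by trying `\STATUS` against a few strings. The helper `has_status_like` is hypothetical, made up purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: does the string match the question's \STATUS?
# \S means "any single non-whitespace character", so the pattern really
# means "one non-space character followed by TATUS".
sub has_status_like {
    my ($s) = @_;
    return $s =~ /\STATUS/ ? 1 : 0;
}

for my $s ('STATUS', 'XTATUS', '-TATUS', ' TATUS') {
    printf "%-10s %s\n", "'$s'", has_status_like($s) ? 'matches' : 'no match';
}
```

Only ' TATUS' fails, because whitespace is the one kind of character \S refuses.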

The regex as a whole is looking for

   'STATUS'
   some whitespace
   some 'word' characters (0..9, A..Z, a..z and _)
   some whitespace
   an open brace '{'
   some whitespace
   'fruittype'
   some whitespace
   some digits (0..9)
   some whitespace
   a closing brace (which should really be escaped, but Perl forgives you)
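Perl's /x modifier lets the same walk-through live inside the pattern itself: whitespace in the regex is ignored and # starts a comment. This sketch keeps the question's pattern behaviourally unchanged (stray \S and unescaped closing brace included); the variable names `$original_re` and `$example` are just for illustration:

```perl
use strict;
use warnings;

# The pattern from the question, unchanged except for layout and comments.
my $original_re = qr/
    \STATUS     # any non-space character, then TATUS
    \s+         # some whitespace
    (\w+)       # capture 1: word characters 0-9, A-Z, a-z and _
    \s+         # some whitespace
    \{          # an open brace
    \s+         # some whitespace
    fruittype   # the literal word fruittype
    \s+         # some whitespace
    (\d+)       # capture 2: some digits
    \s+         # some whitespace
    }           # a closing brace, unescaped as in the question
/x;

my $example = '---XTATUS  www  {  fruittype  999  }';
if (my @caps = $example =~ $original_re) {
    print "captured: @caps\n";
}
```

Against the example string this prints "captured: www 999".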

This will match a string like

   '---XTATUS  www  {  fruittype  999  }'

But since "STATUS open " is followed by "DESCRIPTION" in your record instead
of an opening brace, the match fails.
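When a long regex fails, one way to find the culprit is to build it up piece by piece and note where matching stops. A minimal sketch of that debugging technique, with the question's pattern split into pieces by hand (the helper `first_failing_piece` is hypothetical):

```perl
use strict;
use warnings;

my $record = 'Abc1234 STATUS   open  DESCRIPTION "A basket of melons" :: { fruittype 1}';

# The question's pattern, split into consecutive pieces
# (single quotes keep the backslashes intact).
my @pieces = ('\STATUS', '\s+(\w+)', '\s+\{', '\s+fruittype', '\s+(\d+)', '\s+}');

# Grow the pattern one piece at a time; return the first piece whose
# addition makes the match fail, or undef if the whole pattern matches.
sub first_failing_piece {
    my ($string, @parts) = @_;
    my $so_far = '';
    for my $part (@parts) {
        $so_far .= $part;
        return $part unless $string =~ /$so_far/;
    }
    return;
}

my $bad = first_failing_piece($record, @pieces);
print defined $bad ? "match fails when adding: $bad\n" : "whole pattern matches\n";
```

For the record it reports \s+\{, the open brace that never arrives because DESCRIPTION comes first.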

To match multiple fields, you can write a regex that matches the entire
string, with parentheses around the parts that must be captured. This will do
what you want:

   while (<DATA>) {
     my @data = /(\w+)\s+STATUS\s+(\w+)\s+DESCRIPTION\s+"([^"]+)"\s+::\s+\{\s*fruittype\s+(\d+)\s*\}/;
     print "$_\n" for @data;
   }

**output**

   Abc1234
   open
   A basket of melons
   1
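Since only about a third of the records carry these fields, it can help to wrap the pattern in a function that returns nothing for the other record types. This sketch uses named captures (Perl 5.10 or later) and /x for readability. The capture names, the helper `parse_record` and the second sample line are assumptions for illustration; a real script would read lines from its file handle instead of the `@lines` array:

```perl
use strict;
use warnings;

# The whole-record pattern from above, laid out with /x and named captures.
# The capture names are illustrative choices, not part of the data format.
my $record_re = qr/
    (?<id>\w+)              \s+
    STATUS                  \s+
    (?<status>\w+)          \s+
    DESCRIPTION             \s+
    "(?<description>[^"]+)" \s+
    ::                      \s+
    \{ \s* fruittype \s+ (?<fruittype>\d+) \s* \}
/x;

# Hypothetical helper: returns a hash ref of the four fields, or undef
# for the record types that do not carry this data.
sub parse_record {
    my ($line) = @_;
    return unless $line =~ $record_re;
    my %fields = %+;    # copy the named captures before the next match
    return \%fields;
}

my @lines = (
    'Abc1234 STATUS   open  DESCRIPTION "A basket of melons" :: { fruittype 1}',
    'Xyz9999 some other record type',    # made-up line of another record type
);

for my $line (@lines) {
    my $rec = parse_record($line) or next;    # skip non-matching record types
    print join('|', @{$rec}{qw(id status description fruittype)}), "\n";
}
```

For these sample lines it prints Abc1234|open|A basket of melons|1 and silently skips the second line.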

but you may need to adjust the regex if the description can also appear
unquoted when it doesn't contain spaces.
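One possible adjustment, assuming an unquoted description is a single run of non-space characters: accept either a quoted string or such a run. The sketch uses Perl's branch-reset group (?|...) (Perl 5.10 or later) so both alternatives fill the same capture. The second sample record and the helper `parse_fields` are hypothetical:

```perl
use strict;
use warnings;

# Assumption: an unquoted description is a single run of non-space
# characters, e.g. DESCRIPTION melons instead of DESCRIPTION "melons".
# The (?| ... ) branch reset makes both alternatives fill capture group 3.
my $record_re = qr/
    (\w+) \s+ STATUS \s+ (\w+) \s+ DESCRIPTION \s+
    (?| "([^"]+)" | (\S+) )
    \s+ :: \s+ \{ \s* fruittype \s+ (\d+) \s* \}
/x;

# Hypothetical helper: returns (id, status, description, fruittype),
# or an empty list when the line is not a matching record.
sub parse_fields {
    my ($line) = @_;
    return $line =~ $record_re;
}

my @lines = (
    'Abc1234 STATUS   open  DESCRIPTION "A basket of melons" :: { fruittype 1}',
    'Abc1235 STATUS   closed  DESCRIPTION melons :: { fruittype 2}',  # made-up unquoted case
);

for my $line (@lines) {
    my @data = parse_fields($line) or next;
    print join('|', @data), "\n";
}
```

The quoted alternative is tried first, so a quoted description containing spaces is still captured whole.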

HTH,

Rob

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/