On 25/05/2012 21:51, Christopher Gray wrote:
Good day,

I have a text file containing records.  While I can extract single
sub-strings, I cannot extract multiple sub-strings.

The records are of multiple types - only about a third of which have the
data I need.

An example of a "good" record is

  Abc1234 STATUS   open  DESCRIPTION "A basket of melons" :: { fruittype 1}

I'm trying to extract the first (Abc1234), second (open), third (A basket of
melons) and fourth (1) strings.

I can extract each of them separately - but not together.

So - for example:

       while (<FILE>) {
           chomp;
             next if !/\{\s+fruittype\s+(\d+)\s+}/;
             my $Temp =$1;
       }

Extracts the fruittype.  However, when I try and have multiple extracts:

   ...
            next if !/\STATUS\s+(\w+)\s+\{\s+fruittype\s+(\d+)\s+}/;
   ...
It fails.

What have I done wrong?

Hi Chris

Let's look at your regex

  /\STATUS\s+(\w+)\s+\{\s+fruittype\s+(\d+)\s+}/

First of all you start with a \S, which will actually match any non-space character, not the S that you intended. But that wouldn't break your regex.

The regex as a whole is looking for

  'STATUS' (strictly, any non-space character followed by 'TATUS')
  some whitespace
  some 'word' characters (0..9, A..Z, a..z and _)
  some whitespace
  an open brace '{'
  some whitespace
  'fruittype'
  some whitespace
  some digits (0..9)
  some whitespace
  a closing brace (which should really be escaped, but Perl forgives you)

This will match a string like

  '---XTATUS  www  {  fruittype  999  }'

But since "STATUS open " is followed by "DESCRIPTION" in your record
instead of an opening brace the match fails.
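
If you want to check this for yourself, here is a small sketch that runs your pattern against both strings. The variable names are mine, and I have escaped the closing brace. Note that your sample record also has no space between the 1 and the closing brace, which the final \s+ would reject as well.

```perl
use strict;
use warnings;

# The pattern from the original post, with the closing brace escaped.
my $re = qr/\STATUS\s+(\w+)\s+\{\s+fruittype\s+(\d+)\s+\}/;

# The artificial string from above, and the real record from the question.
my $artificial = '---XTATUS  www  {  fruittype  999  }';
my $record     = 'Abc1234 STATUS   open  DESCRIPTION "A basket of melons" :: { fruittype 1}';

for my $string ($artificial, $record) {
    # In list context a successful match returns the captured fields;
    # a failed match returns an empty list, so the condition is false.
    if (my @fields = $string =~ $re) {
        print "match:    @fields\n";
    }
    else {
        print "no match: $string\n";
    }
}
```

The artificial string matches and captures "www" and "999"; the real record does not match.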

To match multiple fields, you can write a regex that matches the entire
string with parentheses around the parts that must be captured. This
will do what you want

  while (<DATA>) {
    my @data = /(\w+)\s+STATUS\s+(\w+)\s+DESCRIPTION\s+"([^"]+)"\s+::\s+\{\s*fruittype\s+(\d+)\s*\}/;
    print "$_\n" for @data;
  }

**output**

  Abc1234
  open
  A basket of melons
  1

but you may need to adjust the regex if the description can appear
without quotes when it doesn't contain spaces.
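
One possible way to allow for that is an alternation that accepts either a quoted string or a single bare word. This is only a sketch: I am guessing at what an unquoted record looks like, so the second record below is made up, and the /x layout and variable names are my own.

```perl
use strict;
use warnings;

# Sketch: accept either "quoted text" or one unquoted word as the description.
my $re = qr/
    (\w+) \s+                           # record name, e.g. Abc1234
    STATUS \s+ (\w+) \s+                # status, e.g. open
    DESCRIPTION \s+
    (?: "([^"]+)" | ([^"\s]+) ) \s+     # quoted description, or one bare word
    :: \s+
    \{ \s* fruittype \s+ (\d+) \s* \}   # fruittype number
/x;

my @records = (
    'Abc1234 STATUS   open  DESCRIPTION "A basket of melons" :: { fruittype 1}',
    'Def5678 STATUS   closed  DESCRIPTION melons :: { fruittype 2}',   # hypothetical unquoted record
);

for my $record (@records) {
    next unless $record =~ $re;
    my ($name, $status, $type) = ($1, $2, $5);
    # Only one of the two description groups is defined after a match.
    my $description = defined $3 ? $3 : $4;
    print join('|', $name, $status, $description, $type), "\n";
}
```

Only one of $3 and $4 is set after a successful match, which is why the code picks whichever one is defined.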

HTH,

Rob
