Brano Gerzo wrote:
> Dr.Ruud [DR], on Thursday, July 13, 2006 at 21:05 (+0200) made these
> points:
>
>> I don't understand what you try to match with "[\w\s\+:]+". It
>> matches any series of characters that belong to the character class
>> containing [[:word:]], [[:space:]], a plus and a colon. So "a b :c"
>> would match.
>
> Yes, my example was ambiguous, sorry for that. Here are more examples:
>
> word
> word word
> word word word
> 1 word
> 1 word word word
> 1 word en,pt,sk
> 1 word en 1cd
>
> so:
> - first digits are optional
> - then it is followed by word(s), which are mandatory
> - then it should be 1 language (en), or set of any number of
> languages (en,sk,pt)
> - digit(cd) is optional
>
> That's all
>
> Thank you for your nice code!

A slight revision, which still fails on the last data line:

#!/usr/bin/perl
  use warnings ;
  use strict ;

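  # Small builders that return regex source fragments as strings
  sub sp       { '[[:blank:]]+' }   # one or more blanks
  sub capture  { "(@_)" }           # capturing group
  sub optional { "(?:@_)?" }        # zero or one
  sub optimany { "(?:@_)*" }        # zero or more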

  sub REnumber { '\d+' }
  sub REword   { '\w+' }
  sub RElang   { '
(?:
a[ly]|b[gs]|cs|d[ae]|e[nst]|
f[ir]|gr|h[eruy]|it|ja|kk|lv|nl|
p[blt]|r[ou]|s[klqrv]|t[hr]|uk|zh)
' }

  sub REwordlist { REword . optimany( sp . REword ) . '(?='.sp.'|$)' }
  sub RElanglist { RElang . optimany( ',' . RElang ) }

  my $re = optional(capture(REnumber).sp)
         . capture(REwordlist)
         . optional(sp.capture(RElanglist))
         . optional(sp.capture(REnumber).'cd') ;

  print "re/$re/\n\n\n" ;

  my $qr = qr/ $re /x ;

  while ( <DATA> )
  {
    no warnings ;
    print "\n" ;
    print ;
    /$qr/ and print "($1) ($2) ($3) ($4)\n" ;
  }

__DATA__
word
word word
word word word
1 word
1 word word word
1 word en,pt,sk
1 word en 1cd
################################

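That last line, "1 word en 1cd", comes out wrong because \w+ also
matches "en" and "1cd", so the greedy word list swallows both. It can
be parsed differently if (for example) a "word" can't start with a
digit, etc.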
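
One possible way to get there, as a rough sketch only: anchor the
pattern at both ends and make the word list non-greedy, so that a
trailing language list and "Ncd" are tried before more words are
taken. The rule that a word must not start with a digit is an
assumption, not something from your examples, and parse_line is just
a name made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a word starts with a letter or underscore, never a digit,
# so "1cd" can not be taken as a word.
my $word = qr/[^\W\d]\w*/;
my $lang = qr/(?:a[ly]|b[gs]|cs|d[ae]|e[nst]|f[ir]|gr|h[eruy]|it|ja|kk|lv|nl|p[blt]|r[ou]|s[klqrv]|t[hr]|uk|zh)/;

my $qr = qr/
    ^
    (?: (\d+) [[:blank:]]+ )?                      # group 1: optional leading number
    ( $word (?: [[:blank:]]+ $word )*? )           # group 2: words, as few as possible
    (?: [[:blank:]]+ ( $lang (?: , $lang )* ) )?   # group 3: optional language list
    (?: [[:blank:]]+ (\d+) cd )?                   # group 4: optional number of CDs
    [[:blank:]]* $
/x;

# Hypothetical helper: returns the four captures, or an empty list
# when the line does not match at all.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ $qr;
    return ( $1, $2, $3, $4 );
}

my @samples = (
    'word',
    'word word',
    'word word word',
    '1 word',
    '1 word word word',
    '1 word en,pt,sk',
    '1 word en 1cd',
);

for my $line (@samples) {
    my @field = parse_line($line) or next;
    # Show missing optional parts as empty strings.
    print "$line\n(", join( ') (', map { defined $_ ? $_ : '' } @field ), ")\n\n";
}
```

Because the whole line has to match, the word list only grows when
the remainder can't be read as languages and/or "Ncd". A line like
"word en" is still ambiguous by nature: here "en" is taken as a
language, not as a second word.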

-- 
Affijn, Ruud

"Gewoon is een tijger."


