Brano Gerzo wrote:
> Dr.Ruud [DR], on Thursday, July 13, 2006 at 21:05 (+0200) made these
> points:
>
>> I don't understand what you try to match with "[\w\s\+:]+". It
>> matches any series of characters that belong to the character class
>> containing [[:word:]], [[:space:]], a plus and a colon. So "a b :c"
>> would match.
>
> Yes, my example was ambiguous, sorry for that. Here are more examples:
>
> word
> word word
> word word word
> 1 word
> 1 word word word
> 1 word en,pt,sk
> 1 word en 1cd
>
> so:
> - first digits are optional
> - then it is followed by word(s), which are mandatory
> - then it should be 1 language (en), or set of any number of
>   languages (en,sk,pt)
> - digit(cd) is optional
>
> That's all
>
> Thank you for your nice code!

Slight revision, which still fails on the last line:

#!/usr/bin/perl
use warnings ;
use strict ;

# building blocks: each sub returns a regex fragment as a string
sub sp       { '[[:blank:]]+' }
sub capture  { "(@_)" }
sub optional { "(?:@_)?" }
sub optimany { "(?:@_)*" }

sub REnumber { '\d+' }
sub REword   { '\w+' }
sub RElang   { ' (?: a[ly]|b[gs]|cs|d[ae]|e[nst]|
                     f[ir]|gr|h[eruy]|it|ja|kk|lv|nl|
                     p[blt]|r[ou]|s[klqrv]|t[hr]|uk|zh) ' }

# one or more words, followed by blanks or end-of-line
sub REwordlist { REword . optimany( sp . REword ) . '(?='.sp.'|$)' }
# one language code, or a comma-separated list of them
sub RElanglist { RElang . optimany( ',' . RElang ) }

my $re = optional(capture(REnumber).sp)
       . capture(REwordlist)
       . optional(sp.capture(RElanglist))
       . optional(sp.capture(REnumber).'cd') ;

print "re/$re/\n\n\n" ;

my $qr = qr/ $re /x ;

while ( <DATA> ) {
  no warnings ;   # $1..$4 can be undefined
  print "\n" ;
  print ;
  /$qr/ and print "($1) ($2) ($3) ($4)\n" ;
}

__DATA__
word
word word
word word word
1 word
1 word word word
1 word en,pt,sk
1 word en 1cd

################################

That last line "1 word en 1cd" ends up entirely in the word list,
because "en" and "1cd" also match \w+ and the word list is greedy.
It can be parsed differently if (for example) "word" can't start
with a digit, etc.

-- 
Affijn, Ruud

"Gewoon is een tijger."
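
PS. A minimal sketch of one such alternative parse, not a definitive answer. It rests on assumptions that are not in the rules above: a word must start with a letter, the whole line must match (^ and $), and the word list is made lazy so that a trailing language list and "<digits>cd" are tried first. The names parse_line, $word and $lang are only illustrative.

```perl
#!/usr/bin/perl
use strict ;
use warnings ;

my $sp     = '[[:blank:]]+' ;
my $number = '\d+' ;
# assumption: a word starts with a letter, so "1cd" is never a word
my $word   = '[[:alpha:]]\w*' ;
my $lang   = '(?:a[ly]|b[gs]|cs|d[ae]|e[nst]|f[ir]|gr|h[eruy]|it|ja|kk|lv|nl'
           . '|p[blt]|r[ou]|s[klqrv]|t[hr]|uk|zh)' ;

# Anchored pattern with a lazy word list: the list only grows when the
# optional language list and the optional "<digits>cd" cannot account
# for the rest of the line.
my $qr = qr/ ^
             (?: ($number) $sp )?
             ( $word (?: $sp $word )*? )
             (?: $sp ( $lang (?: , $lang )* ) )?
             (?: $sp ($number) cd )?
             $
           /x ;

# returns (number, words, languages, cd), or an empty list on no match;
# fields that are absent come back as undef
sub parse_line {
  my ($line) = @_ ;
  return $line =~ $qr ? ($1, $2, $3, $4) : () ;
}

for my $line ( 'word word word', '1 word en,pt,sk', '1 word en 1cd' ) {
  my @f = map { defined $_ ? $_ : '' } parse_line($line) ;
  print "$line => (", join( ') (', @f ), ")\n" ;
}
```

The price of this choice: a last word that happens to look like a language code (say "it") is reported as a language, not as a word. Whether that is acceptable depends on the real data.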