On Tue, Oct 18, 2011 at 12:32 AM, Chris Stinemetz
<chrisstinem...@gmail.com> wrote:

> /17|18|19|20|21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH/


Spot the issue:
/
     17    #Or
   | 18    #Or
   | 19    #Or
   | 20    #Or
   | 21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH
/x

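For anything but 21, the regex is only those two digits! The | splits the
entire pattern, not just the numbers. You need to enclose the alternatives in
() or (?:), depending on whether you want to capture them or not. (Also note
that 21+ means "a 2 followed by one or more 1s", which is probably not what
you meant either.)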
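To make the precedence problem concrete, here is a minimal sketch using a
made-up input line, 18:123:45 (I'm only guessing at what your data looks
like). The grouped version also drops the stray + quantifiers, so treat it as
an illustration, not a drop-in fix.

```perl
use strict;
use warnings;

# Hypothetical input line; the real data format is only guessed from the
# regex in the question.
my $line = "18:123:45";

# As written: "17" OR "18" OR "19" OR "20" OR "21+:(\d+):(\d+)".
# In list context the match returns the captures (undef if not reached).
my @ungrouped = $line =~ /17|18|19|20|21+:(\d+):(\d+)/;

# Grouped with (?:...): the alternation now covers only the prefix.
my @grouped = $line =~ /(?:17|18|19|20|21):(\d+):(\d+)/;

printf "ungrouped: %s\n", join ', ', map { $_ // 'undef' } @ungrouped;
printf "grouped:   %s\n", join ', ', @grouped;
```

With the ungrouped pattern the match succeeds on the bare 18 branch, so both
captures come back undef; the grouped one captures 123 and 45.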

That aside, please be very mindful that \d and . are both code smells. The
former matches much, much more than just [0-9]: without /a it matches any
Unicode decimal digit (Arabic-Indic, Devanagari, fullwidth, and many more).
Grab the unichars[0] program from Unicode::Tussle[1] if you want to see for
yourself. Either use the /a switch (or its more localized form, (?a:); both
are available in Perl 5.14 and newer), or [0-9], or \p{PosixDigit}, or (your
favorite way here; TIMTOWTDI applies).
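If you just want a quick taste before installing anything, this sketch
(assuming Perl 5.14 or newer, for /a and (?a:)) runs a string of ARABIC-INDIC
DIGITS ONE to THREE (U+0661 to U+0663, picked arbitrarily) through each of the
alternatives above. Only the plain \d version matches.

```perl
use strict;
use warnings;
use 5.014;    # needed for the /a modifier and (?a:...)

# ARABIC-INDIC DIGITS ONE, TWO, THREE: not ASCII, but default \d matches them.
my $s = "\x{0661}\x{0662}\x{0663}";

my @patterns = (
    [ '\d (default)'   => qr/^\d+$/ ],
    [ '\d with /a'     => qr/^\d+$/a ],
    [ '(?a:\d)'        => qr/^(?a:\d)+$/ ],
    [ '[0-9]'          => qr/^[0-9]+$/ ],
    [ '\p{PosixDigit}' => qr/^\p{PosixDigit}+$/ ],
);

my %matches;
for my $p (@patterns) {
    my ($label, $re) = @$p;
    $matches{$label} = $s =~ $re ? 1 : 0;
    printf "%-16s %s\n", $label, $matches{$label} ? 'matches' : 'no match';
}
```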

The dot is also problematic. You aren't using the /s switch, so it actually
matches [^\n]. Is that what you want? Are you certain that no one is going
to come along after reading Perl Best Practices and helpfully, but wrongly,
add the /smx flags, screwing up your regex? If you -really- want to match
anything, use \p{Any} or \X, but you have to know the difference between the
two (\p{Any} matches a single code point, \X a whole extended grapheme
cluster), otherwise you are doing it wrong. See [2] and [3], though you might
want to make a cup of tea and sit somewhere comfortable first, as they are
neither easy nor quick reads.
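A small sketch of that difference, using "e" followed by U+0301 COMBINING
ACUTE ACCENT as an arbitrary example: \p{Any} splits it into separate code
points, \X keeps the accented letter together as one grapheme cluster.

```perl
use strict;
use warnings;

# "e" + U+0301 COMBINING ACUTE ACCENT (displays as one accented e),
# followed by a plain "x".
my $str = "e\x{301}x";

my @code_points = $str =~ /(\p{Any})/g;   # one element per code point
my @graphemes   = $str =~ /(\X)/g;        # one element per user-perceived character

printf "\\p{Any}: %d pieces, \\X: %d pieces\n",
    scalar @code_points, scalar @graphemes;
```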
But chances are that you don't want that, which is actually much simpler! If
you want to match anything up to the next comma, use [^,]+.
(And if you really want [^\n], you could use \N, which is not-a-newline, or
even better, \V, which is not-a-vertical-space.)
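A sketch comparing these on a hypothetical two-line sample (the string is
made up; \N needs Perl 5.12 or newer):

```perl
use 5.012;    # \N as "not a newline" needs 5.12+; also turns on strict
use warnings;

# Hypothetical two-line sample with a comma on the second line.
my $text = "first line\nsecond, rest";

my ($dot)       = $text =~ /^(.+)/;     # . stops at the newline
my ($dot_s)     = $text =~ /^(.+)/s;    # under /s, . runs to the end
my ($not_nl)    = $text =~ /^(\N+)/s;   # \N never matches \n, /s or not
my ($not_vs)    = $text =~ /^(\V+)/;    # \V stops at any vertical whitespace
my ($not_comma) = $text =~ /^([^,]+)/;  # everything up to the first comma

for ($dot, $dot_s, $not_nl, $not_vs, $not_comma) {
    (my $shown = $_) =~ s/\n/\\n/g;     # copy, then make the newline visible
    print "[$shown]\n";
}
```

Note that [^,]+ happily crosses newlines; if your data can contain them and
that matters, [^,\n]+ keeps the match on one line.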

[0] https://www.metacpan.org/module/unichars
[1] https://metacpan.org/release/Unicode-Tussle
[2] http://www.nntp.perl.org/group/perl.perl5.porters/2011/07/msg174287.html
[3] http://www.nntp.perl.org/group/perl.perl5.porters/2011/07/msg174338.html
