Re: Regex to match odd numbers
On Sat, Aug 23, 2014 at 04:43:06PM +0100, Aaron Crane wrote:
> Paul Makepeace wrote:
> > Does anyone have any concrete examples where the locale affecting
> > meaning/matching of \d causes real problems?
>
> In my experience, it's not necessarily the locale so much as Unicode
> characters that the programmer wasn't expecting, which then cause
> surprising behaviour. For example, this looks more-or-less sensible:
>
>     say $arg + 17 if $arg =~ /\A\d+\z/
>
> but if $arg is a digit other than 0..9, Perl will treat it as 0 and
> emit a warning. (Which is particularly problematic if you also have
> fatal warnings enabled.)

Using [0-9] is much safer, but there are still gotchas. I've been bitten by:

    my ($value) = $str =~ /([0-9]+)/;
    if ($value) {   # Don't divide by 0
        $something /= $value;
    }

My program crashed because it was dividing by zero... /[0-9]+/ matches "00", which is true, but numerically still 0.

Abigail
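A minimal sketch of one way around the "00" trap described above: compare the captured value numerically instead of relying on string truthiness. The helper name safe_divisor is hypothetical and not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the first run of ASCII digits out of $str and
# return it as a number, or undef if there are no digits or the value is zero.
sub safe_divisor {
    my ($str) = @_;
    my ($digits) = $str =~ /([0-9]+)/;
    return unless defined $digits;
    my $n = $digits + 0;        # "00" numifies to 0 here
    return if $n == 0;          # numeric comparison, not string truthiness
    return $n;
}

for my $s ("abc 00 def", "x 0 y", "n=042", "none") {
    my $d = safe_divisor($s);
    printf "%-12s => %s\n", $s, defined $d ? $d : "(no usable divisor)";
}
```

The point of the design is only that "is this string true?" and "is this number non-zero?" are different questions in Perl, so the check should ask the second one.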
Re: Regex to match odd numbers
Paul Makepeace wrote: > Does anyone have any concrete examples where the locale affecting > meaning/matching of \d causes real problems? In my experience, it's not necessarily the locale so much as Unicode characters that the programmer wasn't expecting, which then cause surprising behaviour. For example, this looks more-or-less sensible: say $arg + 17 if $arg =~ /\A\d+\z/ but if $arg is a digit other than 0..9, Perl will treat it as 0 and emit a warning. (Which is particularly problematic if you also have fatal warnings enabled.) > I'm assuming the worst case is it matches too much, e.g. picks up > spurious Chinese numerals, which seems like a wildly improbable edge > case for most datasets+patterns. "Improbable" sounds reasonable, but bear in mind that people often use regexes containing things like \d and \w for validating input from untrusted sources, so there's scope for significant brokenness there. > Presumably there isn't a situation > where \d _doesn't_ match [0-9] at least? In other words [0-9] is a > subset of \d for all locales. For all *sane* locales, sure. :-) One of the many unpleasant things about locales is that you never really know what you're going to get — and there's no shortage of OSes with broken locale definitions. > $ export LC_CTYPE=zh_CN.utf-8 > $ perl -Mlocale -Mutf8 -le 'print "一" =~ /\d/' # 1 > > Doesn't print 1 - why? I don't know what the expected behaviour is for the zh_CN.utf-8 locale, but that behaviour doesn't surprise me for Unicode: the hanzi numerals don't have the Unicode "numeric" property. More specifically, their general category is Lo ("other letter"), rather than (say) Nd ("decimal digit"): $ perl -MUnicode::UCD=charinfo -E \ > 'say charinfo($_)->{category}, " ", chr =~ /\d/u for 0x4e00, 0x661' Lo Nd 1 (U+0661 is ARABIC-INDIC DIGIT ONE.) > $ export LC_CTYPE=en_US.utf-8 > $ perl -Mlocale -Mutf8 -le 'print "三" =~ /[一-六]/' > 1 > > Why is it still 1? 
That's because /[一-六]/ matches the set of characters whose codepoints are in the range 0x4E00 through 0x516D (regardless of locale), and 三 is U+4E09 (which is in that range). Adding 'use re "debug"' to your program reveals more information about what's going on there: $ perl -Mlocale -Mutf8 -le 'use re "debug"; print "三" =~ /[一-六]/' Compiling REx "[%x{4e00}-%x{516d}]" Final program: 1: ANYOF{loc}[{unicode}4E00-516D] (12) 12: END (0) stclass ANYOF{loc}[{unicode}4E00-516D] minlen 1 Matching REx "[%x{4e00}-%x{516d}]" against "%x{4e09}" UTF-8 pattern and string... Matching stclass ANYOF{loc}[{unicode}4E00-516D] against "%x{4e09}" (3 bytes) 0 <> <%x{4e09}> | 1:ANYOF{loc}[{unicode}4E00-516D](12) 3 <%x{4e09}> <> | 12:END(0) Match successful! 1 Freeing REx: "[%x{4e00}-%x{516d}]" -- Aaron Crane ** http://aaroncrane.co.uk/
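To make the difference concrete, here is a small sketch comparing \d, \d under the /a flag, and [0-9] against the two characters discussed above plus an ordinary ASCII digit. The helper name digit_matches is an assumption for illustration; /a requires Perl 5.14 or later.

```perl
use strict;
use warnings;

# U+0661 ARABIC-INDIC DIGIT ONE has general category Nd, so \d matches it
# under Unicode rules; U+4E00 (the hanzi for "one") is category Lo and does not.
my %sample = (
    'ASCII 7' => "7",
    'U+0661'  => "\x{661}",
    'U+4E00'  => "\x{4e00}",
);

# Hypothetical helper: returns three 0/1 flags for \d, \d with /a, and [0-9].
sub digit_matches {
    my ($char) = @_;
    return (
        ($char =~ /\A\d\z/     ? 1 : 0),
        ($char =~ /\A\d\z/a    ? 1 : 0),   # /a restricts \d to ASCII digits
        ($char =~ /\A[0-9]\z/  ? 1 : 0),
    );
}

for my $name (sort keys %sample) {
    printf "%-8s \\d=%d  \\d/a=%d  [0-9]=%d\n", $name, digit_matches($sample{$name});
}
```

Only the U+0661 row should differ between the columns, which is the case that bites when \d is used to validate untrusted input before arithmetic.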
Re: Regex to match odd numbers
$thread->resurrect(); On Tue, May 27, 2014 at 12:37 PM, Mark Fowler wrote: > > On Tuesday, May 27, 2014, Sam Kington wrote: > > > > Sounds like you want something like > > > > / ( ^ 5[.] ( [79] | \d+ [13579] ) ) /x > > > > This is where I mention that \d matches characters other than [0-9] unless > you have the /a flag in effect (thanks Unicode!) Does anyone have any concrete examples where the locale affecting meaning/matching of \d causes real problems? I'm assuming the worst case is it matches too much, e.g. picks up spurious Chinese numerals, which seems like a wildly improbable edge case for most datasets+patterns. Presumably there isn't a situation where \d _doesn't_ match [0-9] at least? In other words [0-9] is a subset of \d for all locales. $ export LC_CTYPE=zh_CN.utf-8 $ perl -Mlocale -Mutf8 -le 'print "一" =~ /\d/' # 1 Doesn't print 1 - why? $ export LC_CTYPE=zh_CN.utf-8 $ perl -Mlocale -Mutf8 -le 'print "三" =~ /[一-六]/' # 3 in 1-6? Yes 1 $ export LC_CTYPE=en_US.utf-8 $ perl -Mlocale -Mutf8 -le 'print "三" =~ /[一-六]/' 1 Why is it still 1? OS X with Perl 5.16.2 Paul
Re: Regex to match odd numbers
On Tue, May 27, 2014 at 03:37:51PM -0400, Mark Fowler wrote:
> On Tuesday, May 27, 2014, Sam Kington wrote:
> > Sounds like you want something like
> > / ( ^ 5[.] ( [79] | \d+ [13579] ) ) /x
> This is where I mention that \d matches characters other than [0-9] unless
> you have the /a flag in effect (thanks Unicode!)

I'm sure that if the p5p cabal want to spoil people's day, they have far more entertaining things they can do than to release perl version 5.2٥.

--
David Cantrell | A machine for turning tea into grumpiness

Irregular English: ladies glow; gentlemen perspire; brutes, oafs and athletes sweat
Re: Regex to match odd numbers
On Tue, May 27, 2014 at 05:08:16PM +0100, Jasper wrote: > Something like the prime regex would work: > ('1' x $foo) =~ /^(11)*1$/ Ahhh, the Right Sort Of Mad :-) -- David Cantrell | Nth greatest programmer in the world Irregular English: you have anecdotes; they have data; I have proof
Re: Regex to match odd numbers
On Tue, May 27, 2014 at 04:42:40PM +0100, Alex Balhatchet wrote: > Odd numbers end in [13579] so... m{5 [.] \d* [13579]}gmsx ... ? Oh yeah, so embarrassingly obvious. The solution is to *remember that version "numbers" aren't numbers, they're text*. -- David Cantrell | London Perl Mongers Deputy Chief Heretic Perl: the only language that makes Welsh look acceptable
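A short sketch of why treating the version as text matters. It assumes dotted version strings such as "5.19.3", which is an assumption about the data rather than something stated in the thread; minor_of is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: split a dotted version string such as "5.19.3" into
# its components and return the second one (the "minor" part) as a number.
# Returns undef if there is no all-ASCII-digit second component.
sub minor_of {
    my ($version) = @_;
    my @parts = split /[.]/, $version;
    return unless @parts >= 2 && $parts[1] =~ /\A[0-9]+\z/;
    return $parts[1] + 0;
}

# Treated as a number, "5.10" collapses to 5.1; treated as text, the minor is 10.
printf "as a number: %s, minor as text: %d\n", "5.10" + 0, minor_of("5.10");
```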
Re: Regex to match odd numbers
On Tuesday, May 27, 2014, Sam Kington wrote: > > Sounds like you want something like > > / ( ^ 5[.] ( [79] | \d+ [13579] ) ) /x > This is where I mention that \d matches characters other than [0-9] unless you have the /a flag in effect (thanks Unicode!) Mark
Re: Regex to match odd numbers
On 27/05/14 17:21, Abigail wrote:
> On Tue, May 27, 2014 at 04:54:47PM +0100, Dirk Koopman wrote:
>> On 27/05/14 16:22, David Cantrell wrote:
>>> As part of the nasty mess that is CPANdeps, I have this line of code:
>>>
>>> $record->{is_dev_perl} = (
>>>     $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i
>>> ) ? 1 : 0;
>>>
>>> I'd like to not have to remember to add 23 to the list in a year or so's
>>> time. Can anyone think of a nice way of matching any odd number from 7
>>> upwards? Obviously it's easy to do in a coupla lines of perl code
>>> instead of a regex, so I'm asking more out of curiosity than because I
>>> actually need it.
>>
>> $bool = ($record->{perl} > 7) & 1; # for example?
>
> $ perl -wE 'say +(8 > 7) & 1'
> 1
> $
>
> It would be very odd to consider 8 to be odd.

Indeed, hence my follow up. But this works:

perl -e 'for (0..30) {print ((($_ > 7) && ($_ & 1)) ? "$_ = 1\n" : "$_ = 0\n")}';

Dirk
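For comparison, a sketch of the same arithmetic test written as a function. Note that the one-liner above uses > 7 and so reports 7 itself as 0; this version uses >= 7 on the assumption that "from 7 upwards" in the original question includes 7. The name is_odd_from_7 is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 when $n is an odd integer that is at least 7, else 0.
sub is_odd_from_7 {
    my ($n) = @_;
    return ($n >= 7 && $n % 2 == 1) ? 1 : 0;
}

# List the numbers in 0..30 that qualify.
print join(" ", grep { is_odd_from_7($_) } 0 .. 30), "\n";
```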
Re: Regex to match odd numbers
And obviously I answered before letting your email sink in... "odd number from 7 upwards"! But still, my solution should just about work if you change \d* to \d+ and do the 9 manually :-) - Alex On 27 May 2014 16:42, Alex Balhatchet wrote: > Odd numbers end in [13579] so... m{5 [.] \d* [13579]}gmsx ... ? > > - Alex
Re: Regex to match odd numbers
Odd numbers end in [13579] so... m{5 [.] \d* [13579]}gmsx ... ? - Alex
Re: Regex to match odd numbers
On Tue, May 27, 2014 at 04:54:47PM +0100, Dirk Koopman wrote:
> On 27/05/14 16:22, David Cantrell wrote:
>> As part of the nasty mess that is CPANdeps, I have this line of code:
>>
>> $record->{is_dev_perl} = (
>>     $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i
>> ) ? 1 : 0;
>>
>> I'd like to not have to remember to add 23 to the list in a year or so's
>> time. Can anyone think of a nice way of matching any odd number from 7
>> upwards? Obviously it's easy to do in a coupla lines of perl code
>> instead of a regex, so I'm asking more out of curiosity than because I
>> actually need it.
>>
>
> $bool = ($record->{perl} > 7) & 1; # for example?
>

$ perl -wE 'say +(8 > 7) & 1'
1
$

It would be very odd to consider 8 to be odd.

Abigail
Re: Regex to match odd numbers
I'm furiously trying to think of an even stupider way of doing this... On 27 May 2014 17:09, Jasper wrote: > From 7 upwards is > > ('1' x $foo) =~ /^(11){3,}1$/ > > or something, I'm not even bothering to test this :S > > > On 27 May 2014 17:08, Jasper wrote: > >> Something like the prime regex would work: >> >> ('1' x $foo) =~ /^(11)*1$/ >> >> >> On 27 May 2014 16:43, Joel Bernstein wrote: >> >>> Surely you only need to examine the right-most digit to know if the >>> number >>> is odd? Your special requirement to (AIUI) consider 0..7 as even isn't >>> difficult to add. >>> >>> /joel >>> >>> >>> On 27 May 2014 16:22, David Cantrell wrote: >>> >>> > As part of the nasty mess that is CPANdeps, I have this line of code: >>> > >>> > $record->{is_dev_perl} = ( >>> > $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i >>> > ) ? 1 : 0; >>> > >>> > I'd like to not have to remember to add 23 to the list in a year or >>> so's >>> > time. Can anyone think of a nice way of matching any odd number from 7 >>> > upwards? Obviously it's easy to do in a coupla lines of perl code >>> > instead of a regex, so I'm asking more out of curiosity than because I >>> > actually need it. >>> > >>> > -- >>> > David Cantrell | Official London Perl Mongers Bad Influence >>> > >>> > Today's previously unreported paraphilia is tomorrow's Internet >>> sensation >>> > >>> > >>> >> >> >> >> -- >> Jasper >> > > > > -- > Jasper > -- Jasper
Re: Regex to match odd numbers
From 7 upwards is

('1' x $foo) =~ /^(11){3,}1$/

or something, I'm not even bothering to test this :S


On 27 May 2014 17:08, Jasper wrote:

> Something like the prime regex would work:
>
> ('1' x $foo) =~ /^(11)*1$/
>
>
> On 27 May 2014 16:43, Joel Bernstein wrote:
>
>> Surely you only need to examine the right-most digit to know if the number
>> is odd? Your special requirement to (AIUI) consider 0..7 as even isn't
>> difficult to add.
>>
>> /joel
>>
>>
>> On 27 May 2014 16:22, David Cantrell wrote:
>>
>> > As part of the nasty mess that is CPANdeps, I have this line of code:
>> >
>> > $record->{is_dev_perl} = (
>> >     $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i
>> > ) ? 1 : 0;
>> >
>> > I'd like to not have to remember to add 23 to the list in a year or so's
>> > time. Can anyone think of a nice way of matching any odd number from 7
>> > upwards? Obviously it's easy to do in a coupla lines of perl code
>> > instead of a regex, so I'm asking more out of curiosity than because I
>> > actually need it.
>> >
>> > --
>> > David Cantrell | Official London Perl Mongers Bad Influence
>> >
>> > Today's previously unreported paraphilia is tomorrow's Internet sensation
>> >
>
> --
> Jasper
>

--
Jasper
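Since the pattern above was posted untested, here is a small sketch that exercises it. The only changes are wrapping it in a hypothetical function, unary_odd_from_7, and using a non-capturing group.

```perl
use strict;
use warnings;

# Unary trick: write $n as a string of $n ones, then let the regex count.
# (?:11){3,} consumes at least six ones in pairs; the final 1 makes the
# total odd, so the shortest string that can match has seven ones.
sub unary_odd_from_7 {
    my ($n) = @_;
    return (('1' x $n) =~ /^(?:11){3,}1$/) ? 1 : 0;
}

# List the numbers in 0..25 that the pattern accepts.
print join(" ", grep { unary_odd_from_7($_) } 0 .. 25), "\n";
```

This is a curiosity rather than something to deploy: the string grows linearly with the number being tested.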
Re: Regex to match odd numbers
Something like the prime regex would work: ('1' x $foo) =~ /^(11)*1$/ On 27 May 2014 16:43, Joel Bernstein wrote: > Surely you only need to examine the right-most digit to know if the number > is odd? Your special requirement to (AIUI) consider 0..7 as even isn't > difficult to add. > > /joel > > > On 27 May 2014 16:22, David Cantrell wrote: > > > As part of the nasty mess that is CPANdeps, I have this line of code: > > > > $record->{is_dev_perl} = ( > > $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i > > ) ? 1 : 0; > > > > I'd like to not have to remember to add 23 to the list in a year or so's > > time. Can anyone think of a nice way of matching any odd number from 7 > > upwards? Obviously it's easy to do in a coupla lines of perl code > > instead of a regex, so I'm asking more out of curiosity than because I > > actually need it. > > > > -- > > David Cantrell | Official London Perl Mongers Bad Influence > > > > Today's previously unreported paraphilia is tomorrow's Internet sensation > > > > > -- Jasper
Re: Regex to match odd numbers
> As part of the nasty mess that is CPANdeps, I have this line of code: > > $record->{is_dev_perl} = ( > $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i > ) ? 1 : 0; > > I'd like to not have to remember to add 23 to the list in a year or so's > time. Can anyone think of a nice way of matching any odd number from 7 > upwards? Obviously it's easy to do in a coupla lines of perl code > instead of a regex, so I'm asking more out of curiosity than because I > actually need it. Sounds like you want something like / ( ^ 5[.] ( [79] | \d+ [13579] ) ) /x Sam -- Website: http://www.illuminated.co.uk/
Re: Regex to match odd numbers
On Tue, May 27, 2014 at 04:22:07PM +0100, David Cantrell wrote:
> As part of the nasty mess that is CPANdeps, I have this line of code:
>
> $record->{is_dev_perl} = (
>     $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i
> ) ? 1 : 0;
>
> I'd like to not have to remember to add 23 to the list in a year or so's
> time. Can anyone think of a nice way of matching any odd number from 7
> upwards? Obviously it's easy to do in a coupla lines of perl code
> instead of a regex, so I'm asking more out of curiosity than because I
> actually need it.

You mean, 7, 9, or any number that requires at least two digits where the last one is odd? I'd write that as:

    /^0*(?:7|9|[1-9][0-9]*[13579])$/

allowing for leading 0s.

Abigail
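A quick sketch that checks the pattern above against plain arithmetic over a range of integers. The names $odd_from_7 and matches_minor are hypothetical; the regex itself is copied unchanged from the message.

```perl
use strict;
use warnings;

# The pattern from the message above, compiled once.
my $odd_from_7 = qr/^0*(?:7|9|[1-9][0-9]*[13579])$/;

# Hypothetical helper: 1 if the string matches the pattern, else 0.
sub matches_minor {
    my ($minor) = @_;
    return $minor =~ $odd_from_7 ? 1 : 0;
}

# Collect any integers in 0..200 where the regex and the arithmetic disagree.
my @disagree = grep {
    matches_minor($_) != (($_ >= 7 && $_ % 2 == 1) ? 1 : 0)
} 0 .. 200;

print @disagree
    ? "disagrees at: @disagree\n"
    : "regex and arithmetic agree for 0..200\n";
print "0011 matches: ", matches_minor("0011"), "\n";
```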
Re: Regex to match odd numbers
On 27/05/14 16:54, Dirk Koopman wrote:
> $bool = ($record->{perl} > 7) & 1; # for example?

Clearly, I'm getting too old for this. Or my prevarication level is too high...
Re: Regex to match odd numbers
Apologies, didn't see the 7 upward (and missed the 5\.). This can almost certainly look nicer or smaller with a negative assertion or something. However this should do the job.

$record->{perl} =~ /(^5\.(7|9|[1-9][0-9]*[13579])|rc|patch)/i


On 27 May 2014 16:53, Gareth Harper wrote:

> $record->{perl} =~ /(^[0-9]*[13579]|rc|patch)/i
>
> ?
>
>
> On 27 May 2014 16:22, David Cantrell wrote:
>
>> As part of the nasty mess that is CPANdeps, I have this line of code:
>>
>> $record->{is_dev_perl} = (
>>     $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i
>> ) ? 1 : 0;
>>
>> I'd like to not have to remember to add 23 to the list in a year or so's
>> time. Can anyone think of a nice way of matching any odd number from 7
>> upwards? Obviously it's easy to do in a coupla lines of perl code
>> instead of a regex, so I'm asking more out of curiosity than because I
>> actually need it.
>>
>> --
>> David Cantrell | Official London Perl Mongers Bad Influence
>>
>> Today's previously unreported paraphilia is tomorrow's Internet sensation
>>
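One possible refinement, sketched under the assumption that version strings look like "5.MINOR.PATCH": none of the patterns in this thread anchors the end of the minor number, so a string such as "5.72" would match via the leading 7. The negative lookahead (?![0-9]) below is an addition for illustration, not something proposed in the thread, and is_dev_perl is a hypothetical wrapper.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above, with one change: a negative
# lookahead so the odd minor number must not be followed by another digit.
sub is_dev_perl {
    my ($perl) = @_;
    return $perl =~ /(?:^5\.(?:7|9|[1-9][0-9]*[13579])(?![0-9])|rc|patch)/i
        ? 1
        : 0;
}

for my $v ("5.7.3", "5.8.9", "5.21.4", "5.20.0", "5.72", "5.10.0 RC1") {
    printf "%-12s => %d\n", $v, is_dev_perl($v);
}
```

Whether the lookahead is worth having depends on what the real data contains; if minors never run past a few dozen, the simpler pattern is probably fine.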
Re: Regex to match odd numbers
On 27 May 2014, at 16:22, David Cantrell wrote:
> As part of the nasty mess that is CPANdeps, I have this line of code:
>
> $record->{is_dev_perl} = (
>     $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i
> ) ? 1 : 0;
>
> I'd like to not have to remember to add 23 to the list in a year or so's
> time. Can anyone think of a nice way of matching any odd number from 7
> upwards?

I may be sufficiently uncaffeinated this late on a logical Monday to be missing something, but how about:

    $record->{perl} =~ /(^5\.(7|9|\d+[13579])|rc|patch)/i

which should be good for a while
Re: Regex to match odd numbers
On 27/05/14 16:22, David Cantrell wrote:
> As part of the nasty mess that is CPANdeps, I have this line of code:
>
> $record->{is_dev_perl} = (
>     $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i
> ) ? 1 : 0;
>
> I'd like to not have to remember to add 23 to the list in a year or so's
> time. Can anyone think of a nice way of matching any odd number from 7
> upwards? Obviously it's easy to do in a coupla lines of perl code
> instead of a regex, so I'm asking more out of curiosity than because I
> actually need it.

$bool = ($record->{perl} > 7) & 1; # for example?
Re: Regex to match odd numbers
$record->{perl} =~ /(^[0-9]*[13579]|rc|patch)/i

?


On 27 May 2014 16:22, David Cantrell wrote:

> As part of the nasty mess that is CPANdeps, I have this line of code:
>
> $record->{is_dev_perl} = (
>     $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i
> ) ? 1 : 0;
>
> I'd like to not have to remember to add 23 to the list in a year or so's
> time. Can anyone think of a nice way of matching any odd number from 7
> upwards? Obviously it's easy to do in a coupla lines of perl code
> instead of a regex, so I'm asking more out of curiosity than because I
> actually need it.
>
> --
> David Cantrell | Official London Perl Mongers Bad Influence
>
> Today's previously unreported paraphilia is tomorrow's Internet sensation
>
Re: Regex to match odd numbers
On Tue, May 27, 2014 at 04:22:07PM +0100, David Cantrell wrote:
> As part of the nasty mess that is CPANdeps, I have this line of code:
>
> $record->{is_dev_perl} = (
>     $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i
> ) ? 1 : 0;
>
> I'd like to not have to remember to add 23 to the list in a year or so's
> time. Can anyone think of a nice way of matching any odd number from 7
> upwards? Obviously it's easy to do in a coupla lines of perl code
> instead of a regex, so I'm asking more out of curiosity than because I
> actually need it.

You mean, 7, 9, or any number using two or more digits, ending in an odd one? That doesn't seem to be hard to write.

    /^0*(?:7|9|[1-9][0-9]*[13579])$/

Abigail
Re: Regex to match odd numbers
Surely you only need to examine the right-most digit to know if the number is odd? Your special requirement to (AIUI) consider 0..7 as even isn't difficult to add. /joel On 27 May 2014 16:22, David Cantrell wrote: > As part of the nasty mess that is CPANdeps, I have this line of code: > > $record->{is_dev_perl} = ( > $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i > ) ? 1 : 0; > > I'd like to not have to remember to add 23 to the list in a year or so's > time. Can anyone think of a nice way of matching any odd number from 7 > upwards? Obviously it's easy to do in a coupla lines of perl code > instead of a regex, so I'm asking more out of curiosity than because I > actually need it. > > -- > David Cantrell | Official London Perl Mongers Bad Influence > > Today's previously unreported paraphilia is tomorrow's Internet sensation > >
Regex to match odd numbers
As part of the nasty mess that is CPANdeps, I have this line of code:

    $record->{is_dev_perl} = (
        $record->{perl} =~ /(^5\.(7|9|11|13|15|17|19|21)|rc|patch)/i
    ) ? 1 : 0;

I'd like to not have to remember to add 23 to the list in a year or so's time. Can anyone think of a nice way of matching any odd number from 7 upwards? Obviously it's easy to do in a coupla lines of perl code instead of a regex, so I'm asking more out of curiosity than because I actually need it.

--
David Cantrell | Official London Perl Mongers Bad Influence

Today's previously unreported paraphilia is tomorrow's Internet sensation