On Tue, Apr 22, 2014 at 4:16 AM, David Cantrell <da...@cantrell.org.uk> wrote: > On Sun, Apr 20, 2014 at 10:14:48PM -0400, Mark Fowler wrote: >> On Sunday, April 20, 2014, David Cantrell <da...@cantrell.org.uk> wrote: >> > Can anyone point me at some code on the CPAN that, given two regexes, >> > can figure out whether there are any bits of text that will be matched >> > by both? >> I'm not sure I understand the question here, or moreover why you want to do >> this..is it just an intellectual exercise? > > I do actually have a use for it, which would help to explain the > question. > > A large part of Number::Phone is based on data in google's > libphonenumber project. That has, for most countries, regular > expressions that match valid fixed lines and valid mobiles. For some > countries those two regexes can both match some of the same numbers. > Here's the data: > http://goo.gl/hTBAhZ > > If you look at the data for Barbados, they have for fixed lines: > 246[2-9]\d{6} > > and for mobiles: > 246(?:(?:2[346]|45|82)\d|25[0-4])\d{4}
I'd go out on a limb and say that the complete list of overlapping situations all share a /^prefix/ like this. (This doesn't necessarily help you since you'd have to exhaustively falsify it but if you're going for the quick win I bet just looking for prefixes gets you most/all of the way.) > then some strings will match both expressions - 2462303333, for example. > But if you look at the data for Jamaica there are no strings that match > both regexes. > > At the moment I detect these overlaps (and then throw the regexes away > as being unfit for my purpose) by just going through each country's > number space. This is practical for NANP countries as I can do it > all with only about a million comparisons in the worst possible case. It > would be impractical to apply this to the whole world though. If your goal is to simply identify overlaps rather than generate encompassing regexes, you could try attacking it with intelligently/heuristically generated random numbers. Paul > > -- > David Cantrell | Bourgeois reactionary pig