[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)
New submission from Cal Leeming cal.leem...@simplicitymedialtd.co.uk:

I believe I might have found a bug in the Python re library. Here is a complete debug of what is happening (my apologies for the nature of the actual text).

I have run this regex through RegexBuddy (and a few other tools), and all of them do the correct action (which is to not do any replacement), apart from Python. I haven't yet tried this in another language.

ORIGINAL TEXT

313229176 me and a buddy and his girlfriend were watching tv once and this blabbering idiot starts talking about this scientific study she heard about where they built a fake city and only one guy didn't know that it was a fake. we all paused for a second and i said the truman show? and she says yeah! that was the name of it! me my buddy and his girlfriend all catch eyes and are baffled at how stupid she was

TEXT AFTER REGEX SUB

me and a buddy and his girlfriend were http://watching.tv once and this blabbering idiot starts talking about this scientific study she heard about where they built a fake city and only one guy didn't know that it was a fake.we all paused for a second and i said the truman show? and she says yeah! that was the name of it! me my buddy and his girlfriend all catch eyes and are baffled at how stupid she was

REPLACED TEXT

watching tv
http://watching.tv

REGEX

    _t = re.compile(r"(^| )((?:[\w\-]{2,}?\.|)(?:[\w\-]{2,}?)(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly))", flags = re.IGNORECASE | re.MULTILINE | re.DEBUG)

COMMAND

    _t.sub("\\1http://\\2", original_message_here)

REGEX DEBUG

    subpattern 1
      branch
        at at_beginning
      or
        literal 32
    subpattern 2
      subpattern None
        branch
          min_repeat 2 65535
            in
              category category_word
              literal 45
          literal 46
        or
      subpattern None
        min_repeat 2 65535
          in
            category category_word
            literal 45
      subpattern None
        literal 46
        branch
          literal 99
          literal 111
          literal 109
        or
          literal 110
          literal 101
          literal 116
        or
          literal 111
          literal 114
          literal 103
        or
          literal 99
          literal 111
          literal 46
          literal 117
          literal 107
        or
          literal 116
          literal 118
        or
          literal 108
          literal 121

--
components: Regular Expressions
messages: 138234
nosy: Cal.Leeming
priority: normal
severity: normal
status: open
title: regex matches incorrectly on literal dot (99.9% confirmed)
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12325
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)
Changes by Cal Leeming cal.leem...@simplicitymedialtd.co.uk:

--
type: -> behavior
[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)
Cal Leeming cal.leem...@simplicitymedialtd.co.uk added the comment:

Take particular notice of the following:

    \.co\.uk

    or
      literal 99
      literal 111
      literal 46
      literal 117
      literal 107

    >>> map(lambda x: chr(x), [99,111,46,117,107])
    ['c', 'o', '.', 'u', 'k']

It would appear it is ignoring the first \. But why??
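One possible reading of the dump, offered as a sketch and not as a confirmed diagnosis: in the REGEX DEBUG output of the original report, a `literal 46` appears directly before `branch`. Every alternative starts with `\.`, so the parser seems to list that shared dot once, ahead of the branch, and each branch then shows only what follows it. The snippet below checks the behaviour, not the dump format. The names `tld`, `branch_tail` and `shared_prefix` and the sample strings are illustrative assumptions.

```python
import re

# Same top-level-domain alternation as in the report; every branch starts with "\."
tld = re.compile(r"(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly)", re.IGNORECASE)

# Code points listed under the "\.co\.uk" branch in the debug dump
branch_tail = "".join(chr(c) for c in [99, 111, 46, 117, 107])

# The literal 46 that appears once, before the branch, in the dump
shared_prefix = chr(46)

# The leading dot is still required for a match
with_dot = tld.search("example.co.uk")
without_dot = tld.search("exampleco.uk")

print(shared_prefix + branch_tail)
print(with_dot.group(0) if with_dot else None)
print(without_dot)
```

If this reading is right, the dump is consistent with the pattern as written, and the leading dot is not being ignored.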
[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)
Robert Lehmann lehman...@gmail.com added the comment:

I cannot reproduce either of your findings. Could you provide us with your version information?

re version 2.2.1, _sre 2.2.2, Python 2.6.6, Debian sid here. Also tested with Python 2.7.2rc1 (same RE).

    >>> import re
    >>> re.compile(r"\.co\.uk", re.DEBUG)
    literal 46
    literal 99
    literal 111
    literal 46
    literal 117
    literal 107
    <_sre.SRE_Pattern object at 0xb73b0860>
    >>> re.compile(r"(^| )((?:[\w\-]{2,}?\.|)(?:[\w\-]{2,}?)(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly))", flags = re.IGNORECASE | re.MULTILINE | re.DEBUG).sub("\\1http://\\2", "me and a buddy and his girlfriend were watching tv once and this blabbering idiot starts talking about this scientific study she heard about where they built a fake city and only one guy didn't know that it was a fake. we all paused for a second and i said the truman show? and she says yeah! that was the name of it! me my buddy and his girlfriend all catch eyes and are baffled at how stupid she was")
    subpattern 1
    ...
    'me and a buddy and his girlfriend were watching tv once...'

--
nosy: +lehmannro
[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)
Cal Leeming cal.leem...@simplicitymedialtd.co.uk added the comment:

Oh jeez, you're going to think I'm such an idiot.

I just ran a completely fresh test in the cli (away from the original source), and the issue disappeared (it was caused by caching - apparently).

I'm really sorry to have bothered you guys, I should have thought and tested this outside the original code first. I'll make sure to do this before posting any bugs in the future.

Thank you for your extremely fast response though!

Cal
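A fresh test outside the original code, as described in the comment above, might look like the following minimal sketch. It assumes the replacement string is "\\1http://\\2" (the trailing ";" in the archived COMMAND is taken to be archive residue) and uses only the opening of the reported text. The names `pattern`, `text`, `result` and `linked` are illustrative, and re.DEBUG is dropped to keep the output quiet.

```python
import re

# Pattern from the report, compiled in a fresh interpreter session
pattern = re.compile(
    r"(^| )((?:[\w\-]{2,}?\.|)(?:[\w\-]{2,}?)(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly))",
    flags=re.IGNORECASE | re.MULTILINE,
)

# Opening of the reported text: "watching tv" contains no literal dot
text = "me and a buddy and his girlfriend were watching tv once"

# With no dot present, nothing should be replaced
result = pattern.sub("\\1http://\\2", text)

# Control case: a real "name.tld" token is expected to be prefixed
linked = pattern.sub("\\1http://\\2", "were watching.tv once")

print(result == text)
print(linked)
```

Running a script like this on its own, before filing, separates the behaviour of the re module from any state held by the surrounding application.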
[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)
Changes by R. David Murray rdmur...@bitdance.com:

--
resolution: -> invalid
stage: -> committed/rejected
status: open -> closed