[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)

2011-06-13 Thread Cal Leeming

New submission from Cal Leeming cal.leem...@simplicitymedialtd.co.uk:

I believe I might have found a bug in the Python re library. Here is a
complete debug trace of what is happening (my apologies for the nature of the
actual text). I have run this regex through RegexBuddy (and a few other
tools), and all of them do the correct thing (which is to make no
replacement), apart from Python. I haven't yet tried this in another language.

--- ORIGINAL TEXT ---
313229176 
me and a buddy and his girlfriend were watching tv once and this blabbering 
idiot starts talking about this scientific study she heard about where they 
built a fake city and only one guy didn't know that it was a fake. we all 
paused for a second and i said the truman show? and she says yeah! that was 
the name of it! me my buddy and his girlfriend all catch eyes and are baffled 
at how stupid she was


--- TEXT AFTER REGEX SUB ---

me and a buddy and his girlfriend were http://watching.tv once and this 
blabbering idiot starts talking about this scientific study she heard about 
where they built a fake city and only one guy didn't know that it was a fake.we 
all paused for a second and i said the truman show? and she says yeah! that 
was the name of it! me my buddy and his girlfriend all catch eyes and are 
baffled at how stupid she was
---

--- REPLACED TEXT ---
 watching tv 
 http://watching.tv 
---


--- REGEX ---
_t = re.compile(r"(^| )((?:[\w\-]{2,}?\.|)(?:[\w\-]{2,}?)(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly))",
                flags = re.IGNORECASE | re.MULTILINE | re.DEBUG)

--- COMMAND ---
_t.sub("\\1http://\\2", original_message_here)


--- REGEX DEBUG ---

subpattern 1
  branch
    at at_beginning
  or
    literal 32
subpattern 2
  subpattern None
    branch
      min_repeat 2 65535
        in
          category category_word
          literal 45
      literal 46
    or
  subpattern None
    min_repeat 2 65535
      in
        category category_word
        literal 45
  subpattern None
    literal 46
    branch
      literal 99
      literal 111
      literal 109
    or
      literal 110
      literal 101
      literal 116
    or
      literal 111
      literal 114
      literal 103
    or
      literal 99
      literal 111
      literal 46
      literal 117
      literal 107
    or
      literal 116
      literal 118
    or
      literal 108
      literal 121

--
components: Regular Expressions
messages: 138234
nosy: Cal.Leeming
priority: normal
severity: normal
status: open
title: regex matches incorrectly on literal dot (99.9% confirmed)
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12325
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)

2011-06-13 Thread Cal Leeming

Changes by Cal Leeming cal.leem...@simplicitymedialtd.co.uk:


--
type:  -> behavior




[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)

2011-06-13 Thread Cal Leeming

Cal Leeming cal.leem...@simplicitymedialtd.co.uk added the comment:

Take particular notice of the following:

\.co\.uk

or
  literal 99
  literal 111
  literal 46
  literal 117
  literal 107


>>> map(lambda x: chr(x), [99,111,46,117,107])
['c', 'o', '.', 'u', 'k']

It would appear it is ignoring the first \. 

But why??
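
A possible reading of the dump, offered as an assumption rather than something established in this thread: CPython's regex parser hoists a prefix shared by every alternative of a branch out in front of that branch. In the full pattern each alternative starts with \., so the dump lists a single "literal 46" just before "branch", and the \.co\.uk alternative is left showing only c, o, ., u, k. The sketch below is written for Python 3, where the dump is printed in upper case; debug_dump is a hypothetical helper name, and the reduced pattern is chosen only for illustration.

```python
import contextlib
import io
import re


def debug_dump(pattern, flags=0):
    """Return the text that re.DEBUG prints while compiling `pattern`."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, flags | re.DEBUG)
    return buf.getvalue()


# Two alternatives that both begin with an escaped dot.
dump = debug_dump(r"\.com|\.net")
lines = [line.strip() for line in dump.splitlines() if line.strip()]

# The shared dot is expected to be listed once, ahead of the branch.
print(lines[0])
print(lines[1])

# The dot is still required for a match, even though it is shown once.
print(re.search(r"\.com|\.net", "example.com") is not None)
print(re.search(r"\.com|\.net", "examplecom") is None)
```

If this reading is right, the missing "literal 46" inside each alternative is a display detail of the parsed pattern and does not by itself mean the dot is being ignored.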

--




[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)

2011-06-13 Thread Robert Lehmann

Robert Lehmann lehman...@gmail.com added the comment:

I cannot reproduce either of your findings.  Could you provide us with your
version information?  re version 2.2.1, _sre 2.2.2, Python 2.6.6, Debian sid
here.  Also tested with Python 2.7.2rc1 (same RE).

>>> import re
>>> re.compile(r"\.co\.uk", re.DEBUG)
literal 46
literal 99
literal 111
literal 46
literal 117
literal 107
<_sre.SRE_Pattern object at 0xb73b0860>
>>> re.compile(r"(^| )((?:[\w\-]{2,}?\.|)(?:[\w\-]{2,}?)(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly))",
...   flags = re.IGNORECASE | re.MULTILINE | re.DEBUG).sub("\\1http://\\2",
...   "me and a buddy and his girlfriend were watching tv once and this "
...   "blabbering idiot starts talking about this scientific study she heard about "
...   "where they built a fake city and only one guy didn't know that it was a "
...   "fake. we all paused for a second and i said the truman show? and she says "
...   "yeah! that was the name of it! me my buddy and his girlfriend all catch "
...   "eyes and are baffled at how stupid she was")
subpattern 1
...
'me and a buddy and his girlfriend were watching tv once...'
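
For anyone who wants to repeat this check outside an interactive session, here is a minimal standalone sketch, assuming Python 3. The pattern and replacement are the ones from the report; re.DEBUG is left out so that only the results are printed. The names URL_RE and linkify, the shortened excerpt, and the example.co.uk input are made up for illustration.

```python
import re

# Pattern and replacement as given in the report, without re.DEBUG.
URL_RE = re.compile(
    r"(^| )((?:[\w\-]{2,}?\.|)(?:[\w\-]{2,}?)"
    r"(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly))",
    flags=re.IGNORECASE | re.MULTILINE,
)


def linkify(text):
    """Prefix bare domain names with http://, leaving other text alone."""
    return URL_RE.sub(r"\1http://\2", text)


# Shortened excerpt of the reported text: "watching tv" has no literal dot.
reported = (
    "were watching tv once and only one guy didn't know "
    "that it was a fake. we all paused for a second"
)
print(linkify(reported) == reported)

# Invented input that does contain a domain name.
print(linkify("see example.co.uk today"))
```

Running it in a fresh interpreter keeps the check independent of any state held by the original application.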

--
nosy: +lehmannro




[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)

2011-06-13 Thread Cal Leeming

Cal Leeming cal.leem...@simplicitymedialtd.co.uk added the comment:

Oh jeez, you're going to think I'm such an idiot. I just ran a completely fresh 
test in the cli (away from the original source), and the issue disappeared (it 
was caused by caching - apparently).

I'm really sorry to have bothered you guys, I should have thought and tested 
this outside the original code first. I'll make sure to do this before posting 
any bugs in the future.

Thank you for your extremely fast response though!

Cal

--




[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)

2011-06-13 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed
