New submission from Robert Lujo:

Hello, 

I believe I have hit a bug or misbehaviour in the re module. Here is a 
minimal example that reproduces it:

import re

RE_C_COMMENTS = re.compile(r"/\*(.|\s)*?\*/",
                           re.MULTILINE|re.DOTALL|re.UNICODE)
# The literal is split into adjacent string pieces so it survives line wrapping.
text = ("Special section /* "
        "valves:\n\n\nsilicone\n\n\n\n\n\n\nHarness:\n\n\nmetal and plastic "
        "fibre\n\n\n\n\n\n\nInner "
        "frame:\n\n\nmultibutylene\n\n\n\n\n\n\nWeight:\n\n\n147 "
        "g\n\n\n\n\n\n\n\n\n\n\n\n\n\nSelection guide\n")

The following command then takes forever:
# The flags are already part of the compiled pattern; the third positional
# argument of sub() is count, not flags, so it is omitted here.
RE_C_COMMENTS.sub(" ", text)

The same problem is noticeable on the first 90 characters alone, which take 
10 s on my machine:
RE_C_COMMENTS.sub(" ", text[:90])

Some clarification: I am trying to remove C-style comments from text with a 
non-greedy regular expression. In this case the start of a comment (/*) is 
found, but the end of the comment (*/) cannot be found. Note the MULTILINE and 
other re flags.
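
A possible explanation, offered as a sketch rather than a confirmed diagnosis: 
with re.DOTALL, "." already matches newlines, so every whitespace character can 
be matched by either branch of (.|\s). When no closing */ exists, the engine 
may try every combination of branches before giving up, which grows 
exponentially with the number of whitespace characters. The pattern below drops 
the alternation; the name RE_C_COMMENTS_FIXED and the sample strings are 
hypothetical and not part of the original report.

```python
import re

# Hypothetical alternative pattern (not from the original report): with
# re.DOTALL, "." matches any character including newlines, so the "(.|\s)"
# alternation is unnecessary and only adds ambiguous ways to match whitespace.
RE_C_COMMENTS_FIXED = re.compile(r"/\*.*?\*/", re.DOTALL)

# An opening "/*" followed by many newlines and no closing "*/".
unterminated = ("Special section /* valves:\n\n\nsilicone"
                + "\n" * 40 + "Selection guide\n")

# Two properly closed comments, one of them spanning a line break.
terminated = "a /* first\ncomment */ b /* second */ c"

# No closing "*/": the search fails in linear time and the text is unchanged.
result_unterminated = RE_C_COMMENTS_FIXED.sub(" ", unterminated)

# Each closed comment is replaced by a single space.
result_terminated = RE_C_COMMENTS_FIXED.sub(" ", terminated)
```

The flags are given once, at compile time. The third positional argument of 
the compiled pattern's sub() method is count, not flags, so they are not 
repeated in the call.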

Python versions used:

'2.7.11 (default, Jan 22 2016, 16:30:50) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 
(clang-600.0.57)]' on macOS 10.12.13

and:

'2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]' on 
Linux 84-Ubuntu SMP Wed Feb 1 17:20:32 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

----------
components: Regular Expressions
messages: 291107
nosy: Robert Lujo, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.sub stalls forever on an unmatched non-greedy case
type: performance
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29977>
_______________________________________