[issue40496] re.findall() takes a long time (100% cup usage) on Python 3.6.10

2020-05-05 Thread Sergio Rael

Sergio Rael added the comment:

Thank you for your reply, Rémi.

I agree that the cause may be that the pattern is too complex. I just 
noticed that in Python 3.7 the same pattern finishes the re.findall() call 
almost instantaneously, whereas in 3.6 the CPU goes to 100% and it takes ages 
to finish. In fact, I don't know whether it finishes at all, because it took 
so long that I had to stop it.
I thought it would be a good idea to let you know about this behaviour. Of 
course, after this, I don't use 3.6 anymore.

Thanks again!

--

___
Python tracker 
<https://bugs.python.org/issue40496>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40496] re.findall() takes a long time (100% cup usage) on Python 3.6.10

2020-05-04 Thread Sergio Rael


Sergio Rael added the comment:

Sorry, this is not a deadlock. Python drives the CPU to 100% usage, but it 
takes so long that I didn't know whether it would ever finish the task.

--
title: re.findall() deadlock on Python 3.6.10 -> re.findall() takes a long time 
(100% cup usage) on Python 3.6.10




[issue40496] re.findall() deadlock on Python 3.6.10

2020-05-04 Thread Sergio Rael


New submission from Sergio Rael:

I have found a deadlock in Python 3.6.10 that seems to have been fixed in 
3.7.x. It is probably related to capture groups. To reproduce the deadlock, 
run something like this:

import re

# On Python 3.6.10 this call pegs the CPU and does not return in reasonable time.
re.findall(
    r'\[et_pb_image(?:\w|=|"|\d|\.| |_|\/)*'
    r'src="(https?:\/\/(?:www\.)?\w*\.\w*(?:\/|\w|\d|\.|-)*\.(?:png|jpg|jpeg|gif))"'
    r'(?:\w|=|"|\d|\.| |_|\/|%|\|)*(?:\/?\])(?:\[\/et_pb_image\])?',
    '[et_pb_image _builder_version="3.27.2" '
    'src="https://www.somewhere.com/wp-content/uploads/2019/08/stabilizers.jpg" '
    'box_shadow_horizontal_tablet="0px" box_shadow_vertical_tablet="0px" '
    'box_shadow_blur_tablet="40px" box_shadow_spread_tablet="0px" '
    'z_index_tablet="500" url="https://youtu.be/fTrC5gkyYBM" url_new_window="on" /]',
)

I noticed that the problem is related to having two URLs in the content. 
The regex says to look only for the one starting with "src=", so the one 
starting with "url=" should be ignored. If url="XXX" is removed from the 
tag, it works fine.
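
For anyone who has to run a pattern like this on 3.6, here is a minimal sketch of one possible rewrite. It is not a fix taken from this thread. It assumes the slowdown comes from the overlapping single-character alternatives (for example \w, \d and _ can all match the same character), and replaces each alternation with one character class that accepts the same characters. The names PATTERN, find_image_urls, TAG_WITH_URL and TAG_WITHOUT_URL are made up for illustration.

```python
import re

# Hypothetical rewrite: each (?:\w|=|"|\d|...)* group from the report is
# collapsed into a single character class, so a given character can be
# consumed in only one way and a failed match is abandoned quickly.
PATTERN = re.compile(
    r'\[et_pb_image[\w="./ ]*'
    r'src="(https?://(?:www\.)?\w*\.\w*[/\w.-]*\.(?:png|jpg|jpeg|gif))"'
    r'[\w="./ %|]*/?\](?:\[/et_pb_image\])?'
)


def find_image_urls(content):
    """Return the src URLs of et_pb_image tags found in content."""
    return PATTERN.findall(content)


# The tag from the report, with and without the url="..." attribute.
TAG_WITH_URL = (
    '[et_pb_image _builder_version="3.27.2" '
    'src="https://www.somewhere.com/wp-content/uploads/2019/08/stabilizers.jpg" '
    'box_shadow_horizontal_tablet="0px" box_shadow_vertical_tablet="0px" '
    'box_shadow_blur_tablet="40px" box_shadow_spread_tablet="0px" '
    'z_index_tablet="500" url="https://youtu.be/fTrC5gkyYBM" url_new_window="on" /]'
)
TAG_WITHOUT_URL = TAG_WITH_URL.replace('url="https://youtu.be/fTrC5gkyYBM" ', '')

print(find_image_urls(TAG_WITHOUT_URL))
print(find_image_urls(TAG_WITH_URL))
```

Note that the tag containing url="https://..." still yields no match with this sketch, because ":" is not among the characters allowed after the src attribute, exactly as in the reported pattern. The difference is only that the failure should be reported quickly instead of after a very long search.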

--
components: Regular Expressions
messages: 368026
nosy: ezio.melotti, mrabarnett, srael
priority: normal
severity: normal
status: open
title: re.findall() deadlock on Python 3.6.10
type: behavior
versions: Python 3.6
