[issue40496] re.findall() takes a long time (100% cup usage) on Python 3.6.10
Sergio Rael added the comment: Thank you for your reply, Rémi. I agree that the cause may be that the pattern is too complex. I just noticed that in Python 3.7 the same pattern finishes the findall() call almost instantaneously, but in 3.6 the CPU goes to 100% and it takes ages to finish. In fact, I don't know whether it finishes at all, because it took so long that I had to stop it. I thought it would be a good idea to let you know about this behaviour. Of course, after this, I don't use 3.6 anymore. Thanks again! -- ___ Python tracker <https://bugs.python.org/issue40496> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40496] re.findall() takes a long time (100% cup usage) on Python 3.6.10
Sergio Rael added the comment: Sorry, this is not a deadlock. Python drives CPU usage to 100%, but it takes so long that I didn't know whether it would ever finish the task. -- title: re.findall() deadlock on Python 3.6.10 -> re.findall() takes a long time (100% cup usage) on Python 3.6.10
[issue40496] re.findall() deadlock on Python 3.6.10
New submission from Sergio Rael:

I have found a deadlock using Python 3.6.10 that seems to have been solved in 3.7.x, probably related to capture groups. To reproduce the deadlock, just do something like this:

    import re

    re.findall(
        r'\[et_pb_image(?:\w|=|"|\d|\.| |_|\/)*src="(https?:\/\/(?:www\.)?\w*\.\w*(?:\/|\w|\d|\.|-)*\.(?:png|jpg|jpeg|gif))"(?:\w|=|"|\d|\.| |_|\/|%|\|)*(?:\/?\])(?:\[\/et_pb_image\])?',
        '[et_pb_image _builder_version="3.27.2" src="https://www.somewhere.com/wp-content/uploads/2019/08/stabilizers.jpg" box_shadow_horizontal_tablet="0px" box_shadow_vertical_tablet="0px" box_shadow_blur_tablet="40px" box_shadow_spread_tablet="0px" z_index_tablet="500" url="https://youtu.be/fTrC5gkyYBM" url_new_window="on" /]',
    )

I noticed that the problem is related to having two image URLs in the content. The regex says to look only for the one starting with "src=", so the one starting with "url=" should be ignored. If url="XXX" is removed from the tag, it works fine.

--
components: Regular Expressions
messages: 368026
nosy: ezio.melotti, mrabarnett, srael
priority: normal
severity: normal
status: open
title: re.findall() deadlock on Python 3.6.10
type: behavior
versions: Python 3.6
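
A note on the pattern in the report above: its repeated groups, such as (?:\w|=|"|\d|\.| |_|\/)*, contain overlapping alternatives (\w already matches digits and the underscore), so the engine may have several ways to match the same character and can end up trying many combinations before giving up. One possible workaround, offered only as a sketch and not as the fix that went into 3.7, is to replace each alternation with a character class that accepts the same characters. The names IMAGE_SRC_PATTERN and find_image_src_urls are hypothetical, and the sample tags assume that the ";" after each URL in the archived message was originally a closing double quote.

```python
import re

# Hypothetical rewrite of the reported pattern: each (?:a|b|c)* alternation
# is replaced by a character class intended to accept the same characters.
IMAGE_SRC_PATTERN = re.compile(
    r'\[et_pb_image[\w=". /]*'
    r'src="(https?://(?:www\.)?\w*\.\w*[/\w.-]*\.(?:png|jpg|jpeg|gif))"'
    r'[\w=". /%|]*/?\](?:\[/et_pb_image\])?'
)


def find_image_src_urls(text):
    """Return the src URLs of et_pb_image tags that the pattern accepts."""
    return IMAGE_SRC_PATTERN.findall(text)


# Sample input modelled on the report (closing quotes are an assumption).
TAG_WITH_URL = (
    '[et_pb_image _builder_version="3.27.2" '
    'src="https://www.somewhere.com/wp-content/uploads/2019/08/stabilizers.jpg" '
    'box_shadow_horizontal_tablet="0px" box_shadow_vertical_tablet="0px" '
    'box_shadow_blur_tablet="40px" box_shadow_spread_tablet="0px" '
    'z_index_tablet="500" url="https://youtu.be/fTrC5gkyYBM" '
    'url_new_window="on" /]'
)

# The same tag with the url="..." attribute removed.
TAG_WITHOUT_URL = TAG_WITH_URL.replace('url="https://youtu.be/fTrC5gkyYBM" ', '')

if __name__ == "__main__":
    print(find_image_src_urls(TAG_WITH_URL))
    print(find_image_src_urls(TAG_WITHOUT_URL))
```

With this sketch the tag containing url="..." yields no match, because ":" is not among the characters allowed after the src attribute, while the tag without it yields the src URL. Whether rejecting the first tag is the desired behaviour depends on the intent of the original pattern.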