Re: Regarding Regex timeout behavior to minimize CPU consumption

2020-12-06 Thread Dan Stromberg
On Sun, Dec 6, 2020 at 2:37 PM Barry  wrote:

> > On 5 Dec 2020, at 23:44, Peter J. Holzer  wrote:
> >
> > On 2020-12-05 23:42:11 +0100, sjeik_ap...@hotmail.com wrote:
> >>   Timeout: no idea. But check out re.compile and re.finditer as they
> >>   might speed things up.
> >
> > I doubt that compiling regular expressions helps the OP much. Compiled
> > regular expressions are cached, but more importantly, if a match takes
> > long enough that specifying a timeout is useful, the time is almost
> > certainly not spent compiling, but matching - most likely backtracking
> > from lots of promising but ultimately unsuccessful partial matches.
> >
> >> regex = r'data-stid="section-room-list"[\s\S]*?>\s*([\s\S]*?)\s*' \
> >>         r'(?:class\s*=\s*"\s*sticky-book-now\s*"|\s*|id\s*=\s*"Location")'
> >> rooms_blocks_to_be_replace = re.findall(regex, html_template)
> >
> > This part:
> >
> >\s*([\s\S]*?)\s*'
> >
> > looks dangerous from a performance point of view. If that can be
> > rewritten with less potential for backtracking, it might help.
> >
> > Generally, it should be possible to implement a timeout for any
> > operation by either scheduling an alarm with signal.alarm or by
> > executing the operation in a separate thread and killing the thread if
> > it takes too long.
>
> I think that Python ignores signals until the eval loop is entered.
> And since re.match will block, that is not going to happen.
>
> Killing threads is not safe, and if your OS allows it, you end up with
> the internal state of Python messed up.
>
> To implement this, I think the re code itself would have to implement
> the timeout.
>
> Better is for the OP to fix the re so it does not backtrack so much, or
> to work on the input string in chunks.
>
If the regex is expensive enough to warrant it, you could use a subprocess
- they are killable.
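
A minimal sketch of the subprocess idea, assuming the matches can be sent
back as JSON. The helper name findall_with_timeout and the JSON-over-stdin
protocol are made up for illustration, not an existing API. Note that
tuples returned by findall for multi-group patterns come back as lists.

```python
import json
import subprocess
import sys

# Small script executed by the child interpreter: read the pattern and text
# as JSON from stdin, run re.findall, and write the matches as JSON to stdout.
_CHILD_CODE = """
import json, re, sys
job = json.load(sys.stdin)
json.dump(re.findall(job["pattern"], job["text"]), sys.stdout)
"""


def findall_with_timeout(pattern, text, timeout):
    """Return re.findall(pattern, text), or None if the match takes longer
    than `timeout` seconds. The match runs in a separate Python process."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE],
            input=json.dumps({"pattern": pattern, "text": text}),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired,
        # so the runaway match stops consuming CPU at this point.
        return None
    return json.loads(done.stdout)
```

Starting an interpreter per call is not free, so this probably only pays
off when a single match can run for seconds rather than milliseconds.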
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Regarding Regex timeout behavior to minimize CPU consumption

2020-12-06 Thread Barry


> On 5 Dec 2020, at 23:44, Peter J. Holzer  wrote:
> 
> On 2020-12-05 23:42:11 +0100, sjeik_ap...@hotmail.com wrote:
>>   Timeout: no idea. But check out re.compile and re.finditer as they might
>>   speed things up.
> 
> I doubt that compiling regular expressions helps the OP much. Compiled
> regular expressions are cached, but more importantly, if a match takes
> long enough that specifying a timeout is useful, the time is almost
> certainly not spent compiling, but matching - most likely backtracking
> from lots of promising but ultimately unsuccessful partial matches.
> 
>> regex = r'data-stid="section-room-list"[\s\S]*?>\s*([\s\S]*?)\s*' \
>>         r'(?:class\s*=\s*"\s*sticky-book-now\s*"|\s*|id\s*=\s*"Location")'
>> rooms_blocks_to_be_replace = re.findall(regex, html_template)
> 
> This part:
> 
>\s*([\s\S]*?)\s*'
> 
> looks dangerous from a performance point of view. If that can be
> rewritten with less potential for backtracking, it might help.
> 
> Generally, it should be possible to implement a timeout for any
> operation by either scheduling an alarm with signal.alarm or by
> executing the operation in a separate thread and killing the thread if
> it takes too long.

I think that Python ignores signals until the eval loop is entered.
And since re.match will block, that is not going to happen.

Killing threads is not safe, and if your OS allows it, you end up with the
internal state of Python messed up.

To implement this, I think the re code itself would have to implement the
timeout.

Better is for the OP to fix the re so it does not backtrack so much, or to
work on the input string in chunks.
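
One way to cut the backtracking is to locate the fixed markers with
str.find and slice, instead of letting lazy [\s\S]*? groups search for
them. A rough sketch only: the function name is hypothetical, and the end
markers are simplified literals that ignore the optional whitespace the
original regex allows, so it is not a drop-in replacement.

```python
START_MARKER = 'data-stid="section-room-list"'
# Simplified end markers, loosely modelled on the alternatives in the regex.
END_MARKERS = ('class="sticky-book-now"', 'id="Location"')


def extract_room_block(html):
    """Return the text between the start marker's tag and the nearest end
    marker, or None if either is missing. Each step is a single forward
    scan, so there is nothing to backtrack over."""
    start = html.find(START_MARKER)
    if start == -1:
        return None
    tag_end = html.find('>', start)  # end of the tag holding the marker
    if tag_end == -1:
        return None
    body_start = tag_end + 1
    # Position of each end marker that occurs after the opening tag.
    ends = [pos for pos in (html.find(m, body_start) for m in END_MARKERS)
            if pos != -1]
    if not ends:
        return None
    return html[body_start:min(ends)].strip()
```

If the page really has to be parsed, an HTML parser is likely a better
fit than either approach.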

Barry



Re: Regarding Regex timeout behavior to minimize CPU consumption

2020-12-05 Thread Peter J. Holzer
On 2020-12-05 23:42:11 +0100, sjeik_ap...@hotmail.com wrote:
>Timeout: no idea. But check out re.compile and re.finditer as they might
>speed things up.

I doubt that compiling regular expressions helps the OP much. Compiled
regular expressions are cached, but more importantly, if a match takes
long enough that specifying a timeout is useful, the time is almost
certainly not spent compiling, but matching - most likely backtracking
from lots of promising but ultimately unsuccessful partial matches.

>  regex = r'data-stid="section-room-list"[\s\S]*?>\s*([\s\S]*?)\s*' \
>          r'(?:class\s*=\s*"\s*sticky-book-now\s*"|\s*|id\s*=\s*"Location")'
>  rooms_blocks_to_be_replace = re.findall(regex, html_template)

This part:

\s*([\s\S]*?)\s*'

looks dangerous from a performance point of view. If that can be
rewritten with less potential for backtracking, it might help.

Generally, it should be possible to implement a timeout for any
operation by either scheduling an alarm with signal.alarm or by
executing the operation in a separate thread and killing the thread if
it takes too long.
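
The alarm variant could look roughly like this. It is a sketch under
several assumptions: Unix only, main thread only, and it uses
signal.setitimer in place of signal.alarm so that fractional seconds work.
The names time_limit and OperationTimeout are made up for the example.
Whether the handler actually fires during a long re match depends on
whether the regex engine checks for pending signals while it runs, which
this example does not establish.

```python
import signal
from contextlib import contextmanager


class OperationTimeout(Exception):
    """Raised by the alarm handler when the time limit expires."""


@contextmanager
def time_limit(seconds):
    """Raise OperationTimeout in the main thread after `seconds` seconds."""
    def _on_alarm(signum, frame):
        raise OperationTimeout(f"timed out after {seconds} s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    # Like signal.alarm(), but accepts a float number of seconds.
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

Usage would be along the lines of wrapping the call in
"with time_limit(2.0): ..." and catching OperationTimeout around it.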

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Regarding Regex timeout behavior to minimize CPU consumption

2020-12-05 Thread sjeik_appie
   Hi,
   Timeout: no idea. But check out re.compile and re.finditer, as they might
   speed things up. I often compile a regex once upon import, then use it in
   functions.
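
   A small sketch of that pattern. The regex and names here are purely
   illustrative and are not the OP's pattern.

```python
import re

# Compiled once, when the module is imported.
PRICE_RE = re.compile(r'(\d+)\.(\d{2})')


def iter_prices(text):
    """Yield each price found in `text` as a float, one match at a time."""
    for match in PRICE_RE.finditer(text):
        yield float(match.group(0))
```

   This skips the per-call pattern lookup and lets the caller stop after
   the first few matches. It does not change how much work a single match
   attempt takes.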
   On 27 Nov 2020 13:33, Shahique Khan  wrote:

 Hi Team,

 I have noticed that when our regex does not find a match, it takes much
 longer to return its (empty) result.

 My question is: can we set a timeout parameter (in seconds/milliseconds)
 with re.findall, or anywhere in the code, to avoid CPU consumption when a
 regex takes too long to execute?

 Below is an example that takes a long time to execute (in this case, can
 we set a timeout to kill the execution and avoid the CPU consumption?):

 regex = r'data-stid="section-room-list"[\s\S]*?>\s*([\s\S]*?)\s*' \
         r'(?:class\s*=\s*"\s*sticky-book-now\s*"|\s*|id\s*=\s*"Location")'
 rooms_blocks_to_be_replace = re.findall(regex, html_template)

 Please help me, I will be very thankful for this.

 Thanks,


Regarding Regex timeout behavior to minimize CPU consumption

2020-11-27 Thread Shahique Khan
Hi Team,

I have noticed that when our regex does not find a match, it takes much
longer to return its (empty) result.

My question is: can we set a timeout parameter (in seconds/milliseconds)
with re.findall, or anywhere in the code, to avoid CPU consumption when a
regex takes too long to execute?

Below is an example that takes a long time to execute (in this case, can
we set a timeout to kill the execution and avoid the CPU consumption?):

regex = r'data-stid="section-room-list"[\s\S]*?>\s*([\s\S]*?)\s*' \
        r'(?:class\s*=\s*"\s*sticky-book-now\s*"|\s*|id\s*=\s*"Location")'
rooms_blocks_to_be_replace = re.findall(regex, html_template)


Please help me, I will be very thankful for this.

Thanks,