[issue23689] Memory leak in Modules/sre_lib.h

2022-04-03 Thread Serhiy Storchaka


Serhiy Storchaka added the comment:

Thank you, Ma Lin, for all your work.

The fix changes the interfaces of some internal functions that may be used in 
third-party code, and the bug occurs only in special circumstances, so it is 
not practical to backport it.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
versions: +Python 3.11 -Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23689] Memory leak in Modules/sre_lib.h

2022-04-03 Thread Serhiy Storchaka


Serhiy Storchaka added the comment:


New changeset 6e3eee5c11b539e9aab39cff783acf57838c355a by Ma Lin in branch 
'main':
bpo-23689: re module, fix memory leak when a match is terminated by a signal or 
memory allocation failure (GH-32283)
https://github.com/python/cpython/commit/6e3eee5c11b539e9aab39cff783acf57838c355a





[issue23689] Memory leak in Modules/sre_lib.h

2022-04-03 Thread Ma Lin


Change by Ma Lin:


--
pull_requests: +30344
pull_request: https://github.com/python/cpython/pull/32283




[issue23689] Memory leak in Modules/sre_lib.h

2022-03-31 Thread Ma Lin


Change by Ma Lin:


--
pull_requests: +30298
pull_request: https://github.com/python/cpython/pull/32223




[issue23689] Memory leak in Modules/sre_lib.h

2022-03-30 Thread Ma Lin


Change by Ma Lin:


--
pull_requests: +30265
pull_request: https://github.com/python/cpython/pull/32188




[issue23689] Memory leak in Modules/sre_lib.h

2022-03-29 Thread Serhiy Storchaka


Serhiy Storchaka added the comment:

This looks promising. Please go ahead! You are free to add any fields to any 
opcodes. It may break some third-party code which generates compiled patterns 
from a sequence of opcodes, but the stability of this interface was never 
promised. And such code will be broken in any case by the reorganization of 
internal code (issue47152).




[issue23689] Memory leak in Modules/sre_lib.h

2022-03-29 Thread Ma Lin


Ma Lin added the comment:

The methods in my PRs are suboptimal, so I closed them.

The number of REPEATs can be counted when compiling a pattern, and a 
`SRE_REPEAT` array with that many items can then be allocated in `SRE_STATE`.

It seems that at any time each REPEAT has only one active instance, so a 
`SRE_REPEAT` array is sufficient.
The regex module does it this way:
https://github.com/mrabarnett/mrab-regex/blob/hg/regex_3/_regex.c#L18287-L18288

Can the number of REPEATs be placed in `SRE_OP_INFO`?
A field could then be added to `SRE_OP_REPEAT` to indicate the index of that 
REPEAT.
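
A minimal sketch of this proposal, written in Python rather than C. The opcode 
tuples and the names `number_repeats`, `RepeatSlot` and `State` are hypothetical 
stand-ins, not the real `SRE_OP_INFO` or `SRE_STATE` layout. Each REPEAT gets an 
index at compile time, and the state owns one preallocated slot per index, so 
nothing is allocated while matching:

```python
from dataclasses import dataclass, field

# Hypothetical compiled pattern: a flat list of (opcode, argument) tuples.
PATTERN = [("REPEAT", None), ("LITERAL", "a"), ("REPEAT", None), ("LITERAL", "b")]


def number_repeats(code):
    """Give every REPEAT a 0-based index; return (new_code, repeat_count)."""
    out, count = [], 0
    for op, arg in code:
        if op == "REPEAT":
            out.append((op, count))  # the index travels with the opcode
            count += 1
        else:
            out.append((op, arg))
    return out, count


@dataclass
class RepeatSlot:
    count: int = -1  # -1 marks the slot as inactive


@dataclass
class State:
    repeat_count: int
    repeats: list = field(init=False)

    def __post_init__(self):
        # One slot per REPEAT, allocated once and owned by the state.
        self.repeats = [RepeatSlot() for _ in range(self.repeat_count)]


code, n = number_repeats(PATTERN)
state = State(n)
state.repeats[code[2][1]].count = 0  # activate the slot of the second REPEAT
```

Because every slot belongs to the state, discarding the state releases all of 
them, whether or not the match ran to completion.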




[issue23689] Memory leak in Modules/sre_lib.h

2019-03-04 Thread Ma Lin

Ma Lin added the comment:

PR11926 (closed) tried to allocate SRE_REPEAT on the state's stack.
It's feasible, but it messes up the code in sre_lib.h and reduces performance a 
little (roughly 6% slower), so I gave up on this solution.

PR12160 uses a memory pool instead; this solution doesn't mess up the code.
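
As a rough illustration of why a pool helps with this leak, here is a Python 
sketch. It is not the code from PR12160, and `RepeatPool`, `Interrupted` and the 
method names are made up for the example. The pool records every block it 
creates, so one call at the end of matching releases them all, including blocks 
that were never returned because the match was interrupted:

```python
class Interrupted(Exception):
    """Stand-in for a signal arriving in the middle of a match."""


class RepeatPool:
    def __init__(self):
        self._all = []   # every block ever created by this pool
        self._free = []  # blocks returned by the caller, ready for reuse

    def alloc(self):
        if self._free:
            return self._free.pop()  # reuse instead of allocating again
        block = {"count": 0}
        self._all.append(block)
        return block

    def free(self, block):
        self._free.append(block)

    @property
    def outstanding(self):
        """Blocks handed out and not yet returned."""
        return len(self._all) - len(self._free)

    def release_all(self):
        """Drop every block, whether or not it was returned."""
        self._all.clear()
        self._free.clear()


pool = RepeatPool()
try:
    a = pool.alloc()
    b = pool.alloc()
    pool.free(b)
    raise Interrupted  # `a` is never handed back to the pool
except Interrupted:
    outstanding_at_interrupt = pool.outstanding
finally:
    pool.release_all()  # runs on both the normal and the abrupt path
```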

🔸For infrequent alloc/free scenarios, the memory pool adds a small overhead:

s = 'a'
p = re.compile(r'(a)?')
p.match(s)  # <- measure this statement

before patch: 316 ns  +- 19 ns
after patch:  324 ns  +- 11 ns, 2.5% slower.
(measured with the perf module)

🔸For very frequent alloc/free scenarios, it brings a speedup:

s = 200_000_000 * 'a'
p = re.compile(r'.*?(?:bb)+')
p.match(s)  # <- measure this statement

before patch: 7.16 sec
after patch:  5.82 sec, 18.7% faster.
(best of 10 tests)

🔸I tested a real case that uses 17 patterns to process 100 MB of data:

before patch: 27.09 sec
after patch:  26.78 sec, 1.1% faster.
(best of 4 tests)
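
For readers who want to try the first micro-benchmark themselves, it can be 
approximated with the standard `timeit` module. This is only a sketch: the 
figures above were taken with the perf module, and the helper `time_match` and 
its default arguments are assumptions made for illustration.

```python
import re
import timeit


def time_match(pattern, text, number=20_000, repeat=5):
    """Return the best per-call time of p.match(text), in nanoseconds."""
    p = re.compile(pattern)
    # timeit.repeat returns one total time per run; keep the fastest run.
    best = min(timeit.repeat(lambda: p.match(text), number=number, repeat=repeat))
    return best / number * 1e9


ns = time_match(r"(a)?", "a")
print(f"{ns:.0f} ns per match")
```

The lambda adds call overhead of its own, so absolute numbers will be higher 
than those reported above; only before/after comparisons are meaningful.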




[issue23689] Memory leak in Modules/sre_lib.h

2019-03-04 Thread Ma Lin


Change by Ma Lin:


--
pull_requests: +12158




[issue23689] Memory leak in Modules/sre_lib.h

2019-02-18 Thread Ma Lin


Ma Lin added the comment:

I tried allocating SRE_REPEAT on the state's stack; the performance has not 
changed significantly.

It passes all tests except this one (test_stack_overflow):
https://github.com/python/cpython/blob/v3.8.0a1/Lib/test/test_re.py#L1225-L1230

I'll try to fix issue35859 and issue9134 first.

--
versions: +Python 3.8 -Python 2.7, Python 3.6, Python 3.7




[issue23689] Memory leak in Modules/sre_lib.h

2019-02-18 Thread Ma Lin


Change by Ma Lin:


--
pull_requests: +11951




[issue23689] Memory leak in Modules/sre_lib.h

2019-02-18 Thread Serhiy Storchaka


Change by Serhiy Storchaka:


--
nosy: +Ma Lin




[issue23689] Memory leak in Modules/sre_lib.h

2017-11-16 Thread Serhiy Storchaka

Change by Serhiy Storchaka:


--
versions: +Python 3.6, Python 3.7 -Python 3.4, Python 3.5




[issue23689] Memory leak in Modules/sre_lib.h

2015-05-16 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
assignee:  -> serhiy.storchaka




[issue23689] Memory leak in Modules/sre_lib.h

2015-03-18 Thread Alexei Romanov

Changes by Alexei Romanov:


--
nosy: +alexei.romanov




[issue23689] Memory leak in Modules/sre_lib.h

2015-03-18 Thread Evgeny Kapun

Evgeny Kapun added the comment:

This patch doesn't fix the issue. The problem is that the list starting with 
state->repeat doesn't necessarily contain all the repeat contexts that are 
allocated. Indeed, here [1] and here [2] repeat contexts are temporarily 
removed from the list. If the match procedure terminates abruptly, they are not 
added back.

[1] https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l963
[2] https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l1002
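
The failure mode can be modelled in a few lines of Python. This is a simplified 
sketch with hypothetical names (`RepeatContext`, `State`, `match_body`, 
`cleanup`), not the logic of sre_lib.h. Cleanup walks the list from 
`state.repeat`, but a context that was unlinked when the interruption arrived 
is unreachable from that list and is missed:

```python
class Interrupted(Exception):
    """Stand-in for a signal handler raising during the match."""


class RepeatContext:
    live = set()  # every context that is allocated and not yet released

    def __init__(self, prev):
        self.prev = prev
        RepeatContext.live.add(self)

    def release(self):
        RepeatContext.live.discard(self)


class State:
    def __init__(self):
        self.repeat = None  # head of the linked list of repeat contexts


def match_body(state):
    outer = RepeatContext(state.repeat)
    state.repeat = outer
    inner = RepeatContext(state.repeat)
    state.repeat = inner
    # Temporarily unlink `inner` while matching the tail of the pattern.
    state.repeat = inner.prev
    raise Interrupted  # the code that would relink `inner` never runs


def cleanup(state):
    """Release only what is reachable from state.repeat."""
    ctx = state.repeat
    while ctx is not None:
        ctx.release()
        ctx = ctx.prev
    state.repeat = None


state = State()
try:
    match_body(state)
except Interrupted:
    cleanup(state)

leaked = len(RepeatContext.live)  # `inner` was off the list, so it survives
```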




[issue23689] Memory leak in Modules/sre_lib.h

2015-03-17 Thread STINNER Victor

STINNER Victor added the comment:

Oh cool, you wrote a script to reproduce the issue! And Serhiy wrote a patch, 
great! Great job guys.

sre_clean_repeat_data.patch looks good to me.

@Serhiy: Can you try the example to ensure that it fixes the issue? If yes, go 
ahead!




[issue23689] Memory leak in Modules/sre_lib.h

2015-03-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Maybe this patch helps.

--
keywords: +patch
stage:  -> patch review
versions: +Python 2.7, Python 3.5
Added file: http://bugs.python.org/file38529/sre_clean_repeat_data.patch




[issue23689] Memory leak in Modules/sre_lib.h

2015-03-17 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


--
nosy: +serhiy.storchaka




[issue23689] Memory leak in Modules/sre_lib.h

2015-03-17 Thread Evgeny Kapun

Evgeny Kapun added the comment:

Tracemalloc code:

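import re
import signal
import tracemalloc

class AlarmError(Exception):
    pass

def handle_alarm(signum, frame):
    # Abort the running match by raising from the signal handler.
    raise AlarmError

signal.signal(signal.SIGALRM, handle_alarm)

# Tracing must be active before a snapshot can be taken
# (harmless if Python was already started with -X tracemalloc).
tracemalloc.start()
s1 = tracemalloc.take_snapshot()
for _ in range(20):
    try:
        signal.alarm(1)  # interrupt the match after one second
        re.match('(?:a|a|(?=b)){1000}', 'a'*999)
        raise RuntimeError  # the match finished before the alarm fired
    except AlarmError:
        pass
s2 = tracemalloc.take_snapshot()
res = s2.compare_to(s1, 'lineno')
for e in res[:10]:
    print(e)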

For me, it shows almost 3 MiB allocated in re.py.




[issue23689] Memory leak in Modules/sre_lib.h

2015-03-17 Thread Evgeny Kapun

Evgeny Kapun added the comment:

The memory leak only happens if the match operation terminates abruptly, e.g. 
because of SIGINT. In this case, control never comes back from DO_JUMP.




[issue23689] Memory leak in Modules/sre_lib.h

2015-03-17 Thread STINNER Victor

STINNER Victor added the comment:

There may be a bug. Can you show an example of a regex and a text for which the 
memory leak occurs? You can use the tracemalloc module to check if there is a 
memory leak. Or use sys.getcounts() if you compiled Python in debug mode.

sre_lib.h is very complex; it uses the C "goto" statement together with regex 
bytecodes.

"DO_JUMP(JUMP_REPEAT, jump_repeat, ctx->pattern+ctx->pattern[0]);" calls "goto 
entrance" to execute the following bytecodes, but later it comes back after 
DO_JUMP() at the "jump_repeat:" label:

https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l1180
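
A loose Python analogue of this control flow, for readers unfamiliar with the 
technique. The names `JUMP_MULTIPLY` and `factorial` are made up, and the code 
shares only its shape with the real macros: a context is saved on an explicit 
stack together with a resume label, execution restarts at the "entrance", and 
the code after the jump runs later, when the context is popped.

```python
JUMP_MULTIPLY = 1  # hypothetical resume label, analogous to JUMP_REPEAT


def factorial(n):
    """Compute n! with an explicit context stack instead of recursion."""
    stack = []  # saved contexts: (resume_label, saved_n)
    while True:
        # "entrance": start evaluating the current context
        if n > 1:
            stack.append((JUMP_MULTIPLY, n))  # DO_JUMP: save the context ...
            n -= 1
            continue  # ... and "goto entrance" for the sub-problem
        result = 1
        # "exit": resume each saved context right after its jump
        while stack:
            label, saved_n = stack.pop()
            if label == JUMP_MULTIPLY:
                result *= saved_n  # the code that follows DO_JUMP
        return result


value = factorial(5)
```

If an exception were raised while contexts are still on the stack, the resume 
code for those contexts would never run, which is the situation in which 
cleanup placed after a jump gets skipped.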

--
nosy: +haypo




[issue23689] Memory leak in Modules/sre_lib.h

2015-03-17 Thread Evgeny Kapun

New submission from Evgeny Kapun:

In Modules/sre_lib.h on line 882 [1], a block of memory is allocated. If the 
SRE(match) function later terminates abruptly, either because of a signal or 
because a subsequent memory allocation fails, this block is never released.

[1] https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l882

--
components: Regular Expressions
messages: 238313
nosy: abacabadabacaba, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: Memory leak in Modules/sre_lib.h
type: resource usage
versions: Python 3.4
