[issue47248] Possible slowdown of regex searching in 3.11

2022-04-08 Thread Ma Lin
Ma Lin added the comment: > Possibly related to the new atomic grouping support from GH-31982? It seems not likely. I will do some benchmarks for this issue, more information (version/platform) is welcome. -- ___ Python tracker <

[issue47256] re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context).

2022-04-08 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +30437 stage: -> patch review pull_request: https://github.com/python/cpython/pull/32411 ___ Python tracker <https://bugs.python.org/issu

[issue47256] re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context).

2022-04-08 Thread Ma Lin
New submission from Ma Lin : These changes reduce sizeof(match_context): - 32-bit build: 36 bytes, no change. - 64-bit build: 72 bytes -> 56 bytes. sre uses stack and `match_context` struct to simulate recursive call, smaller struct brings: - deeper recursive call - less memory cons

[issue47248] Possible slowdown of regex searching in 3.11

2022-04-07 Thread Ma Lin
Ma Lin added the comment: Could you give the two versions? I will do a git bisect. I tested 356997c~1 and 356997c [1], msvc2022 non-pgo release build: # regex_dna ### Mean +- std dev: 151 ms +- 1 ms -> 152 ms +- 1 ms: 1.01x slower Not significant # regex_effbot ### Mean +- std dev: 2.47

[issue47211] Remove re.template() and re.TEMPLATE

2022-04-06 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue47211> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.pyth

[issue47152] Reorganize the re module sources

2022-04-04 Thread Ma Lin
Ma Lin added the comment: > cryptic name In very early versions, "mark" was called register/region. https://github.com/python/cpython/blob/v1.0.1/Modules/regexpr.h#L48-L52 If span is accessed repeatedly, it's faster than Match.span(). Maybe consider renaming it, and make

[issue47152] Reorganize the re module sources

2022-04-04 Thread Ma Lin
Ma Lin added the comment: Match.regs is an undocumented attribute, it seems it has existed since 1991. Can it be removed? https://github.com/python/cpython/blob/ff2cf1d7d5fb25224f3ff2e0c678d36f78e1f3cb/Modules/_sre/sre.c#L2871 -- ___ Python
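The relationship between the undocumented `Match.regs` attribute and the documented `Match.span()` method can be illustrated with a small sketch. It assumes CPython, where `regs` is available; the pattern and input string are made up for illustration.

```python
import re

m = re.match(r"(a)(b)", "ab")

# Documented API: one method call per group (group 0 is the whole match).
spans = tuple(m.span(i) for i in range(m.re.groups + 1))

# Undocumented CPython attribute: the spans of all groups in one tuple.
regs = m.regs

print(spans)
print(regs)
```

Because `regs` is undocumented, code that relies on it may break if the attribute is renamed or removed.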

[issue23689] Memory leak in Modules/sre_lib.h

2022-04-03 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +30344 pull_request: https://github.com/python/cpython/pull/32283 ___ Python tracker <https://bugs.python.org/issue23

[issue47152] Reorganize the re module sources

2022-04-02 Thread Ma Lin
Ma Lin added the comment: In `Modules` folder, there are _sre.c/sre.h/sre_constants.h/sre_lib.h files. Will they be put into a folder? -- ___ Python tracker <https://bugs.python.org/issue47

[issue47199] multiprocessing: micro-optimize Connection.send_bytes() method

2022-04-02 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +30318 stage: -> patch review pull_request: https://github.com/python/cpython/pull/32247 ___ Python tracker <https://bugs.python.org/issu

[issue47199] multiprocessing: micro-optimize Connection.send_bytes() method

2022-04-02 Thread Ma Lin
New submission from Ma Lin : `bytes(m)` can be replaced by memoryview.cast('B'), then no need for data copying. m = memoryview(buf) # HACK for byte-indexing of non-bytewise buffers (e.g. array.array) if m.itemsize > 1: m = memoryview(bytes(m))
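The difference between the two approaches can be sketched as follows. This is only an illustration with a made-up `array.array` buffer, not the code from `Connection.send_bytes()`: `bytes(m)` copies the data, while `memoryview.cast('B')` gives a byte-wise view of the same memory.

```python
import array

buf = array.array("I", [1, 2, 3])   # non-bytewise buffer, itemsize > 1
m = memoryview(buf)

copied = memoryview(bytes(m))       # old approach: copies the data first
viewed = m.cast("B")                # proposed approach: byte view, no copy

# Both expose the same number of bytes with byte-indexing.
print(len(copied), len(viewed))

# Only the cast view shares memory with the original buffer.
buf[0] = 7
print(bytes(viewed) == bytes(copied))
```

The second print shows `False`, since the copy no longer matches the modified buffer while the cast view does.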

[issue23689] Memory leak in Modules/sre_lib.h

2022-03-31 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +30298 pull_request: https://github.com/python/cpython/pull/32223 ___ Python tracker <https://bugs.python.org/issue23

[issue47152] Reorganize the re module sources

2022-03-30 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +30266 pull_request: https://github.com/python/cpython/pull/32188 ___ Python tracker <https://bugs.python.org/issue47

[issue23689] Memory leak in Modules/sre_lib.h

2022-03-30 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +30265 pull_request: https://github.com/python/cpython/pull/32188 ___ Python tracker <https://bugs.python.org/issue23

[issue47152] Reorganize the re module sources

2022-03-29 Thread Ma Lin
Ma Lin added the comment: Please don't merge too close to the 3.11 beta1 release date, I'll submit PRs after this merged. -- ___ Python tracker <https://bugs.python.org/issue47

[issue23689] Memory leak in Modules/sre_lib.h

2022-03-29 Thread Ma Lin
Ma Lin added the comment: My PR methods are suboptimal, so I closed them. The number of REPEAT can be counted when compiling a pattern, and allocate a `SRE_REPEAT` array in `SRE_STATE` (with that number items). It seems at any time, a REPEAT will only have one in active, so a `SRE_REPEAT

[issue35859] Capture behavior depends on the order of an alternation

2022-03-29 Thread Ma Lin
Ma Lin added the comment: Thanks for your review. 3.11 has a more powerful re module, also thank you for rebasing the atomic grouping code. -- ___ Python tracker <https://bugs.python.org/issue35

[issue46864] Deprecate ob_shash in BytesObject

2022-03-24 Thread Ma Lin
Ma Lin added the comment: > I posted remove-bytes-hash.patch in this issue. Would you measure how this > affects whole application performance rather than micro benchmarks? I guess not much difference in benchmarks. But if put a bytes object into multiple dicts/sets, and len(byt

[issue46864] Deprecate ob_shash in BytesObject

2022-03-23 Thread Ma Lin
Ma Lin added the comment: If a bytes object is put into multiple dicts/sets, the hash needs to be computed multiple times. This seems a common usage. bytes is a very basic type, users may use it in various ways. And unskilled users may check the same bytes object against dicts/sets many

[issue46864] Deprecate ob_shash in BytesObject

2022-03-22 Thread Ma Lin
Ma Lin added the comment: RAM is now relatively cheaper than CPU. 1 million bytes objects additionally use 7.629 MiB RAM for ob_shash. (1_000_000*8/1024/1024). This causes hash() performance regression anyway. -- ___ Python tracker <ht
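The memory figure can be checked with a few lines of arithmetic. The sketch assumes an 8-byte hash field per object, as on a 64-bit build; the variable names are illustrative only.

```python
n_objects = 1_000_000
hash_field_bytes = 8   # assumption: 8-byte ob_shash on a 64-bit build

extra_mib = n_objects * hash_field_bytes / 1024 / 1024
print(f"{extra_mib:.3f} MiB")
```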

[issue46864] Deprecate ob_shash in BytesObject

2022-03-22 Thread Ma Lin
Ma Lin added the comment: Since hash() is a public function, maybe some users use hash value to manage bytes objects in their own way, then there may be a performance regression. For a rough example, dispatch data to 16 servers. h = hash(b) sendto(server_number=h & 0xF, da
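A rough sketch of that kind of dispatch is shown below. `pick_server` is a hypothetical helper, not part of any library; the point is only that `hash()` is called on the same bytes object each time it is dispatched.

```python
def pick_server(data: bytes, n_servers: int = 16) -> int:
    """Map data to a server index using the low bits of its hash."""
    # n_servers is assumed to be a power of two, so the mask is n_servers - 1.
    # In Python, `&` with a non-negative mask yields a non-negative result
    # even when hash() returns a negative number.
    return hash(data) & (n_servers - 1)

payload = b"some payload"
first = pick_server(payload)
second = pick_server(payload)   # calls hash() again on the same object
print(first, second)
```

Note that bytes hashes are randomized per process unless `PYTHONHASHSEED` is fixed, so such a mapping is only stable within one process.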

[issue46864] Deprecate ob_shash in BytesObject

2022-03-22 Thread Ma Lin
Ma Lin added the comment: If run this code, would it be slower? bytes_hash = hash(bytes_data) bytes_hash = hash(bytes_data) # get hash twice -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue46

[issue47040] Fix confusing versionchanged note in crc32 and adler32

2022-03-19 Thread Ma Lin
Ma Lin added the comment: PR 32002 is for 3.10/3.9 branches. -- ___ Python tracker <https://bugs.python.org/issue47040>

[issue47040] Fix confusing versionchanged note in crc32 and adler32

2022-03-19 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +30090 pull_request: https://github.com/python/cpython/pull/32002 ___ Python tracker <https://bugs.python.org/issue47

[issue44439] stdlib wrongly uses len() for bytes-like object

2022-03-19 Thread Ma Lin
Ma Lin added the comment: `_Stream.write` method in tarfile.py also has this code: https://github.com/python/cpython/blob/v3.11.0a6/Lib/tarfile.py#L434 But this bug will not be triggered. When calling this method, always pass bytes data. `_ConnectionBase.send_bytes` method

[issue47040] Remove invalid versionchanged in doc

2022-03-17 Thread Ma Lin
Ma Lin added the comment: `binascii.crc32` doc also has this invalid document: doc: https://docs.python.org/3/library/binascii.html#binascii.crc32 3.0.0 code: https://github.com/python/cpython/blob/v3.0/Modules/binascii.c#L1035 In addition, `binascii.crc32` has an `USE_ZLIB_CRC32` code path

[issue47040] Remove an invalid versionchanged in doc

2022-03-16 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +30046 stage: -> patch review pull_request: https://github.com/python/cpython/pull/31955 ___ Python tracker <https://bugs.python.org/issu

[issue47040] Remove an invalid versionchanged in doc

2022-03-16 Thread Ma Lin
New submission from Ma Lin : Since CPython 3.0.0, the checksums are always truncated to `unsigned int`: https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L930 https://github.com/python/cpython/blob/v3.0/Modules/zlibmodule.c#L950 -- assignee: docs@python components
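The unsigned behavior is easy to observe from Python 3. The snippet below is a minimal check with arbitrary sample data, not a test taken from the issue.

```python
import binascii
import zlib

data = b"hello world"

crc = zlib.crc32(data)
adler = zlib.adler32(data)

# Both checksums fit in an unsigned 32-bit integer.
print(0 <= crc <= 0xFFFFFFFF, 0 <= adler <= 0xFFFFFFFF)

# binascii.crc32 computes the same CRC-32 value.
print(binascii.crc32(data) == crc)
```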

[issue46255] Remove unnecessary check in _IOBase._check*() methods

2022-01-19 Thread Ma Lin
Change by Ma Lin : -- resolution: -> not a bug stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.or

[issue46255] Remove unnecessary check in _IOBase._check*() methods

2022-01-04 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +28606 stage: -> patch review pull_request: https://github.com/python/cpython/pull/30397 ___ Python tracker <https://bugs.python.org/issu

[issue46255] Remove unnecessary check in _IOBase._check*() methods

2022-01-04 Thread Ma Lin
New submission from Ma Lin : These methods are METH_NOARGS, in all cases the second parameter will be NULL. {"_checkClosed", _PyIOBase_check_closed, METH_NOARGS}, {"_checkSeekable", _PyIOBase_check_seekable, METH_NOARGS}, {"_checkReadable", _PyIOBa

[issue23224] bz2/lzma: Compressor/Decompressor objects are only initialized in __init__

2021-12-19 Thread Ma Lin
Ma Lin added the comment: These can be done in .__new__() method: - create thread lock - create (de)?compression context - initialize (de)?compressor states In .__init__() method, only set (de)?compression parameters. And prevent .__init__() method from being called multiple times
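The split between `.__new__()` and `.__init__()` described above can be sketched in pure Python. The real bz2/lzma objects are implemented in C; the class name, attributes, and error message here are hypothetical.

```python
import threading

class Compressor:
    """Hypothetical sketch: allocate state in __new__, configure in __init__."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        # State that must always exist, even if __init__ is never called.
        self._lock = threading.Lock()
        self._initialized = False
        return self

    def __init__(self, level=6):
        # Only set parameters here, and refuse a second initialization.
        if self._initialized:
            raise RuntimeError("__init__ may only be called once")
        self._level = level
        self._initialized = True

c = Compressor(level=3)
print(c._level)
```

With this layout, an object created through `Compressor.__new__(Compressor)` alone still has a usable lock, and a second `c.__init__()` call raises instead of silently resetting state.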

[issue44092] [sqlite3] Remove special rollback handling

2021-12-06 Thread Ma Lin
Ma Lin added the comment: If the special rollback handling is removed, the behavior of Connection.rollback() and 'ON CONFLICT ROLLBACK' clause will be consistent. See attached file on_conflict_rollback.py. -- Added file: https://bugs.python.org/file50481/on_conflict_rollback.py

[issue44092] [sqlite3] Remove special rollback handling

2021-12-06 Thread Ma Lin
Ma Lin added the comment: Imagine a person writes code with Python 3.11 and SQLite 3.8.7.2+, and then deploys it to Python 3.11 and SQLite 3.8.7.1-, an error may occur. However, this situation is unlikely to happen. > Can you provide a reproducer? We've run this change through the

[issue44092] [sqlite3] Remove special rollback handling

2021-12-06 Thread Ma Lin
Ma Lin added the comment: > How realistic is this scenario? If you compile with, for example 3.14.0 or > newer, you'd link with sqlite3_trace_v2, not sqlite3_trace, so the loader > would prevent you from running with anything pre 3.14. AFAIK, we've never > had such problems. I

[issue44092] [sqlite3] Remove special rollback handling

2021-12-05 Thread Ma Lin
Ma Lin added the comment: I think this change is no problem. Erlend E. Aasland's explanation is very clear. There is only one situation that a problem may occur. Write code with SQLite 3.8.7.2+ (2014-11-18), and run it on 3.7.15 (2012-12-12) ~ 3.8.7.1-, but this situation may be difficult

[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-11-27 Thread Ma Lin
Ma Lin added the comment: This issue is not resolved, but was covered by a problematic behavior. Maybe this issue will be solved in issue44092, I'll study that issue later. -- ___ Python tracker <https://bugs.python.org/issue33

[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-11-27 Thread Ma Lin
Ma Lin added the comment: Since 243b6c3b8fd3144450c477d99f01e31e7c3ebc0f (21-08-19), this bug can't be reproduced. In `pysqlite_do_all_statements()`, 243b6c3 resets statements like this: sqlite3_stmt *stmt = NULL; while ((stmt = sqlite3_next_stmt(self->db, s

[issue41735] Thread locks in zlib module may go wrong in rare case

2021-11-26 Thread Ma Lin
Ma Lin added the comment: Thanks for review! -- ___ Python tracker <https://bugs.python.org/issue41735>

[issue45816] Python does not support standalone MSVC v143 (VS 2022) Build Tools

2021-11-17 Thread Ma Lin
Ma Lin added the comment: They are LNK1268 error: LINK : fatal error LNK1268: inconsistent option 'pdbthreads:5' specified with /USEPROFILE but not with /GENPROFILE [e:\dev\cpython\PCbuild\_queue.vcx proj] LINK : fatal error LNK1268: inconsistent option 'pdbthreads:1' specified

[issue45816] Python does not support standalone MSVC v143 (VS 2022) Build Tools

2021-11-17 Thread Ma Lin
Ma Lin added the comment: There are 5 link errors when building the PGO build. Command: build --pgo -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue45

[issue41735] Thread locks in zlib module may go wrong in rare case

2021-11-16 Thread Ma Lin
Ma Lin added the comment: Sorry, I found an omission. The previous PRs fixed the bug in these methods: zlib.Compress.compress() zlib.Decompress.decompress() This method also has this bug, fix in PR29587 (main/3.10) and PR29588 (3.9-): zlib.Decompress.flush() Attached file

[issue41735] Thread locks in zlib module may go wrong in rare case

2021-11-16 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +27831 pull_request: https://github.com/python/cpython/pull/29588 ___ Python tracker <https://bugs.python.org/issue41

[issue41735] Thread locks in zlib module may go wrong in rare case

2021-11-16 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +27830 pull_request: https://github.com/python/cpython/pull/29587 ___ Python tracker <https://bugs.python.org/issue41

[issue44439] stdlib wrongly uses len() for bytes-like object

2021-11-08 Thread Ma Lin
Ma Lin added the comment: Serhiy Storchaka: Sorry, I found `zipfile` module also has this bug, fixed in PR29468. This bug was reported & fixed by GitHub user `marcoffee` firstly, so I list him as a co-author, his work: https://github.com/animalize/pyzstd/issues/4 The second commit f

[issue44439] stdlib wrongly uses len() for bytes-like object

2021-11-08 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +27721 pull_request: https://github.com/python/cpython/pull/29468 ___ Python tracker <https://bugs.python.org/issue44

[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-10-08 Thread Ma Lin
Ma Lin added the comment: Today I tested with msvc2022-preview, `__forceinline` attribute will not hang the build. 64-bit PGO builds: 28d28e0~1,vc2022 : baseline 28d28e0~1+F,vc2022 : 1.02x slower <1> 28d28e0,vc2022 : 1.03x slower <2> 28d28e0+F,vc2022 :

[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-29 Thread Ma Lin
Ma Lin added the comment: I think this is a bug of MSVC2019, not really a regression of CPython. So changing the code of CPython is just a workaround, maybe the right direction is to prompt MSVC to fix the bug, otherwise there will be more trouble when 3.11 is released a year later. Seeing

[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-21 Thread Ma Lin
Ma Lin added the comment: PR28475: 64-bit build is 1.03x slower than 28d28e0~1 32-bit build is 1.04x slower than 28d28e0~1 28d28e0~1 is the last good commit. -- ___ Python tracker <https://bugs.python.org/issue45

[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-19 Thread Ma Lin
Ma Lin added the comment: Like OP's benchmark, if convert the inline functions to macros in object.h, the 3.10 branch is 1.03x faster, but still 1.07x slower than 28d28e0~1. @vstinner could you prepare such a PR as a candidate fix. There seem to be two ways to solve it in short-term. 1

[issue45116] Performance regression 3.10b1 and later on Windows: Py_DECREF() not inlined in PGO build

2021-09-18 Thread Ma Lin
Ma Lin added the comment: > In my case, pgo got stuck on linking with the object.h. Me too. Since commit 28d28e0 (the first commit to slow down the PGO build), if add `__forceinline` attribute to _Py_DECREF() function in object.h, the PGO build hangs (>50 minutes). So PR 284

[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-10 Thread Ma Lin
Ma Lin added the comment: MSVC 2019 has a /Ob3 option: https://docs.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion From the experience of another project, I conjecture /Ob3 increases the "global budget" mentioned in the blog. I used /Ob3 for

[issue45116] Performance regression 3.10b1 and later on Windows

2021-09-08 Thread Ma Lin
Ma Lin added the comment: This article briefly introduces the inlining decisions in MSVC. https://devblogs.microsoft.com/cppblog/inlining-decisions-in-visual-studio/ -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue45

[issue44912] doc: macOS supports os.fsync(fd)

2021-08-14 Thread Ma Lin
Ma Lin added the comment: Unix includes macOS. Very sorry, close as invalid. -- stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/i

[issue44912] doc: macOS supports os.fsync(fd)

2021-08-13 Thread Ma Lin
New submission from Ma Lin : The doc of os.fsync() said: Availability: Unix, Windows. https://docs.python.org/3.11/library/os.html#os.fsync But it seems that macOS supports fsync. (I'm not a macOS user) -- assignee: docs@python components: Documentation, macOS messages: 399583

[issue44711] Optimize type check in pipes.py

2021-07-22 Thread Ma Lin
Ma Lin added the comment: > I suppose it is a very old code I also found some old code that may have performance loss. The memoryview.cast() method was added in Python 3.3. This code doesn't use memoryview.cast(), which will bring extra memory overhead when the amount of data is very large. ht

[issue44549] BZip 1.0.6 Critical Vulnerability

2021-07-04 Thread Ma Lin
Ma Lin added the comment: If you update python/cpython-source-deps, I can submit a simple PR to python/cpython. I want to submit a PR to python/cpython-source-deps, but I think it’s better for a credible person to do this. -- nosy: +malin

[issue44439] stdlib wrongly uses len() for bytes-like object

2021-06-22 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +25427 pull_request: https://github.com/python/cpython/pull/26846 ___ Python tracker <https://bugs.python.org/issue44

[issue44458] Duplicate symbol _BUFFER_BLOCK_SIZE when statically linking multiple modules

2021-06-22 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue44458>

[issue44439] stdlib wrongly uses len() for bytes-like object

2021-06-21 Thread Ma Lin
Ma Lin added the comment: I am checking all the .py files in `Lib` folder. hmac.py has two len() bugs: https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L212 https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L214 I think PR 26764 is prepared, it fixes the len() bugs
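The class of bug being checked for can be shown with a short sketch. It uses an `array.array` as an example of a bytes-like object whose items are wider than one byte; the names are illustrative.

```python
import array

data = array.array("I", [1, 2, 3])   # bytes-like, but itemsize > 1
view = memoryview(data)

items = len(view)      # number of items, not number of bytes
size = view.nbytes     # actual size in bytes

print(items, size)
```

Code that treats `len(obj)` as the byte length gives the wrong answer for such objects; `memoryview(obj).nbytes` is the reliable value.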

[issue44439] PickleBuffer doesn't have __len__ method

2021-06-17 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +25350 stage: -> patch review pull_request: https://github.com/python/cpython/pull/26764 ___ Python tracker <https://bugs.python.org/issu

[issue44439] PickleBuffer doesn't have __len__ method

2021-06-17 Thread Ma Lin
Ma Lin added the comment: Ok, I'm working on a PR. -- ___ Python tracker <https://bugs.python.org/issue44439>

[issue44439] PickleBuffer doesn't have __len__ method

2021-06-16 Thread Ma Lin
New submission from Ma Lin : If run this code, it will raise an exception: import pickle import lzma import pandas as pd with lzma.open("test.xz", "wb") as file: pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5) The exception: Tr

[issue44134] lzma: stream padding in xz files

2021-05-15 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue44134>

[issue43650] MemoryError on zip.read in shutil._unpack_zipfile

2021-05-15 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue43650>

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-05-15 Thread Ma Lin
Ma Lin added the comment: Sorry, for the (init_size > UINT32_MAX) problem, I have a better solution. Please imagine this scenario: - before the patch - in 64-bit build - use zlib.decompress() function - the exact decompressed size is known and > UINT32_MAX (e.g. 10 GiB) If set the `b

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-05-15 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +24779 pull_request: https://github.com/python/cpython/pull/26143 ___ Python tracker <https://bugs.python.org/issue41

[issue44114] Incorrect function signatures in dictobject.c

2021-05-12 Thread Ma Lin
Change by Ma Lin : -- nosy: +methane ___ Python tracker <https://bugs.python.org/issue44114>

[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-05-10 Thread Ma Lin
Ma Lin added the comment: Erlend, please take a look at this bug. -- ___ Python tracker <https://bugs.python.org/issue33376>

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-29 Thread Ma Lin
Ma Lin added the comment: Found a backward incompatible behavior. Before the patch, in 64-bit build, zlib module allows the initial size > UINT32_MAX. It creates a bytes object, and uses a sliding window to deal with the UINT32_MAX limit: https://github.com/python/cpython/blob/v3.

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-29 Thread Ma Lin
Change by Ma Lin : -- pull_requests: +24429 pull_request: https://github.com/python/cpython/pull/25738 ___ Python tracker <https://bugs.python.org/issue41

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-28 Thread Ma Lin
Ma Lin added the comment: Thanks for reviewing this big patch. Your review makes the code better. -- ___ Python tracker <https://bugs.python.org/issue41

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-27 Thread Ma Lin
Ma Lin added the comment: The above changes were made in this commit: split core code and wrappers 55705f6dc28ff4dc6183e0eb57312c885d19090a After that commit, there is a new commit, it resolves the code conflicts introduced by PR 22126 one hour ago. Merge branch 'master

[issue41735] Thread locks in zlib module may go wrong in rare case

2021-04-27 Thread Ma Lin
Ma Lin added the comment: Thanks for review. -- ___ Python tracker <https://bugs.python.org/issue41735>

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-26 Thread Ma Lin
Ma Lin added the comment: Very sorry for updating at the last moment. But after the update, we should not need to touch it in the future, so I think it's worth it. Please review the last commit in PR 21740, the previous commits have not been changed. IMO if use a Git client such as TortoiseGit

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-25 Thread Ma Lin
Ma Lin added the comment: > The defines of BOB_BUFFER_TYPE/BOB_SIZE_TYPE/BOB_SIZE_MAX are ugly. If put > the core code together, these defines can be put in a thin wrapper in > _bz2module.c/_lzmamodule.c/zlibmodule.c files. I tried, it looks good. I will update the PR within o

[issue43787] Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.

2021-04-12 Thread Ma Lin
Ma Lin added the comment: I think this change is safe. The behaviors should be exactly the same, except the iterators are different objects (obj vs obj._buffer). -- ___ Python tracker <https://bugs.python.org/issue43

[issue43787] Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.

2021-04-12 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue43787>

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-11 Thread Ma Lin
Ma Lin added the comment: > I don't really _like_ that this is a .h file acting as a C template to inject > effectively the same static code into each module that wants to use it... > Which I think is the concern Victor is expressing in a comment above. I think so too. Th

[issue43785] Remove RLock from BZ2File

2021-04-09 Thread Ma Lin
Ma Lin added the comment: This change is backwards incompatible, it may break some code silently. If someone really needs better performance, they can write a BZ2File class without RLock by themselves, it should be easy. FYI, zlib module was added in 1997, bz2 module was added in 2002, lzma

[issue43785] bz2 performance issue.

2021-04-09 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue43785>

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2021-04-05 Thread Ma Lin
Ma Lin added the comment: ping -- ___ Python tracker <https://bugs.python.org/issue41486>

[issue43305] A typo in /Modules/_io/bufferedio.c

2021-02-26 Thread Ma Lin
Ma Lin added the comment: Close as invalid. They have the same effect: PyErr_GivenExceptionMatches(v, PyExc_BlockingIOError)) PyErr_GivenExceptionMatches(t, PyExc_BlockingIOError)) -- resolution: -> wont fix stage: -> resolved status: open ->

[issue43305] A typo in /Modules/_io/bufferedio.c

2021-02-26 Thread Ma Lin
Ma Lin added the comment: I am trying to write a test-case. -- ___ Python tracker <https://bugs.python.org/issue43305>

[issue43305] A typo in /Modules/_io/bufferedio.c

2021-02-23 Thread Ma Lin
New submission from Ma Lin : 654 PyErr_Fetch(&t, &v, &tb); 655 if (v == NULL || !PyErr_GivenExceptionMatches(v, PyExc_BlockingIOError)) { ↑ this should be t https://github.com/python/cpython/blob/v3.10.0a5/Modules/_io/bufferedio.c#L654-L655

[issue33376] [pysqlite] Duplicate rows can be returned after rolling back a transaction

2021-02-23 Thread Ma Lin
Change by Ma Lin : -- nosy: +erlendaasland ___ Python tracker <https://bugs.python.org/issue33376>

[issue43027] Calling _PyBytes_Resize() on 1-byte bytes may raise error

2021-01-25 Thread Ma Lin
New submission from Ma Lin : PyBytes_FromStringAndSize() uses a global cache for 1-byte bytes: https://github.com/python/cpython/blob/v3.10.0a4/Objects/bytesobject.c#L147 if (size == 1 && str != NULL) { struct _Py_bytes_state *state = get_bytes_state(); op

[issue43023] Remove a redundant check in _PyBytes_Resize()

2021-01-25 Thread Ma Lin
Ma Lin added the comment: Found a new issue, can be combined with this issue. -- stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/i

[issue43023] Remove a redundant check in _PyBytes_Resize()

2021-01-25 Thread Ma Lin
Change by Ma Lin : -- keywords: +patch pull_requests: +23149 stage: -> patch review pull_request: https://github.com/python/cpython/pull/24330 ___ Python tracker <https://bugs.python.org/issu

[issue43023] Remove a redundant check in _PyBytes_Resize()

2021-01-25 Thread Ma Lin
New submission from Ma Lin : Above code already cover this check: if (Py_SIZE(v) == newsize) { /* return early if newsize equals to v->ob_size */ return 0; } if (Py_SIZE(v) == 0) { - if (newsize == 0) { - return 0; - }

[issue42550] re library matching problem

2020-12-02 Thread Ma Lin
Ma Lin added the comment: This issue can be closed. '0x' 2 'd26935a5ee4cd542e8a3a7e74fb7a99855975b59' 40 '\n' 1 2+40+1 = 43 -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue42
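The character count in the comment can be reproduced directly. The string pieces are copied from the comment above; the variable names are illustrative.

```python
prefix = "0x"
digest = "d26935a5ee4cd542e8a3a7e74fb7a99855975b59"
newline = "\n"

line = prefix + digest + newline
print(len(prefix), len(digest), len(newline), len(line))
```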

[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-17 Thread Ma Lin
Ma Lin added the comment: Last benchmark was wrong, /Ob3 option was not enabled. After applying `pgo_ob3.diff`, it is slower, so I close this issue. +-++--+ | Benchmark | py39_pgo_a | py39_pgo_b

[issue42369] Reading ZipFile not thread-safe

2020-11-16 Thread Ma Lin
Change by Ma Lin : -- nosy: +malin ___ Python tracker <https://bugs.python.org/issue42369>

[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-16 Thread Ma Lin
Ma Lin added the comment: In PGO build, the improvement is not much. (3.9 branch, with PGO, build.bat -p X64 --pgo) +-+--+--+ | Benchmark | baseline-pgo | ob3-pgo

[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-16 Thread Ma Lin
Ma Lin added the comment: > Could you please try again with PGO? Please wait. BTW, this option was advised in another project. In that project, even with `/Ob3` enabled, it is still slower than the GCC 9 build. If you are interested, see: https://github.com/facebook/zstd/issues/2

[issue42366] Use MSVC2019 and /Ob3 option to compile Windows builds

2020-11-16 Thread Ma Lin
New submission from Ma Lin : MSVC2019 has a new option `/Ob3`, it specifies more aggressive inlining than /Ob2: https://docs.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-160 If use this option in MSVC2017, it will emit a warning: cl : Command line warning

[issue42304] [easy C] long type performance waste in 64-bit Windows build

2020-11-10 Thread Ma Lin
Ma Lin added the comment: > I do not think that this is suitable for newcomers because you need to have > deep understanding why it was written in such form at first place and what > will be changed if you change it. I agree contributors need to understand code, rather than simpl

[issue42304] [easy C] long type performance waste in 64-bit Windows build

2020-11-10 Thread Ma Lin
Ma Lin added the comment: > What is the problem exactly? There are several different problems, such as: https://github.com/python/cpython/blob/v3.10.0a2/Modules/mathmodule.c#L2033 In addition, `utf16_decode` also has this problem, I forgot this: https://github.com/python/cpython/b

[issue42304] [easy C] long type performance waste in 64-bit Windows build

2020-11-09 Thread Ma Lin
New submission from Ma Lin : C type `long` is a 4-byte integer in 64-bit Windows builds (MSVC behavior). [1] With other compilers, `long` is an 8-byte integer in 64-bit builds. This leads to some unnecessary performance loss; issue38252 fixed this problem in one situation. Search `SIZEOF_LONG
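The platform difference can be inspected from Python with `ctypes`. This is only a quick check of the C type sizes on the running interpreter, not a measurement of the performance impact.

```python
import ctypes

long_size = ctypes.sizeof(ctypes.c_long)        # 4 on 64-bit Windows (MSVC)
ssize_size = ctypes.sizeof(ctypes.c_ssize_t)    # pointer-sized on common platforms

print(f"long: {long_size} bytes, Py_ssize_t-like: {ssize_size} bytes")
```

On a 64-bit Windows build the two values differ (4 vs 8), which is the situation the issue describes.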

[issue41486] Add _BlocksOutputBuffer for bz2/lzma/zlib module

2020-10-28 Thread Ma Lin
Ma Lin added the comment: I modify lzma module to use different growth factors, see attached picture different_factors.png 1.5x should be the growth factor of _PyBytesWriter under Windows. So if change _PyBytesWriter to use memory blocks, maybe there will be no performance improvement
