[issue4751] Patch for better thread support in hashlib
Changes by ebfe knabberknusperh...@yahoo.de: Removed file: http://bugs.python.org/file12557/md5module_small_locks.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4818] Patch for thread-support in md5module.c
New submission from ebfe knabberknusperh...@yahoo.de: Here is another patch, this time for the fallback md5 module. I know that situations where openssl is not present but threading is are rare. However, they might occur out there, and the md5module needed some love anyway: - The MD5 class from the fallback module can now also use threads with 'small locks'. - The behaviour regarding unicode data input is now consistent with what the openssl-driven classes do. - Some code cleanup. I might act on the sha modules as well in the next days; sha256.c still accepts 's#'... Also see issue #4751 -- files: md5module_small_locks.diff keywords: patch messages: 78947 nosy: ebfe severity: normal status: open title: Patch for thread-support in md5module.c Added file: http://bugs.python.org/file12565/md5module_small_locks.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4818 ___
[issue4751] Patch for better thread support in hashlib
ebfe knabberknusperh...@yahoo.de added the comment: Haypo, we can probably reduce overhead by defining ENTER_HASHLIB like this:

    /* Fast path: try to take the object's lock without touching the GIL;
       only if it is contended, release the GIL and block on it. */
    #define ENTER_HASHLIB(obj) \
        if ((obj)->lock) { \
            if (!PyThread_acquire_lock((obj)->lock, 0)) { \
                Py_BEGIN_ALLOW_THREADS \
                PyThread_acquire_lock((obj)->lock, 1); \
                Py_END_ALLOW_THREADS \
            } \
        }

___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
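The idea behind the macro (a cheap non-blocking attempt first, a blocking wait only under contention) can be sketched in pure Python. This is a hypothetical analogue, not CPython code: `enter_lock` is an illustrative name, and Python code cannot release the GIL itself, so the comment only marks where the C macro would do so.

```python
import threading


def enter_lock(lock: threading.Lock) -> bool:
    """Hypothetical Python analogue of the proposed ENTER_HASHLIB macro.

    Try a non-blocking acquire first; only if the lock is already held,
    fall back to a blocking acquire (the point where the C macro would
    release the GIL). Returns True if the fast path succeeded.
    """
    if lock.acquire(blocking=False):
        return True      # uncontended: no waiting, nothing else to do
    lock.acquire()       # contended: block until the current holder releases it
    return False


lock = threading.Lock()
print(enter_lock(lock))  # True: the lock was free
lock.release()
```

In the uncontended case, which is the common one, the expensive step never happens; this is why the macro saves a GIL release/re-acquire per call.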
[issue4738] Patch to make zlib-objects better support threads
Changes by ebfe knabberknusperh...@yahoo.de: Removed file: http://bugs.python.org/file12466/zlib_threads-2.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4738 ___
[issue4738] Patch to make zlib-objects better support threads
ebfe knabberknusperh...@yahoo.de added the comment: Here is a small test script with concurrent access to a single compressobj. The original patch will immediately deadlock. The attached patch releases the GIL before trying to acquire the zlib lock. This allows the other thread to release the zlib lock, but comes at the cost of one additional GIL lock/unlock. Added file: http://bugs.python.org/file12531/zlib_threads-3.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4738 ___
[issue4738] Patch to make zlib-objects better support threads
ebfe knabberknusperh...@yahoo.de added the comment: test-script Added file: http://bugs.python.org/file12532/zlibtest2.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4738 ___
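The attached zlibtest2.py is not reproduced here. The following is a minimal sketch of the same kind of stress test, assuming a few threads feed identical chunks into one shared compressobj. `SharedCompressor` and `hammer` are hypothetical names. Compressing and recording each output piece happen under one lock, so the compressed stream stays in order.

```python
import threading
import zlib


class SharedCompressor:
    """Hypothetical wrapper: one compressobj shared by several threads."""

    def __init__(self, level: int = 6):
        self._lock = threading.Lock()
        self._obj = zlib.compressobj(level)
        self._parts = []

    def feed(self, data: bytes) -> None:
        # Compress and append under the same lock, so pieces of the
        # compressed stream are stored in the order they were produced.
        with self._lock:
            self._parts.append(self._obj.compress(data))

    def finish(self) -> bytes:
        with self._lock:
            self._parts.append(self._obj.flush())
            return b"".join(self._parts)


def hammer(shared: SharedCompressor, chunk: bytes, rounds: int) -> None:
    for _ in range(rounds):
        shared.feed(chunk)


shared = SharedCompressor()
chunk = b"workunit " * 1000
threads = [threading.Thread(target=hammer, args=(shared, chunk, 50)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

data = zlib.decompress(shared.finish())
print(data == chunk * 200)  # True: every thread fed the same chunk
```

Because all threads feed the same chunk, the decompressed result is independent of the order in which the threads won the lock.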
[issue4751] Patch for better thread support in hashlib
ebfe knabberknusperh...@yahoo.de added the comment: Releasing the GIL is somewhat expensive and should be avoided if possible. I've moved LEAVE_HASHLIB within EVP_update so the object gets unlocked before we call Py_END_ALLOW_THREADS. This is *only* possible because EVP_update does not use the object beyond those lines. Here is a new patch and a small test script. Added file: http://bugs.python.org/file12533/hashopenssl_threads-4.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
[issue4751] Patch for better thread support in hashlib
ebfe knabberknusperh...@yahoo.de added the comment: test-script Added file: http://bugs.python.org/file12534/hashlibtest2.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
[issue4751] Patch for better thread support in hashlib
Changes by ebfe knabberknusperh...@yahoo.de: Removed file: http://bugs.python.org/file12461/hashopenssl_threads-3.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
[issue4751] Patch for better thread support in hashlib
ebfe knabberknusperh...@yahoo.de added the comment: gnarf, actually it should be 'threads.append(Hasher(md))' in the script :-\ ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
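hashlibtest2.py itself is not shown. The following is a hypothetical reconstruction of its `Hasher` pattern, assuming each thread repeatedly feeds the same buffer into one shared hash object `md`. The constructor arguments and sizes are assumptions. Since every update uses the same chunk, the final digest must equal hashing the concatenation, whatever the thread interleaving.

```python
import hashlib
import threading


class Hasher(threading.Thread):
    """Hypothetical stand-in for the test script's Hasher thread:
    repeatedly feeds the same buffer into a shared hash object."""

    def __init__(self, md, chunk: bytes, rounds: int):
        super().__init__()
        self.md = md
        self.chunk = chunk
        self.rounds = rounds

    def run(self) -> None:
        for _ in range(self.rounds):
            self.md.update(self.chunk)


md = hashlib.sha1()
chunk = b"x" * 65536
threads = []
for _ in range(4):
    threads.append(Hasher(md, chunk, 20))  # every thread gets the shared md
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every update used the same chunk, so the order of updates does not matter.
expected = hashlib.sha1(chunk * 80).hexdigest()
print(md.hexdigest() == expected)
```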
[issue4751] Patch for better thread support in hashlib
ebfe knabberknusperh...@yahoo.de added the comment: I don't think this is actually worth the trouble. You run into situations where one thread might decide that it needs a lock now while other threads are already in the to-be-locked area. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
[issue4751] Patch for better thread support in hashlib
ebfe knabberknusperh...@yahoo.de added the comment: I don't think so. The interface should stay simple; Python has very few such magic knobs. People will optimize for their own box, as you said, and that code will run worse on all the others... Besides, we've lived with single-threaded openssl for a long time. Let's choose HASHLIB_GIL_MINSIZE so that there is no risk of additional overhead introduced by this patch, and refer to its current value in the hashlib module's documentation. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
[issue4751] Patch for better thread support in hashlib
ebfe knabberknusperh...@yahoo.de added the comment: haypo, the patch will not compile when WITH_THREADS is not defined. The 'lock' member in the object structure is not present without WITH_THREADS; however, the line 'if (self->lock == NULL && view.len >= HASHLIB_GIL_MINSIZE)' always refers to it. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
[issue4751] Patch for better thread support in hashlib
ebfe knabberknusperh...@yahoo.de added the comment: Here is another patch, this time for the fallback md5 module. I know that situations where openssl is not present but threading is are rare. However, they might occur out there, and the md5module needed some love anyway: - The MD5 class from the fallback module can now also use threads with 'small locks'. - The behaviour regarding unicode data input is now consistent with what the openssl-driven classes do. - Some code cleanup. I might act on the sha modules as well in the next days; sha256.c still accepts 's#'... Added file: http://bugs.python.org/file12557/md5module_small_locks.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
[issue4746] Misguiding wording 3.0 c-api reference
ebfe knabberknusperh...@yahoo.de added the comment: Whenever the documentation says "you must not", it really says "don't do that or your application *will* crash, burn and die"... Of course I can allocate storage for the string, copy its content and then free it or not; nothing bad will happen. How would it cause a crash? It's my own pointer. That's exactly the line between "not required to", "should not" and "must not": the current wording suggests that I may not even touch e.g. malloc, which is confusing and, in its current state, in fact to be ignored. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4746 ___
[issue4757] reject unicode in zlib
ebfe knabberknusperh...@yahoo.de added the comment: I don't think Python 2.x should be changed, but 3.0 or 3.1 should be: - Characters don't mean a thing in zlib-land; all operations are based on bytes and their (implicit) default encoding. This behaviour is hidden and somewhat violates the rule of least surprise. - type(zlib.decompress(zlib.compress('abc'))) == bytes anyway. - Changing from s* to y* forces the programmer to use .encode() on his strings (e.g. zlib.compress('abc'.encode())), which very clearly shows what's happening. If you want to compress and decompress Python 3 strings, you *must* share the same character encoding; think of zlib.compress('hôńè') and str(zlib.decompress(x)) with different locales. - Other modules (hashlib comes to mind...) already reject Unicode objects for the same reason. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4757 ___
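The explicit-encoding style argued for here looks like this in a sketch. The choice of UTF-8 is an assumption: any encoding works, as long as both sides agree on it. In current Python 3 releases, `zlib.compress` rejects `str` with a `TypeError`, so the encoding step cannot be skipped.

```python
import zlib

text = "hôńè"

# zlib only sees bytes, so the caller picks the encoding explicitly
# and must use the same one when turning the result back into text.
packed = zlib.compress(text.encode("utf-8"))
raw = zlib.decompress(packed)          # always bytes
restored = raw.decode("utf-8")
print(restored == text)                # True

try:
    zlib.compress(text)                # a str is not a bytes-like object
except TypeError as exc:
    print("rejected:", exc)
```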
[issue4732] Object allocation stress leads to segfault on RHEL
ebfe knabberknusperh...@yahoo.de added the comment: I can't reproduce the problem here. Python 2.5.2 running on Linux lueg-desktop 2.6.24-22-generic #1 SMP Mon Nov 24 18:32:42 UTC 2008 i686 GNU/Linux -- nosy: +ebfe ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4732 ___
[issue4751] Patch for better thread support in hashlib
New submission from ebfe knabberknusperh...@yahoo.de: The hashlib functions provided by _hashopenssl.c hold the GIL all the time, although the underlying openssl library is basically thread-safe. I've attached a patch (svn diff) which does four things: * If Python is compiled with thread support, the EVPobject is extended by an additional PyThread_type_lock which protects each object individually. * The 'update' function releases the GIL if the to-be-hashed object is a Bytes-object and therefore provides trustworthy locking (all other types, including subclasses, are not trustworthy!). This allows multiple threads to do hashing in parallel. * The EVP_hash function removes duplicated code. * The situation regarding unicode objects is now more meaningful. Upon passing a unicode string to the .update() function, the original hashlib throws "TypeError: object supporting the buffer API required", which is confusing. I think it's perfectly valid not to accept unicode strings as input; people should be required to call str.encode() on their strings before hashing, so that a well-defined byte representation of their strings gets hashed. Therefore I patched the MY_GET_BUFFER_VIEW_OR_ERROUT macro to throw "TypeError: Unicode-objects must be encoded before hashing". This also fixes issue #1118. I've tested this patch and did not run into problems. CPU occupancy depends on the buffer size passed to .update(), as releasing the GIL is basically not worth the effort for very small buffers. More testing may be needed...
-- components: Library (Lib) files: hashopenssl_threads.diff keywords: patch messages: 78297 nosy: ebfe severity: normal status: open title: Patch for better thread support in hashlib type: performance versions: Python 3.0 Added file: http://bugs.python.org/file12453/hashopenssl_threads.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
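Requiring callers to encode before hashing looks like the sketch below. It uses SHA-256 (an arbitrary choice) and current hashlib, which rejects `str` input with a `TypeError`. The exact error message varies between Python versions, so it is printed but not asserted.

```python
import hashlib

try:
    hashlib.sha256("abc")              # str input is refused
except TypeError as exc:
    print("TypeError:", exc)

# Encode explicitly so a well-defined byte representation is hashed.
digest = hashlib.sha256("abc".encode("ascii")).hexdigest()
print(digest)  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

The printed value is the standard SHA-256 test vector for `b"abc"`.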
[issue4751] Patch for better thread support in hashlib
Changes by ebfe knabberknusperh...@yahoo.de: Removed file: http://bugs.python.org/file12453/hashopenssl_threads.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
[issue4751] Patch for better thread support in hashlib
ebfe knabberknusperh...@yahoo.de added the comment: Thanks for the advice. Antoine, maybe you could clarify the situation regarding buffer locks for me. In older versions of PEP 3118 the PyBUF_LOCK flag was still present, but it doesn't seem to have made its way into the final draft. Is it safe to assume that a buffer view will not change until release() is called, for all types supporting the buffer protocol in py3k? I've done some testing, and the overhead of releasing and re-acquiring the GIL is definitely a performance problem when hashing many small strings (doubled runtime for 100,000 times b'abc'). I've taken up haypo's patch to release the GIL only when the buffer is larger than 10kb. Added file: http://bugs.python.org/file12461/hashopenssl_threads-3.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
[issue4738] Patch to make zlib-objects better support threads
ebfe knabberknusperh...@yahoo.de added the comment: new svn diff attached: - The GIL is now released for adler32 and crc32 if the buffer is larger than 5kb (we don't want to risk burning CPU cycles on GIL handling). - adler32 got its parameter via s# but now uses s*; why s# anyway? - ENTER_ZLIB no longer gives away the GIL. It's dangerous and useless, as there is no pressure on the object's lock. - deflateCopy() and inflateCopy() are not worth the trouble. Added file: http://bugs.python.org/file12463/zlib_threads-2.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4738 ___
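Here is a sketch of how callers could benefit from checksum functions that drop the GIL on large buffers: several buffers are checksummed from a thread pool. `checksums` is a hypothetical helper, and the buffer sizes are arbitrary. Whether the threads actually overlap depends on the interpreter releasing the GIL inside zlib. The results are the same either way.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def checksums(buf: bytes) -> tuple:
    # crc32/adler32 accept any bytes-like object; on large inputs the
    # interpreter may release the GIL, letting threads run concurrently.
    return zlib.crc32(buf), zlib.adler32(buf)


buffers = [bytes([i]) * 100_000 for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(checksums, buffers))  # map() keeps input order
sequential = [checksums(b) for b in buffers]
print(parallel == sequential)  # True

# Both checksums can also be computed incrementally, chunk by chunk:
a, b = buffers[0], buffers[1]
print(zlib.crc32(b, zlib.crc32(a)) == zlib.crc32(a + b))        # True
print(zlib.adler32(b, zlib.adler32(a)) == zlib.adler32(a + b))  # True
```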
[issue4738] Patch to make zlib-objects better support threads
Changes by ebfe knabberknusperh...@yahoo.de: Removed file: http://bugs.python.org/file12448/zlib_threads.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4738 ___
[issue4751] Patch for better thread support in hashlib
ebfe knabberknusperh...@yahoo.de added the comment: Here is another simple benchmark. For me it shows almost perfect scaling (2 cores = 196% performance) if the buffer passed to .update() is large enough. I deliberately did not move Py_BEGIN_ALLOW_THREADS into EVP_hash, as we might call this function without holding a lock on the input buffer. The 10kb limit was based on my own computer (MacBook Pro 2x2.5GHz) and errs on the safe side. Hashing is *very* fast on modern CPUs, and working on many small strings becomes very inefficient when the GIL is released every time. Just try hashing 10240 bytes vs. 10241 bytes. Added file: http://bugs.python.org/file12465/hashlibtest.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4751 ___
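The attached hashlibtest.py is not reproduced here. The following is a hypothetical benchmark in the same spirit: each thread hashes its own buffer, and the wall-clock time for one thread is compared with the time for two threads at several buffer sizes. `hash_in_threads`, SHA-1, and the sizes are assumptions. Timings vary by machine, so no numbers are claimed.

```python
import hashlib
import threading
import time


def hash_in_threads(nthreads: int, bufsize: int, rounds: int) -> float:
    """Each thread hashes its own buffer `rounds` times; returns seconds."""
    buf = b"\0" * bufsize

    def work() -> None:
        h = hashlib.sha1()
        for _ in range(rounds):
            h.update(buf)

    threads = [threading.Thread(target=work) for _ in range(nthreads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for size in (64, 10240, 10241, 1 << 20):
        rounds = (4 << 20) // size  # about 4 MiB hashed per thread
        t1 = hash_in_threads(1, size, rounds)
        t2 = hash_in_threads(2, size, rounds)
        # If update() runs without the GIL, t2 should stay close to t1 on a
        # multi-core machine; if the GIL is held, t2 tends towards 2 * t1.
        print(f"{size:>8} bytes: 1 thread {t1:.3f}s, 2 threads {t2:.3f}s")
```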
[issue4738] Patch to make zlib-objects better support threads
ebfe knabberknusperh...@yahoo.de added the comment: new svn diff attached. The indentation in this file is not my fault; it has tabs all over it... The 5kb limit protects against the overhead of releasing the GIL. With very small buffers the overall runtime in my benchmark tends to double. I set it based on my testing, and it remains arbitrary to a certain degree. Set the limit to 1 and try 1,000,000 times b'abc'... May I also suggest changing the zlib module to accept y* instead of s*: - Internally zlib operates on bytes; characters don't mean a thing in zlib-land. - We rely on s* performing the encoding into the default encoding for us. This behaviour is hidden from the programmer and somewhat violates the rule of least surprise. - type(zlib.decompress(zlib.compress('abc'))) == bytes - Changing from s* to y* forces the programmer to use .encode() on his strings (e.g. zlib.compress('abc'.encode())), which very clearly shows what's happening. Added file: http://bugs.python.org/file12466/zlib_threads-2.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4738 ___
[issue4738] Patch to make zlib-objects better support threads
Changes by ebfe knabberknusperh...@yahoo.de: Removed file: http://bugs.python.org/file12463/zlib_threads-2.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4738 ___
[issue4746] Misguiding wording 3.0 c-api reference
New submission from ebfe knabberknusperh...@yahoo.de: Quote from http://docs.python.org/3.0/c-api/arg.html, regarding the s argument: "s (string or Unicode object) [const char *] Convert a Python string or Unicode object to a C pointer to a character string. You must not provide storage for the string itself; a pointer to an existing string is stored into the character pointer variable whose address you pass." I guess the phrase "you must not provide storage" is a failed translation and not meant like that. It should say "you are not required to provide storage". It's confusing to have such strong wording without reason. -- assignee: georg.brandl components: Documentation messages: 78281 nosy: ebfe, georg.brandl severity: normal status: open title: Misguiding wording 3.0 c-api reference versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4746 ___
[issue4738] Patch to make zlib-objects better support threads
New submission from ebfe knabberknusperh...@yahoo.de: My application needs to pack and unpack workunits all day long and does this using multiple threading.Threads. I've noticed that the zlib module seems to use only one thread at a time when using [de]compressobj(). As the comment in the source file zlibmodule.c already says, the module uses a global lock to protect different threads from accessing the object. While the C functions release the GIL while waiting for the global lock, only one thread at a time can use zlib. My app ends up using only one CPU to compress/decompress its workunits... The patch (svn diff to ) attached here fixes this problem by extending the compressobj structure with an additional member for object-specific locks and removing the global lock. The lock protects each compressobj individually and allows multiple Python threads to use zlib in parallel, utilizing all available CPUs. -- components: None files: zlib_threads.diff keywords: patch messages: 78266 nosy: ebfe severity: normal status: open title: Patch to make zlib-objects better support threads type: performance versions: Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1 Added file: http://bugs.python.org/file12440/zlib_threads.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4738 ___
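Here is a sketch of the workload this report describes: many work units packed and unpacked from multiple threads, each with its own compressobj/decompressobj. `pack`, `unpack`, and the unit sizes are hypothetical. With per-object locks instead of one module-wide lock, threads working on different objects need not wait for each other. Whether they actually use several CPUs still depends on zlib releasing the GIL while it works.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def pack(unit: bytes) -> bytes:
    # A fresh compressobj per work unit: no object is shared between threads.
    comp = zlib.compressobj(6)
    return comp.compress(unit) + comp.flush()


def unpack(blob: bytes) -> bytes:
    decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


units = [bytes([65 + i]) * 200_000 for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    packed = list(pool.map(pack, units))
    unpacked = list(pool.map(unpack, packed))
print(unpacked == units)  # True
```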