[Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c
Gregory P. Smith wrote:

> I don't quite like the module name 'hashes' that i chose for the
> generic interface (too close to the builtin hash() function). Other
> suggestions on a module name? 'digest' comes to mind.

hashtools, hashlib, and _hash are common names for helper modules like
this. (you still provide md5 and sha wrappers, I hope)

/F

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
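For readers following along, here is a minimal sketch of the kind of string-keyed digest factory with legacy fallback being discussed; the structure is an assumption for illustration, not the actual patch (the `hashlib` name suggested above is the one Python 2.5 eventually adopted):

```python
# Sketch of a generic, string-keyed digest interface that falls back
# to the legacy md5/sha modules when the openssl-backed module is
# unavailable. Illustrative only; not the code from the tracker patch.
try:
    import hashlib

    def new(name, data=b""):
        return hashlib.new(name, data)
except ImportError:  # very old Pythons: use the legacy modules
    import md5
    import sha
    _algorithms = {"md5": md5.new, "sha1": sha.new}

    def new(name, data=""):
        h = _algorithms[name]()
        h.update(data)
        return h
```

Usage mirrors the old modules' interface: `new("md5", b"abc").hexdigest()` returns the familiar 32-character hex digest.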
Re: [Python-Dev] 2.4 func.__name__ breakage
Tim Peters [EMAIL PROTECTED] writes:

> Rev 2.66 of funcobject.c made func.__name__ writable for the first
> time. That's great, but the patch also introduced what I'm pretty
> sure was an unintended incompatibility: after 2.66, func.__name__ was
> no longer *readable* in restricted execution mode.

Yeah, my bad.

> I can't think of a good reason to restrict reading func.__name__, and
> it looks like this part of the change was an accident. So, unless
> someone objects soon, I intend to restore that func.__name__ is
> readable regardless of execution mode (but will continue to be
> unwritable in restricted execution mode). Objections?

Well, I fixed it on reading the bug report and before getting to
python-dev mail :) Sorry if this duplicated your work, but hey, it was
only a two line change...

Cheers,
mwh

--
The only problem with Microsoft is they just have no taste.
    -- Steve Jobs, (From _Triumph of the Nerds_ PBS special)
       and quoted by Aahz on comp.lang.python
Re: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd)
Peter Astrand wrote:

> I'd like to have your opinion on this bug. Personally, I'd prefer to
> keep test_no_leaking as it is, but if you think otherwise...
>
> One thing that actually can motivate that test_subprocess takes 20%
> of the overall time is that this test is a good generic Python stress
> test - this test might catch some other startup race condition, for
> example.

test_decimal has a short version which tests basic functionality and
always runs, but enabling -udecimal also runs the specification tests
(which take a fair bit longer).

So keeping the basic subprocess tests unconditional, and running the
long ones only if -uall or -usubprocess is given would seem reasonable.

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://boredomandlaziness.skystorm.net
[Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd)
Nick Coghlan wrote:

> > One thing that actually can motivate that test_subprocess takes 20%
> > of the overall time is that this test is a good generic Python
> > stress test - this test might catch some other startup race
> > condition, for example.
>
> test_decimal has a short version which tests basic functionality and
> always runs, but enabling -udecimal also runs the specification tests
> (which take a fair bit longer). So keeping the basic subprocess tests
> unconditional, and running the long ones only if -uall or
> -usubprocess are given would seem reasonable.

does anyone ever use the -u options when running tests?

/F
Re: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd)
Fredrik Lundh [EMAIL PROTECTED] writes:

> Nick Coghlan wrote:
>
> > One thing that actually can motivate that test_subprocess takes 20%
> > of the overall time is that this test is a good generic Python
> > stress test - this test might catch some other startup race
> > condition, for example.
> >
> > test_decimal has a short version which tests basic functionality
> > and always runs, but enabling -udecimal also runs the specification
> > tests (which take a fair bit longer). So keeping the basic
> > subprocess tests unconditional, and running the long ones only if
> > -uall or -usubprocess are given would seem reasonable.
>
> does anyone ever use the -u options when running tests?

Yes, occasionally. Esp. with test_compiler a testall run is an
overnight job but I try to do it every now and again.

Cheers,
mwh

--
If design space weren't so vast, and the good solutions so small a
portion of it, programming would be a lot easier.
                                            -- maney, comp.lang.python
Re: [Python-Dev] 2.4 func.__name__ breakage
[Michael Hudson]
> ...
> Well, I fixed it on reading the bug report and before getting to
> python-dev mail :) Sorry if this duplicated your work, but hey, it
> was only a two line change...

Na, the real work was tracking it down in the bowels of Zope's C-coded
security machinery -- we'll let you do that part next time <wink>.

Did you add a test to ensure this remains fixed? A NEWS blurb (at
least for 2.4.1 -- the test failures under 2.4 are very visible in the
Zope world, due to auto-generated test runner failure reports)?
Re: [Python-Dev] 2.4 func.__name__ breakage
Tim Peters [EMAIL PROTECTED] writes:

> [Michael Hudson]
> > ...
> > Well, I fixed it on reading the bug report and before getting to
> > python-dev mail :) Sorry if this duplicated your work, but hey, it
> > was only a two line change...
>
> Na, the real work was tracking it down in the bowels of Zope's
> C-coded security machinery -- we'll let you do that part next time
> <wink>.
>
> Did you add a test to ensure this remains fixed?

Yup.

> A NEWS blurb (at least for 2.4.1 -- the test failures under 2.4 are
> very visible in the Zope world, due to auto-generated test runner
> failure reports)?

No, I'll do that now. I'm not very good at remembering NEWS blurbs...

Cheers,
mwh

--
6. The code definitely is not portable - it will produce incorrect
   results if run from the surface of Mars.
            -- James Bonfield, http://www.ioccc.org/2000/rince.hint
Re: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd)
[Fredrik Lundh]
> does anyone ever use the -u options when running tests?

Yes -- I routinely do -uall, under both release and debug builds, but
only on Windows. WinXP in particular seems to do a good job when
hyper-threading is available -- running the tests doesn't slow down
anything else I'm doing, except during the disk-intensive tests
(test_largefile is a major pig on Windows).
Re: [Python-Dev] 2.4 func.__name__ breakage
[sorry for the near-duplicate msgs -- looks like gmail lied when it
claimed the first msg was still in draft status]

> > Did you add a test to ensure this remains fixed?

[mwh]
> Yup.

Bless you. Did you attach a contributor agreement and mark the test as
being contributed under said contributor agreement, adjacent to your
valid copyright notice <wink>?

> > A NEWS blurb ...?
>
> No, I'll do that now. I'm not very good at remembering NEWS blurbs...

LOL -- sorry, I'm just imagining what NEWS would look like if we
required a contributor-agreement notification on each blurb. I
appreciate your work here, and will try to find a drug to counteract
the ones I appear to have overdosed on this morning ...
Re: [Python-Dev] 2.4 func.__name__ breakage
Tim Peters [EMAIL PROTECTED] writes:

> [sorry for the near-duplicate msgs -- looks like gmail lied when it
> claimed the first msg was still in draft status]
>
> > > Did you add a test to ensure this remains fixed?
>
> [mwh]
> > Yup.
>
> Bless you. Did you attach a contributor agreement and mark the test
> as being contributed under said contributor agreement, adjacent to
> your valid copyright notice <wink>?

Fortunately 2 lines < 25 lines, so I think I'm safe on this one :)

Cheers,
mwh

--
<moshez> glyph: I don't know anything about reality.
                                         -- from Twisted.Quotes
Re: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd)
> I'd like to have your opinion on this bug. Personally, I'd prefer to
> keep test_no_leaking as it is, but if you think otherwise...
>
> One thing that actually can motivate that test_subprocess takes 20%
> of the overall time is that this test is a good generic Python stress
> test - this test might catch some other startup race condition, for
> example.

A suite of unit tests is a precious thing. We want to test as much as
we can, and as thoroughly as possible; but at the same time we want
the test to run reasonably fast. If the test takes too long, human
nature being what it is, this will actually cause less thorough
testing because developers don't feel like running the test suite
after each small change, and then we get frequent problems where
someone breaks the build because they couldn't wait to run the unit
test. (For example, where I work we have a Java test suite that takes
25 minutes to run. The build is broken on a daily basis by developers
(including me) who make a small change and check it in believing it
won't break anything.)

The Python test suite already has a way (the -u flag) to distinguish
between regular broad-coverage testing and deep coverage for specific
(or all) areas. Let's keep the really long-running tests out of the
regular test suite.

There used to be a farm of machines that did nothing but run the test
suite (snake-farm). This seems to have stopped (it was run by
volunteers at a Swedish university). Maybe we should revive such an
effort, and make sure it runs with -u all.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
RE: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd)
> Let's keep the really long-running tests out of the regular test
> suite.

For test_subprocess, consider adopting the technique used by
test_decimal. When -u decimal is not specified, a small random
selection of the resource intensive tests is run. That way, all of the
tests eventually get run even if no one is routinely using -u all.

Raymond
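Raymond's suggestion can be sketched concretely. This is an illustrative helper, not the actual test_decimal code; the flag indicating whether the resource was enabled is an assumption:

```python
import random

# Sketch of test_decimal's approach as described above: when the
# resource is not enabled, still run a small random sample of the
# expensive tests, so that over many runs they all get exercised.
def select_tests(expensive_tests, resource_enabled, sample_size=5):
    tests = list(expensive_tests)
    if resource_enabled:
        return tests  # -u all / -u <resource>: run everything
    return random.sample(tests, min(sample_size, len(tests)))
```

The trade-off is nondeterminism in the default run, which is exactly what makes every expensive test get coverage eventually.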
[Python-Dev] Five review rule on the /dev/ page?
I am frantically trying to get ready to be out of town for a week of
vacation. Someone sent me some patches for datetime and asked me to
look at them. I begged off but referred him to
http://www.python.org/dev/ and made mention of the five patch review
idea. Can someone make sure that's explained on the /dev/ site?

Thx,
Skip
Re: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd)
Guido van Rossum wrote:

> [...] There used to be a farm of machines that did nothing but run
> the test suite (snake-farm). This seems to have stopped (it was run
> by volunteers at a Swedish university). Maybe we should revive such
> an effort, and make sure it runs with -u all.

I've changed the job that produces the data for
http://coverage.livinglogic.de/ to run

    python Lib/test/regrtest.py -uall -T -N

Unfortunately this job currently produces only coverage info, the
output of the test suite is thrown away. It should be easy to fix
this, so that the output gets put into the database.

Bye,
Walter Dörwald
Re: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd)
Raymond Hettinger [EMAIL PROTECTED] writes:

> > Let's keep the really long-running tests out of the regular test
> > suite.
>
> For test_subprocess, consider adopting the technique used by
> test_decimal. When -u decimal is not specified, a small random
> selection of the resource intensive tests are run. That way, all of
> the tests eventually get run even if no one is routinely using
> -u all.

I do like this strategy but I don't think it applies to this test --
it has to try to create more than 'ulimit -n' processes, if I
understand it correctly. Which makes me think there might be other
ways to write the test if the resource module is available...

Cheers,
mwh

--
34. The string is a stark data structure and everywhere it is passed
    there is much duplication of process. It is a perfect vehicle for
    hiding information.
  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html
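The resource module mwh mentions does expose that limit directly; a sketch (Unix-only, and the iteration count is an assumption about how a rewritten test might use it -- the leak-test logic itself is omitted):

```python
import resource

# 'ulimit -n' from Python: the per-process open-file-descriptor limit.
# A rewritten test_no_leaking could derive its iteration count from
# this instead of hard-coding a huge number of subprocess launches.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
iterations = soft + 10  # just past the limit, as the test requires
```

This keeps the test's intent (exceed the fd limit) while letting the run time scale with the machine's actual limit.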
Re: [Python-Dev] Five review rule on the /dev/ page?
On Thu, Feb 17, 2005, Skip Montanaro wrote:

> I am frantically trying to get ready to be out of town for a week of
> vacation. Someone sent me some patches for datetime and asked me to
> look at them. I begged off but referred him to
> http://www.python.org/dev/ and made mention of the five patch review
> idea. Can someone make sure that's explained on the /dev/ site?

This should go into Brett's survey of the Python dev process, not as
official documentation. It's simply an offer made by some of the
prominent members of python-dev.

--
Aahz ([EMAIL PROTECTED])           * http://www.pythoncraft.com/

The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code
-- not in reams of trivial code that bores the reader to death. --GvR
Re: [Python-Dev] builtin_id() returns negative numbers
Hi Tim,

On Mon, Feb 14, 2005 at 10:41:35AM -0500, Tim Peters wrote:

> # This is a puzzle: there's no way to know the natural width of
> # addresses on this box (in particular, there's no necessary
> # relation to sys.maxint).

Isn't this natural width nowadays available as:

    256 ** struct.calcsize('P')

?

Armin
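Armin's expression works because struct's 'P' format code has the size of a C void* on the current platform, so the result is the number of distinct addresses, independent of sys.maxint:

```python
import struct

# 'P' is the struct format code for a C void*, so calcsize('P') is the
# pointer width in bytes, and 256 ** that is the number of distinct
# addresses -- the "natural width" Armin proposes.
pointer_bytes = struct.calcsize('P')
address_space = 256 ** pointer_bytes

# id() returns an address-like integer; reduced modulo the address
# space it always fits in that range.
assert 0 <= id(object()) % address_space < address_space
```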
[Python-Dev] Re: Re: string find(substring) vs. substring in string
Raymond Hettinger wrote:

> > but refactoring the contains code to use find_internal sounds like
> > a good first step.
> >
> > any takers?
>
> I'm up for it.

excellent!

just fyi, unless my benchmark is mistaken, the Unicode implementation
has the same problem:

    str in         - 25.8 µsec per loop
    unicode in     - 26.8 µsec per loop
    str.find()     - 6.73 µsec per loop
    unicode.find() - 7.24 µsec per loop

oddly enough, if I change the target string so it doesn't contain any
partial matches at all, unicode.find() wins the race:

    str in         - 24.5 µsec per loop
    unicode in     - 24.6 µsec per loop
    str.find()     - 2.86 µsec per loop
    unicode.find() - 2.16 µsec per loop

/F
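The message doesn't show the benchmark harness or target strings, so this is a hypothetical reconstruction using timeit; the haystack full of partial matches is an assumption, and absolute numbers will differ from the 2005 figures, but the in-vs-find comparison can be reproduced the same way:

```python
import timeit

# Hypothetical harness for the comparison above. A haystack of "aaa...b"
# searched for "ab" produces many partial matches on 'a'; this is an
# assumption about the test data, not /F's actual strings.
haystack = "a" * 1000 + "b"
t_in = timeit.timeit("'ab' in s", globals={"s": haystack}, number=20000)
t_find = timeit.timeit("s.find('ab')", globals={"s": haystack}, number=20000)
print("in: %.3fs   find: %.3fs" % (t_in, t_find))
```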
Re: [Python-Dev] Five review rule on the /dev/ page?
[removed pydotorg from people receiving this email]

Aahz wrote:

> On Thu, Feb 17, 2005, Skip Montanaro wrote:
>
> > I am frantically trying to get ready to be out of town for a week
> > of vacation. Someone sent me some patches for datetime and asked
> > me to look at them. I begged off but referred him to
> > http://www.python.org/dev/ and made mention of the five patch
> > review idea. Can someone make sure that's explained on the /dev/
> > site?
>
> This should go into Brett's survey of the Python dev process, not as
> official documentation. It's simply an offer made by some of the
> prominent members of python-dev.

I am planning on adding that blurb in there.

Actually, while I have everyone's attention, I might as well throw an
idea out there about sprucing up yet again the docs on contributing.

I was thinking of taking the current dev intro and have it just
explain how things basically work around here. So the doc would become
more of just a high-level overview of how we dev the language.

But I would cut out the helping out section and spin that into another
doc that would go into some more detail on how to make a contribution.
So this would specify in more detail how to report a bug, how to
comment on one, etc. (same goes for patches). This is where I would
stick the 5-for-1 deal.

Lastly, write up a doc that covers what one with CVS checkin rights
needs to do when checking in code. So how one goes about getting
checkin rights, getting initial checkins OK'ed by others, and then the
usual steps taken for a checkin.

Sound worth it to people? Not really needed so go back and do your
homework, Brett? What?

-Brett
Re: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd)
Guido van Rossum wrote:

> The Python test suite already has a way (the -u flag) to distinguish
> between regular broad-coverage testing and deep coverage for
> specific (or all) areas. Let's keep the really long-running tests
> out of the regular test suite.
>
> There used to be a farm of machines that did nothing but run the
> test suite (snake-farm). This seems to have stopped (it was run by
> volunteers at a Swedish university). Maybe we should revive such an
> effort, and make sure it runs with -u all.

Hello Guido and everybody else,

I hacked together a simple distributed unittest runner for our
projects. Requirements are an NFS-mounted home directory across the
slave nodes and SSH-based automatic authentication, i.e. no passwords
or passphrases necessary. It officially works-for-me for around three
hosts (see below), so that cuts the time down basically to a third
(real-life example: ~600 seconds to ~200 seconds, so it does work :-).
It also supports serialized tests, i.e. tests that must be run one
after the other and cannot be run in parallel.

http://mde.abo.fi/tools/disttest/

Comes with some problems; my blurb from advogato.org:

    Disttest is a distributed unittesting runner. You simply set the
    DISTTEST_HOSTS variable to a space-separated list of hostnames to
    connect to using SSH, and then run disttest. The nodes must all
    have the same filesystem (usually an NFS-mounted /home) and have
    the Disttest program installed. You even gain a bit with just one
    computer by setting the variable to "localhost localhost". :-)

    There are currently two annoying problems with it, though. For
    some reason, 1) the unittest program connecting to the X server
    sometimes fails to provide the correct authentication, and 2)
    sometimes the actual connection to the X server can't be
    established. I think these are related to 1) congestion on the
    shared .Xauthority file, and 2) a too small listen() queue on the
    forwarding port by the SSH daemon. Both problems show up when
    using too many (over 4?) hosts, which is the whole point of the
    program! Sigh. Error checking is probably bad.

Anyway, feel free to check it out, modify, comment or anything. We're
thinking of checking the assumptions in the blurb above, but no
timetable is set. My guess is that the NFS-mounted home directory is
the showstopper and people usually don't have lots of machines hanging
around, but that's for you to decide.

Disclaimer: I don't know anything of CPython development nor of the
tests in the CPython test suite. ;-)

Best regards, and a big thank you for Python,
Marcus
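To make the fan-out idea concrete, here is a hypothetical sketch of the host assignment and SSH dispatch a disttest-style runner performs; the helper names and the remote command line are assumptions, not disttest's actual code:

```python
import subprocess

# Hypothetical sketch of disttest-style dispatch: round-robin test
# files across SSH hosts that share an NFS-mounted home directory.
def assign(hosts, test_files):
    # Pair each test file with a host, cycling through the host list.
    return [(hosts[i % len(hosts)], t) for i, t in enumerate(test_files)]

def run_all(hosts, test_files):
    # Relies on passwordless SSH, as the requirements above state.
    procs = [subprocess.Popen(["ssh", host, "python", test])
             for host, test in assign(hosts, test_files)]
    return [p.wait() for p in procs]
```

Serialized tests (the ones that cannot run in parallel) would simply bypass run_all and execute one at a time.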
RE: [Python-Dev] Windows Low Fragmentation Heap yields speedup of ~15%
Hi,

what immediately comes to mind are Modules/cPickle.c and
Modules/cStringIO.c, which (I believe) are heavily used by ZODB (which
in turn is heavily used by the application). The lists also get fairly
large, although not huge - up to typically 50'000 (complex) objects in
the tests I've measured.

As I said, I don't speak C, so I can only speculate - do the lists at
some point grow beyond the upper limit of obmalloc, but are handled by
the LFH (which has a higher upper limit, if I understood Tim Peters
correctly)?

Best regards,
Martin

-----Original Message-----
From: Evan Jones [mailto:[EMAIL PROTECTED]
Sent: Thursday, 17 Feb 2005 02:26
To: Python Dev
Cc: Gfeller Martin; Martin v. Löwis
Subject: Re: [Python-Dev] Windows Low Fragmentation Heap yields
speedup of ~15%

On Feb 16, 2005, at 18:42, Martin v. Löwis wrote:

> I must admit that I'm surprised. I would have expected that most
> allocations in Python go through obmalloc, so the heap would only
> see large allocations. It would be interesting to find out, in your
> application, why it is still an improvement to use the
> low-fragmentation heaps.

Hmm... This is an excellent point. A grep through the Python source
code shows that the following files call the native system malloc
(I've excluded a few obviously platform specific files). A quick
visual inspection shows that most of these are using it to allocate
some sort of array or string, so it likely *should* go through the
system malloc. Gfeller, any idea if you are using any of the modules
on this list? If so, it would be pretty easy to try converting them to
call the obmalloc functions instead, and see how that affects the
performance.

Evan Jones

Demo/pysvr/pysvr.c
Modules/_bsddb.c
Modules/_curses_panel.c
Modules/_cursesmodule.c
Modules/_hotshot.c
Modules/_sre.c
Modules/audioop.c
Modules/bsddbmodule.c
Modules/cPickle.c
Modules/cStringIO.c
Modules/getaddrinfo.c
Modules/main.c
Modules/pyexpat.c
Modules/readline.c
Modules/regexpr.c
Modules/rgbimgmodule.c
Modules/svmodule.c
Modules/timemodule.c
Modules/zlibmodule.c
PC/getpathp.c
Python/strdup.c
Python/thread.c
Re: [Python-Dev] Windows Low Fragmentation Heap yields speedup of ~15%
[Gfeller Martin]
> what immediately comes to mind are Modules/cPickle.c and
> Modules/cStringIO.c, which (I believe) are heavily used by ZODB
> (which in turn is heavily used by the application).

I probably guessed right the first time <wink>: LFH doesn't help with
the lists directly, but helps indirectly by keeping smaller objects
out of the general heap where the list guts actually live.

Say we have a general heap with a memory map like this, meaning a
contiguous range of available memory, where 'f' means a block is free.
The units of the block don't really matter, maybe one 'f' is one byte,
maybe one 'f' is 4MB -- it's all the same in the end:

    fff

Now you allocate a relatively big object (like the guts of a large
list), and it's assigned a contiguous range of blocks marked 'b':

    bbb

Then you allocate a small object, marked 's':

    bbbsfff

Then you want to grow the big object. Oops! It can't extend the block
of b's in-place, because 's' is in the way. Instead it has to copy the
whole darn thing:

    fffsbbb

But if 's' is allocated from some _other_ heap, then the big object
can grow in-place, and that's much more efficient than copying the
whole thing.

obmalloc has two primary effects: it manages a large number of very
small (<= 256 bytes) memory chunks very efficiently, but it _also_
helps larger objects indirectly, by keeping the very small objects out
of the platform C malloc's way. LFH appears to be an extension of the
same basic idea, raising the small object limit to 16KB.

Now note that pymalloc and LFH are *bad* ideas for objects that want
to grow. pymalloc and LFH segregate the memory they manage into blocks
of different sizes. For example, pymalloc keeps a list of free blocks
each of which is exactly 64 bytes long. Taking a 64-byte block out of
that list, or putting it back in, is very efficient. But if an object
that uses a 64-byte block wants to grow, pymalloc can _never_ grow it
in-place, it always has to copy it. That's a cost that comes with
segregating memory by size, and for that reason Python deliberately
doesn't use pymalloc in several cases where objects are expected to
grow over time.

One thing to take from that is that LFH can't be helping list-growing
in a direct way either, if LFH (as seems likely) also needs to copy
objects that grow in order to keep its internal memory segregated by
size. The indirect benefit is still available, though: LFH may be
helping simply by keeping smaller objects out of the general heap's
hair.

> The lists also get fairly large, although not huge - up to typically
> 50'000 (complex) objects in the tests I've measured.

That's much larger than LFH can handle. Its limit is 16KB. A Python
list with 50K elements requires a contiguous chunk of 200KB on a
32-bit machine to hold the list guts.

> As I said, I don't speak C, so I can only speculate - do the lists
> at some point grow beyond the upper limit of obmalloc, but are
> handled by the LFH (which has a higher upper limit, if I understood
> Tim Peters correctly)?

A Python list object comprises two separately allocated pieces of
memory. First is a list header, a small piece of memory of fixed size,
independent of len(list). The list header is always obtained from
obmalloc; LFH will never be involved with that, and neither will the
system malloc. The list header has a pointer to a separate piece of
memory, which contains the guts of a list, a contiguous vector of
len(list) pointers (to Python objects). For a list of length n, this
needs 4*n bytes on a 32-bit box. obmalloc never manages that space,
and for the reason given above: we expect that list guts may grow, and
obmalloc is meant for fixed-size chunks of memory.

So the list guts will get handled by LFH, until the list needs more
than 4K entries (hitting the 16KB LFH limit). Until then, LFH probably
wastes time by copying growing list guts from size class to size
class. Then the list guts finally get copied to the general heap, and
stay there.

I'm afraid the only way you can know for sure is by obtaining detailed
memory maps and analyzing them.
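The grow-and-copy behavior Tim describes can be observed from pure Python via sys.getsizeof; this sketch is not from the thread, and the 8-bytes-per-slot figure assumes a 64-bit build (the thread's machines were 32-bit, with 4-byte slots):

```python
import sys

# The "list guts" grow with len(list): sys.getsizeof reports the fixed
# header plus the pointer vector (8 bytes per slot on a 64-bit build).
lst = []
sizes = set()
for i in range(10000):
    lst.append(i)
    sizes.add(sys.getsizeof(lst))

# Far fewer distinct sizes than appends: CPython over-allocates, so the
# guts are recopied only occasionally, amortizing the copying cost the
# message describes.
assert len(sizes) < len(lst)
```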
Re: [Python-Dev] license issues with profiler.py and md5.h/md5c.c
On Wed, 2005-02-16 at 22:53 -0800, Gregory P. Smith wrote:

> fyi - i've updated the python sha1/md5 openssl patch. it now
> replaces the entire sha and md5 modules with a generic hashes module
> that gives access to all of the hash algorithms supported by OpenSSL
> (including appropriate legacy interface wrappers and falling back to
> the old code when compiled without openssl).
>
> https://sourceforge.net/tracker/index.php?func=detail&aid=1121611&group_id=5470&atid=305470
>
> I don't quite like the module name 'hashes' that i chose for the
> generic interface (too close to the builtin hash() function). Other
> suggestions on a module name? 'digest' comes to mind.

I just had a quick look, and have these comments (pseudo patch
review?). Apologies for the noise on the list...

DESCRIPTION
===========

This patch keeps the current md5c.c, md5module.c files and adds the
following: _hashopenssl.c, hashes.py, md5.py, sha.py.

The old md5 and sha extension modules get replaced by hashes.py,
md5.py, and sha.py python modules that leverage off _hash (openssl) or
_md5 and _sha (no openssl) extension modules.

The new _hash extension module wraps the high level openssl EVP
interface, which uses a string parameter to indicate what type of
message digest algorithm to use. The advantage of this is it makes all
openssl supported digests available, and if openssl adds more, we get
them for free. A disadvantage of this is it is an abstraction level
above the actual md5 and sha implementations, and this may add
overheads. These overheads are probably negligible compared to the
actual implementation speedups.

The new _md5 and _sha extension modules are simply re-named versions
of the old md5 and sha modules.

The hashes.py module acts as an import wrapper for _hash, and falls
back to using the _md5 and _sha modules if _hash is not available. It
provides an EVP style API (string hash name parameter), that supports
only md5 and sha hashes if openssl is not available.

The new md5.py and sha.py modules simply use hashes.py.

COMMENTS
========

The introduction of a hashes module with a new API that supports many
different digests (provided openssl is available) is extending Python,
not just fixing the licenses of the md5 and sha modules. If all we
wanted to do was fix the md5 module, a simpler solution would be to
change the md5c.c API to match openssl's implementation, and make
md5module.c use it, conditionally compiling against md5c.c or linking
against openssl in setup.py. A similar approach could be used for sha,
but would require stripping the sha implementation out of shamodule.c.

I am mildly concerned about the namespace/filespace clutter introduced
by this implementation... it feels unnecessary, as does the tangle of
dependencies between the modules. With openssl, hashes.py duplicates
the functionality of _hash. Without openssl, md5.py and sha.py
duplicate _md5 and _sha, via a roundabout route through hashes.py.

The python wrappers seem overly complicated, with things like

    def new(name, string=None):
        if string:
            return _hash.new(name, string)
        else:
            return _hash.new(name)

being common, where the following would suffice:

    def new(name, string=''):
        return _hash.new(name, string)

I think this is because _hash.new() uses an optional string parameter,
but I have a feeling a C update with a zero length string is faster
than this Python if. If it was a concern, the C implementation could
check the value of the string length before calling update.

Given the convenience methods for different hashes in hashes.py (which
incidentally look like they are only available when _hash is not
available... something else that needs fixing), the md5.py module
could be simply coded as:

    from hashes import md5
    new = md5

Despite all these nit-picks, it looks pretty good. It is orders of
magnitude better than any of the other non-existent solutions,
including the one I didn't code :-)

--
Donovan Baarda [EMAIL PROTECTED] http://minkirri.apana.org.au/~abo/
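Donovan's "md5.py in two lines" suggestion can be sketched runnably today with hashlib standing in for the hypothetical hashes module; the md5/new names follow the old md5 module's interface:

```python
import hashlib

# Stand-in for "from hashes import md5": an EVP-style, string-keyed
# digest factory like the one the reviewed patch provides.
def md5(string=b""):
    return hashlib.new("md5", string)

# The old md5 module exposed new(); the whole module reduces to this.
new = md5
```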
[Python-Dev] Prospective Peephole Transformation
Based on some ideas from Skip, I had tried transforming the likes of
"x in (1,2,3)" into "x in frozenset([1,2,3])". When applicable, it
substantially simplified the generated code and converted the O(n)
lookup into an O(1) step. There were substantial savings even if the
set contained only a single entry. When disassembled, the bytecode is
not only much shorter, it is also much more readable (corresponding
almost directly to the original source).

The problem with the transformation was that it didn't handle the case
where x was non-hashable and it would raise a TypeError instead of
returning False as it should. That situation arose once in the email
module's test suite. To get it to work, I would have to introduce a
frozenset subtype:

    class Searchset(frozenset):
        def __contains__(self, element):
            try:
                return frozenset.__contains__(self, element)
            except TypeError:
                return False

Then, the transformation would be "x in Searchset([1, 2, 3])". Since
the new Searchset object goes in the constant table, marshal would
have to be taught how to save and restore the object.

This is more complicated than the original frozenset version of the
patch, so I would like to get feedback on whether you guys think it is
worth it.

Raymond Hettinger
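The Searchset subtype from the message can be exercised directly to confirm the behavior Raymond wants: an unhashable left operand yields False rather than raising TypeError, matching what "x in (1,2,3)" does today:

```python
# Raymond's Searchset, as given in the message, plus a demonstration of
# the unhashable-membership case that a plain frozenset gets wrong for
# this peephole transformation.
class Searchset(frozenset):
    def __contains__(self, element):
        try:
            return frozenset.__contains__(self, element)
        except TypeError:  # unhashable element: cannot be a member
            return False

s = Searchset([1, 2, 3])
assert 2 in s
assert [] not in s  # a plain frozenset would raise TypeError here
```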