[issue13031] small speed-up for tarfile.py when unzipping tarballs
Justin Peel added the comment: poq, You're quite right. I've added that change too. By the way, four unnecessary extra tuples are no longer being created for each call to this function too because of these changes. -- Added file: http://bugs.python.org/file23232/cpython_tarfile2.diff ___ Python tracker <http://bugs.python.org/issue13031> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13031] [PATCH] small speed-up for tarfile.py when unzipping tarballs
Changes by Justin Peel : -- type: -> performance ___ Python tracker <http://bugs.python.org/issue13031> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13031] [PATCH] small speed-up for tarfile.py when unzipping tarballs
New submission from Justin Peel : Attached small diff speeds up extracting a gzipped tarball on my machine using python 3.2 by 3-5%. It will probably be a larger percentage on machines that have faster hard drives (mine is 5400rpm). Basically, the changes speed up calculating the checksums by only doing one slice rather than four and call struct.unpack twice rather than four times. We are able to use less unpack calls because 'x' means to skip a byte. -- components: Library (Lib) files: cpython_tarfile.diff keywords: patch messages: 144436 nosy: jpeel priority: normal severity: normal status: open title: [PATCH] small speed-up for tarfile.py when unzipping tarballs versions: Python 2.7, Python 3.2 Added file: http://bugs.python.org/file23229/cpython_tarfile.diff ___ Python tracker <http://bugs.python.org/issue13031> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10667] collections.Counter object in C
Justin Peel added the comment: Okay, I was done submitting. I thought that Antoine was asking for a version that kept a pure Python version so I did that. -- ___ Python tracker <http://bugs.python.org/issue10667> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10667] collections.Counter object in C
Justin Peel added the comment: I've done as Antoine asked and made a pure Python PyCounter class and a C-enhanced Counter class that both use the mixin CounterBase class. I also added to the tests so that both PyCounter and Counter are tested. I left the update_fromsubs() method in Counter for now, but haven't made a Python equivalent function for PyCounter in case the method is rejected. Also, I realized that my build was still in debug mode so those benchmark weren't quite accurate (Oops). The Counter class is only 4-6x faster for most cases. That's still good, but not quite as good as I'd previously reported. Lastly, I thought of one more method that could be quite useful. Say that you have a list like [('apples',4),('oranges',5),('apples',8)] where the second element in each tuple indicates how many of them there are. The resultant Counter should have {'apples': 12, 'oranges':5}. We don't really have a good way to count this, but we could add a small method like the following: def fromitems(self, items): for key, value in items: self[key] += value and I could easily make a C function for it. I think that this simple method could be very useful to some people. What do you all think of this addition? -- Added file: http://bugs.python.org/file20004/counterpatch2.diff ___ Python tracker <http://bugs.python.org/issue10667> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10667] collections.Counter object in C
New submission from Justin Peel : I put the Counter's update and __missing__ methods into C code. The rest of the existing methods remain the same. The new Counter is at least 2x-10x (or more) faster for updating depending on the input. I also added a new method, update_fromsubs, which is documented below the timings for updating a Counter. Here are a few examples to demonstrate the speed of the new update function: Counting list(range(100))*100 - Old Counter: 18.6 msec New Counter: 1.43 13x speed-up Counting range(1) - Old Counter: 21.5 msec New Counter: 3.84 msec 5.6x speed-up Counter generator i % 100 for i in range(1) - Old Counter: 36.7 msec New Counter: 17.5 msec 2.1x speed-up and the __missing__ method being in C makes getting missing keys about twice as fast. Also, I added a new method, update_fromsubs, which counts subarrays/substrings of a sequence. This method only works with immutable sequences like tuples, bytes, and unicode objects. The method is of the form update_fromsubs(seq, frame[, step[, lo[, hi]]]) where frame is the size of the subarrays, step is how far to move forward after each subarray, and lo and hi are the respective starting and ending indices to count subarrays within the sequence. For instance, c = Counter() c.update_fromsubs('abcd', 2) yields Counter({'cd': 1, 'ab': 1, 'bc': 1}) and d = Counter() d.update_fromsubs('abcd', 2, 2) yields Counter({'ab': 1, 'cd: 1}) These sorts of operations could be done with generators in a manner like Counter(s[i:i+2] for i in range(0, len(s)-1)) but it can be about twice as fast by using update_fromsubs. Here's an example Counting subarrays of length 2 of "ab"*1: update_fromsubs:30.8 ms New Counter w/ generator: 54.3 ms Old Counter w/ generator: 98.8 ms This method would probably be most useful in processing strings, but there may be other uses. If it is accepted, please feel to change the method name and parameters. I especially wasn't sure what to call this method (update_fromsubsequences seemed rather long). -- components: Extension Modules, Library (Lib) files: counterpatch.diff keywords: patch messages: 123706 nosy: jpeel, rhettinger priority: normal severity: normal status: open title: collections.Counter object in C type: performance versions: Python 3.3 Added file: http://bugs.python.org/file19992/counterpatch.diff ___ Python tracker <http://bugs.python.org/issue10667> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com