[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net added the comment: This is not the proper place for it, but in the 3.2 and 2.7 news it is reported that “The multi-argument form of operator.attrgetter() function now runs slightly faster” while it should be “The multi-argument form of operator.attrgetter() function now runs slightly faster and the single-argument form runs much faster”. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Antoine Pitrou pit...@free.fr added the comment: The PyUnicode_GET_SIZE issue was still there, but I've fixed it and committed in r86036. Thanks for your contribution! -- resolution: - fixed stage: patch review - committed/rejected status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Antoine Pitrou pit...@free.fr added the comment: Some comments about the patch: - this seems to be some dead code: if (nattrs = 1) { if (!PyArg_UnpackTuple(args, attrgetter, 1, 1, attr)) return NULL; - you can't call PyUnicode_GET_SIZE or PyUnicode_AS_UNICODE before you have called PyUnicode_Check. If the object is not an unicode object, it triggers an assertion in debug mode: python: ./Modules/operator.c:404: attrgetter_new: Assertion `((PyObject*)(item))-ob_type))-tp_flags ((1L28))) != 0)' failed. - you should also call PyUnicode_InternInPlace() on the last dotless name (after the for loop) - this comment looks false: /* attrgetter_new code should ensure we never come here */ - in dotted_getattr, when attr is a tuple, you leak references to intermediate objects (because you don't decref obj before doing newobj = obj) Other than that, looks quite worthwhile. -- nosy: +pitrou ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net added the comment: Thank you very much, Antoine, for your review. My comments in reply: - the dead code: it's not dead, IIRC it ensures that at least one argument is given, otherwise it raises an exception. - PyUnicode_GET_SIZE: you're right. The previous patch didn't have this problem, because there were two loops: the first one made sure in advance that all arguments are PyUnicode. - the false comment: right again. A remain from the first patch. - dotted_getattr and references: right! I should have noted better what Raymond's initial loop did. Attached a corrected version of the patch according to Antoine's comments. -- Added file: http://bugs.python.org/file19440/issue10160.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: The patch looks fine. You're overview of the process presented here in the tracker would make a nice comment in the code. If Antoine is happy with the latest patch, then it's ready to apply. -- assignee: rhettinger - pitrou ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net added the comment: A newer version of the patch with the following changes: - single loop in the ag-attr setup phase of attrgetter_new; interning of the stored attribute names - added two more tests of invalid attrgetter parameters (.attr, attr.) -- Added file: http://bugs.python.org/file19339/issue10160-2.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Terry J. Reedy tjre...@udel.edu added the comment: I am not sure whether I should attach a zip/tar file with both the attachments (the sample benchmark and the diff); so I'll attach the diff in a further comment. Uploading separate files with .py, .diff extensions that can be viewed in a browser is indeed the right thing to do.\ -- nosy: +terry.reedy ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net added the comment: Modules/operator.c grows by ~70 lines, most of it the setup code for ag-attr; also I loop twice over the args of attrgetter_new, choosing fast code that runs once per attrgetter creation than temporary data. Alex's suggestion to make use of Python-level functions to shorten the code of attrgetter_new could obviously work to decrease the source lines. I don't know how fast I would produce such a version if requested, though. Whatever the way attrgetter_new sets up the data, I would suggest that you keep the logic changes in general, i.e. set-up in attrgetter_new and keep a thinner dotted_getattr , since it avoids running the same checks and splitting over and over again for every attrgetter_call invocation. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Changes by Éric Araujo mer...@netwok.org: -- components: -Documentation nosy: -d...@python ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
New submission from Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net: (Discovered in that StackOverflow answer: http://stackoverflow.com/questions/3940518/3942509#3942509 ; check the comments too) operator.attrgetter in its simplest form (i.e. with a single non-dotted name) needs more time to execute than an equivalent lambda expression. Attached file so3940518.py runs a simple benchmark comparing: a list comprehension of plain attribute access; attrgetter; and lambda. I will append sample benchmark times at the end of the comment. Browsing Modules/operator.c, I noticed that the dotted_getattr function was using PyUnicode_Check and (possibly) splitting on dots on *every* call of the attrgetter, which I thought to be most inefficient. I changed the py3k-daily-snapshot source to make the PyUnicode_Check calls in the attrgetter_new function; also, I modified the algorithm to pre-parse the operator.attrgetter functions for possible dots in the names, in order for the dotted_getattr function to become speedier. The only “drawback” is that now operator.attrgetter raises a TypeError on creation, not on subsequent calls of the attrgetter object; this shouldn't be a compatibility problem. However, I obviously had to update both Doc/library/operator.rst and Lib/test/test_operator.py . I am not sure whether I should attach a zip/tar file with both the attachments (the sample benchmark and the diff); so I'll attach the diff in a further comment. On the Ubuntu server 9.10 where I made the changes, I ran the so3940518.py sample benchmark before and after the changes. Run before the changes (third column is seconds, less is better): list comp 0.40925 100 map attrgetter 1.3897 100 map lambda 1.0098 100 Run after the changes: list comp 0.40036 100 map attrgetter 0.5196 100 map lambda 0.96 100 -- assignee: d...@python components: Documentation, Library (Lib), Tests files: so3940518.py messages: 119247 nosy: d...@python, tzot priority: normal severity: normal status: open title: operator.attrgetter slower than lambda after adding dotted names ability versions: Python 3.2 Added file: http://bugs.python.org/file19310/so3940518.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net added the comment: Here comes the diff to Modules/operator.c, Doc/library/operator.rst and Lib/test/test_operator.py . As far as I could check, there are no leaks, but a more experienced eye in core development could not hurt. Also, obviously test_operatory.py passes all tests. Should this be accepted, I believe it should be backported to 2.7 (at least). I can do that, just let me know. -- keywords: +patch Added file: http://bugs.python.org/file19312/issue10160.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Changes by Benjamin Peterson benja...@python.org: -- assignee: d...@python - rhettinger nosy: +rhettinger ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Changes by Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net: Removed file: http://bugs.python.org/file19312/issue10160.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net added the comment: Newer version of the diff, since I forgot some if(0) fprintf debug calls that shouldn't be there. -- Added file: http://bugs.python.org/file19313/issue10160.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Χρήστος Γεωργίου (Christos Georgiou) t...@users.sourceforge.net added the comment: An explanation to the changes. The old code kept the operator.itemgetter arguments in the ag-attr member. If the argument count (ag-nattrs) was 1, the single argument was kept; if more than 1, a tuple of the original arguments was kept. On every attrgetter_call call, if ag-nattrs was 1, dotted_getattr was called with the plain ag-attr as attribute name; if 2, dotted_getattr was called for every one of the original arguments. Now, ag-attr is always a tuple, containing either dotless strings or tuples of dotless strings: operator.attrgetter(name1, name2.name3, name4) stores (name1, (name2, name3), name4) in ag-attr. dotted_getattr accordingly chooses based on type (either str or tuple, ensured by attrgetter_new) whether to do a single access or a recursive one. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Thanks for noticing this. I wasn't aware that it had slowed down after dotted name support had been added. Since it is a mild performance issue, I'm lowering the priority. Will take a further look when I get the chance. A first look at the patch shows that it is bigger than I expected. Isn't it possible to check for a dot on construction and just record the boolean so that a fast, non-splitting path can be used. I'm reluctant to make the code volume grow much for this function. It is already a bit voluminous for the minor convenience it offers. -- priority: normal - low stage: - patch review type: - performance ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10160] operator.attrgetter slower than lambda after adding dotted names ability
Alex alex.gay...@gmail.com added the comment: Voice of ignorance here: why can't this be implemented in the naive way one might in Python, use the existing string splitting algorithms of stringlib, but just leave it in __new__. -- nosy: +alex ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10160 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com