Milko Krachounov <pyt...@milko.3mhz.net> added the comment: I've been bugged by the lack of key= argument for bisect for some time now, and today I got to read this and the previous issues about the matter. I still fail to understand the reasons for the rejections. It might encourage bad design in which expensive key functions will get called more than once for the same element. However:
1. Not all key= functions are computationally expensive, in fact a quick search shows that most key= uses come are with itemgetter, attrgetter or some cousin. 2. Even for a computationally expensive function precached key isn't necessarily the right answer. And if the keys are huge, a key= function can actually be useful in implementing intelligent caching. 3. The reason for rejection is merely a performance issue which *can* be fixed without any significant changes to the design of the app should it prove to be a real issue. I was writing a module that keeps a sorted list of items in which items can be added, removed, or changed (they might get reordered). I used bisect to avoid useless slow linear searches, which would be one obstacle for the app to scale well. However, the sort key can be changed at any time, so implementing __lt__ wasn't an option. Keeping a second list with the keys is both memory and computationally expensive as inserting into a list is slow. It's not a shiny example of a bisect use-case as the only thing I've used my module so far is to display the files in a directory, which means the lists are small and the changes are seldom, so any implementation is good enough. But bisect with key= would have been best if it was available. What's worse, I have a buggy unreadable ad hoc implementation of bisect in my code right now. ;) I see the following as reasons why key= would provide benefit: 1. If you have a sorted list, bisect is a net win. Having a key= would enable you to utilize it without refactoring anything. The lack of key may as well encourage you to continue using linear searches, or other sub-optimal solutions. 2. Using key= is more readable and less error-prone than keeping two lists in simple cases where defining a class with __le__ is an overkill. Two examples I had where this was the case: a) A class I implemented to pick numbers from a certain discrete random distribution with bisect. Oh, but yeah, the implementation without key= is twice faster. b) I gave a hand to someone who was implementing a dictionary looked up the words as you typed them with bisect. 3. Insort. Insort is already slow enough as it is, a key= wouldn't slow it down much if at all. In fact, if you are keeping two lists, key= would speed it up. Insort with key= is a net win. I've implemented key= and quickly hacked a benchmark to show what performance hit it might have, and to put things into perspective. The results: 1. Cheap key function, integers and abs(), 1000000 items, 5000 bisects or insorts: a) Bisect with a key: 0.02205014s b) Bisect with a second list: 0.01328707s c) Insort with a key: 5.83688211s d) Bisect with a second list, and two inserts: 16.17302299s 2. Cheap key function, ~4000 char bytestrings and len(), 100000 items, 5000 bisects or insorts: a) Bisect with a key: 0.01829195s b) Bisect with a second list: 0.00945401s c) Insort with a key: 0.25511408s d) Bisect with a second list, and two inserts: 0.49303603s 3. Expensive key function, ~4000 char bytestrings and str.lower(), 100000 (500 MB) items, 5000 bisects or insorts: a) Bisect with a key: 1.26837015s b) Bisect with a second list: 0.08390594s c) Insort with a key: 1.50406289s d) Bisect with a second list, and two inserts: 0.57737398s 4. Expensive key function, ~4000 char bytestrings and str.lower(), 500000 (2.5 GB) items, 5000 bisects or insorts: a) Bisect with a key: 1.46136308s b) Bisect with a second list: 0.08768606s c) Insort with a key: 3.05218720s d) Bisect with a second list, and two inserts: 6.43227386s 5. Expensive key function, ~4000 char bytestrings and str.strip(), 100000 (500 MB) items, 5000 bisects or insorts: a) Bisect with a key: 0.03311396s b) Bisect with a second list: 0.01374602s c) Insort with a key: 0.27423000s d) Bisect with a second list, and two inserts: 0.49585080s 6. Expensive key function, ~4000 char bytestrings and str.strip(), 500000 (2.5 GB) items, 5000 bisects or insorts: a) Bisect with a key: 0.04530501 b) Bisect with a second list: 0.01912594 c) Insort with a key: 1.62209797 d) Bisect with a second list, and two inserts: 5.91734695 Also, I tried to bench linear searches, but as they had to run in Python code they aren't representative of anything. In the integer test they went for about 250 seconds without recalculating the key, and for about 500 with. Also, replacing them with .index() gave about 60 seconds if I ensured there's high probability that the number is in the list, and for 500 if I didn't. In short, key= for bisect would be convenient and neat, really useful in rare cases, leading to more readable code in the most common cases, and I'm unconvinced of the perceived harm that it would cause. I attach a patch implementing it for trunk, py3k, and the benchmark script I used to test it (on trunk). ---------- keywords: +patch nosy: +milko.krachounov Added file: http://bugs.python.org/file15248/bisect-trunk.patch _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue4356> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com