Re: [Python-ideas] a sorting protocol dunder method?
On 12/04/2017 10:01 AM, brent bejot wrote: I think we can all agree that defining __key__ and calling "sorted(list_of_attrdicts)" has that syntactic sugar that is oh-so-sweet-and-tasty. Just remember that too much syntactic sugar can give you cancer of the semicolon. -- Greg ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
wow! a few time zones (and a day job) really make a difference to taking part in a discussion :-) Thanks for all the feedback. From what I can tell, we don't have a consensus, though It's looking pretty unlikely that this is going to be adopted (though maybe the decorator idea). But I'm going to go through and try to summarize what we (I?) have learned, and what does seem to be agreed upon, or not. If I misinterpret something you said, or take a quote out of context, please correct it, but trust that I didn't do it on purpose Also, it's kind of a pain to do a digest like this and properly attribute everyone, so mostly I'm not going to attribute the quotes... So: has this already been brought up and rejected? > https://bugs.python.org/issue20632 Thanks Serhiy -- I didn't think to look in the Bug DB. This does remind me that it would be good to have a place (maybe in a mets-PEP?) to put topics that have been discussed, but didn't get as far as anyone writing a PEP... An existence proof: in NLTK, an __lt__ method added purely to facilitate > consistent sorting (in doctests) of structured data objects for which > comparison operators do not really make conceptual sense: > https://github.com/nltk/nltk/pull/1902/files#diff- > 454368f06fd635b1e06c9bb6d65bd19bR689 > Granted, calling min() and max() on collections of these objects would > not make conceptual sense either. Still, __sort_key__ would have been > cleaner than __lt__. So nice to know I'm not the only one that wants (needs) to provide a sort order be default -- though it doesn't mean it's not a niche issue anyway. By default, we sort by the inherent order of the values. But if the > values have no inherent order (they are unordered), we can sort > unordered items in a collection by providing an appropriate key > function. Hence why I say they are loosely related. > It > doesn't make sense to put that functionality into the complex numbers > themselves: complex numbers are unordered, and any order we impose on > them comes from the collection, not the individual numbers. Sure -- and that's why we definitely still need a key option for the sorting functions, and why not all classes should define a sort order. But for many classes, it does make sense to define a default sort order. Take something as simple as strings -- they have a default sort order, but a user very well might want to sort them differently -- say case-insensitive, or ... So I think there is consensus here (and no one was proposing otherwise): - Defining a sort order is NOT required of all classes - The sorting methods definitely need to allow a custom key function. the comparison operators > apply to the values in the collection; the key function applies to the > collection itself But many (most) objects DO provide a default sort order, by means of the comparison methods. So the idea that objects have a default sort order is already pretty well accepted :-) But that assumes that there's only ever one way to order a collection of > unordered items. no,. it doesn't -- it implies there is a default way, which is what is done already for most types. And again, some classes shouldn't even have a default. I'm not sure I understand the motivation to make elements *sortable* but > not comparable. If an arbitrary order is still useful, I'd think you'd want > to be able to tell how two particular elements *would* sort by asking a yeah, I'm not sure about this -- going back to the example of complex numbers, I like the idea of providing a default sort order but not make the declaration that this is how the objects SHOULD be compared -- after all, you can provide your own key function, but not so easily re-define the comparisons... > Is sorting-related functionally too special-case to deserve a protocol? > > Yes, it is too special-case. I don't see any advantages in comparison with > defining the __lt__ method. It will be rather confusing if different > methods of sorting produce inconsistent order. If the first item of the > sorted list is not the smallest item of the list. well, yeah, but not any different than the fact that you can inconsitently define the comparison operators already. What I wasn't realizing is that you only need to define __lt__ (and __eq__) to get sortability. Maybe it would be good to document that more prominently somewhere (if it's documented at all, other than the source). But the idea of the class decorator looks more sane to me. yeah, I'm liking that. Most often when I define equality and comparison dunder methods for a > custom class, I'm effectively just deferring the comparison to some > field or tuple of fields of the object. Exactly -- which may be an argument for a decorator, rather than a dunder method -- particularly if performance isn't improved by the dunder method. What if we added a @key_ordering decorator, like @total_ordering but > using __key__ to generate the comparisons? This could be a good idea -- just
[Python-ideas] Data Classes in Kotlin
I thought this might be interesting input for the discussions about "data classes" in Python: https://kotlinlang.org/docs/reference/data-classes.html I think that the use of "data class" as syntax is kind of cool, but what really matters is the semantics they chose for Kotlin. -- Juancarlo *Añez* ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Tue, Dec 5, 2017 at 8:54 AM, Julien Salort wrote: > Le 04/12/2017 à 14:16, Steven D'Aprano a écrit : > >> We're taking something which belongs in the report generator or >> collection, the knowledge of how to sort a collection of unordered >> values, and baking it into the values themselves. (Effectively making >> them ordered!) > > It is also possible to use this __key__ method for classes for which the > ordering > is indeed unambiguously defined, e.g.: > > class MyValue: > > def __init__(self, value, comment): > self.value = value > self.comment = comment > > def __key__(self): > return self.value > > Then it is not shocking to define a sorting key. MyValue = namedtuple('MyValue', ['value', 'comment']) Job done :) ChrisA ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
Le 04/12/2017 à 14:16, Steven D'Aprano a écrit : We're taking something which belongs in the report generator or collection, the knowledge of how to sort a collection of unordered values, and baking it into the values themselves. (Effectively making them ordered!) It is also possible to use this __key__ method for classes for which the ordering is indeed unambiguously defined, e.g.: class MyValue: def __init__(self, value, comment): self.value = value self.comment = comment def __key__(self): return self.value Then it is not shocking to define a sorting key. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Tue, Dec 5, 2017 at 8:22 AM, Barry wrote: > > >> On 4 Dec 2017, at 20:12, Antoine Pitrou wrote: >> >> On Mon, 4 Dec 2017 19:37:02 + >> Barry Scott wrote: >>> I wondered what the performance would be and tested the following code: >>> >> [...] >>> >>> it outputs this for my with python 3.6.0 >>> >>> 1 >>> key 0.010628s 1 calls >>> lt 0.053690s 119886 calls >>> >>> It seems that providing a key is ~5 times faster the depending on __lt__. >>> (I even used a short circuit to se if __lt__ could beat key). >> >> Thanks for taking the time to write a benchmark. I'm not surprised >> by the results (and your __lt__ method isn't even complicated: the gap >> could be very much wider). There is more to Python performance than >> aggregate big-O algorithmic complexity. > > I was surprised by the huge difference. I was expecting a closer race. > > For the record I think that a __sort_key__ is not a good idea as it is so > easy to > do as I did and define key methods on the class, without the limit of one such > method. The numbers here are all fairly small, but they make a lot of sense. Calls into Python code are potentially quite slow, so there's a LOT of benefit to be had by calling Python code once per object instead of N log N times for the comparisons. Increasing the length of the list will make that difference even more pronounced. But that's an argument for using a key function, not for having a __key__ special method. ChrisA ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
> On 4 Dec 2017, at 20:12, Antoine Pitrou wrote: > > On Mon, 4 Dec 2017 19:37:02 + > Barry Scott wrote: >> I wondered what the performance would be and tested the following code: >> > [...] >> >> it outputs this for my with python 3.6.0 >> >> 1 >> key 0.010628s 1 calls >> lt 0.053690s 119886 calls >> >> It seems that providing a key is ~5 times faster the depending on __lt__. >> (I even used a short circuit to se if __lt__ could beat key). > > Thanks for taking the time to write a benchmark. I'm not surprised > by the results (and your __lt__ method isn't even complicated: the gap > could be very much wider). There is more to Python performance than > aggregate big-O algorithmic complexity. I was surprised by the huge difference. I was expecting a closer race. For the record I think that a __sort_key__ is not a good idea as it is so easy to do as I did and define key methods on the class, without the limit of one such method. Barry > > Regards > > Antoine. > > > ___ > Python-ideas mailing list > Python-ideas@python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Mon, 4 Dec 2017 19:37:02 + Barry Scott wrote: > I wondered what the performance would be and tested the following code: > [...] > > it outputs this for my with python 3.6.0 > > 1 > key 0.010628s 1 calls > lt 0.053690s 119886 calls > > It seems that providing a key is ~5 times faster the depending on __lt__. > (I even used a short circuit to se if __lt__ could beat key). Thanks for taking the time to write a benchmark. I'm not surprised by the results (and your __lt__ method isn't even complicated: the gap could be very much wider). There is more to Python performance than aggregate big-O algorithmic complexity. Regards Antoine. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
I wondered what the performance would be and tested the following code: #!/usr/bin/env python3 import random import time random.seed( hash('Testing Keys') ) lt_calls = 0 key_calls = 0 class MyObject: def __init__( self, value1, value2 ): self.value1 = value1 self.value2 = value2 def __lt__(self, other): global lt_calls lt_calls +=1 if self.value1 < other.value1: return True else: return self.value2 < other.value2 def key(self): global key_calls key_calls +=1 return self.value1, self.value2 lt_list = [] random for value1 in reversed(range(1)): value2 = value1 - 50 lt_list.append( MyObject( value1, value2 ) ) random.shuffle( lt_list ) key_list = lt_list[:] print( len(lt_list) ) s = time.time() key_list.sort( key=MyObject.key ) e = time.time() - s print( 'key %.6fs %6d calls' % (e, key_calls) ) s = time.time() lt_list.sort() e = time.time() - s print( ' lt %.6fs %6d calls' % (e, lt_calls) ) it outputs this for my with python 3.6.0 1 key 0.010628s 1 calls lt 0.053690s 119886 calls It seems that providing a key is ~5 times faster the depending on __lt__. (I even used a short circuit to se if __lt__ could beat key). I often have more then one way to sort an object. Its easy for me to provide a set of key functions that meet the needs of each sort context. I'm not sure what extra value the __sort_key__ would offer over providing a key method as I did. Barry ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On 04/12/17 18:01, brent bejot wrote: I'm +1 on this idea for the most part. I agree particularly with the idea that it is better OOP for an object to access it's member variables to create the key than an external container to do so. This I'm absolutely fine with. Key methods are something to encourage. The problem that I have is that once you get beyond simple lists of number or strings, there often isn't a particularly obvious sort order, or rather there are often multiple obvious sort orders and you may want any of them. In fact I'd go so far as to suggest that there _usually_ isn't a single obvious sort order for non-trivial classes. To take a non-Python example, I'm in charge of the reading rota at my local church, and I keep a spreadsheet of readers to help me. Usually I sort that list by the date people last read the lesson, so I can quickly tell who I should ask next. When the newsletter asks me for a list of readers, though, I sort them alphabetically by surname, which most people would think of as the natural sorting order. -- Rhodri James *-* Kynesim Ltd ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On 04/12/17 18:01, brent bejot wrote: I'm +1 on this idea for the most part. I agree particularly with the idea that it is better OOP for an object to access it's member variables to create the key than an external container to do so. > and then sort like this: > sorted(list_of_attrdicts, key=AttrDict._id_order) Isn't this exactly what the operator module's itemgetter and attrgetter all ready give you? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On 12/04/2017 10:01 AM, brent bejot wrote: This is certainly a good pattern to use in the current and older versions, but I think we can all agree that defining __key__ and calling "sorted(list_of_attrdicts)" has that syntactic sugar that is oh-so-sweet-and-tasty. Actually, no, we do not all agree. ;-) -- ~Ethan~ ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
I'm +1 on this idea for the most part. I agree particularly with the idea that it is better OOP for an object to access it's member variables to create the key than an external container to do so. > and then sort like this: > sorted(list_of_attrdicts, key=AttrDict._id_order) This is certainly a good pattern to use in the current and older versions, but I think we can all agree that defining __key__ and calling "sorted(list_of_attrdicts)" has that syntactic sugar that is oh-so-sweet-and-tasty. > This will add an additional overhead. This will be even slower than passing the key function, since you will need to look up the __key__ method in every item. And there will be an overhead even in the case when the __key__ method is not defined. This, to me, is the only possible negative. I would be most interested to see how much of an effect this would have on real-world data that doesn't have __key__ defined. I may be new to this community but Steven D'Aprano and Antoine Pitrou, you guys bicker like my parents before they got a divorce. I'm pretty sure you're both veterans and so know how to behave yourselves. Please set the tone according to how you'd like us newbies to respond. -Brent ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] (no subject)
On Tue, 5 Dec 2017 02:52:44 +1100 Steven D'Aprano wrote: > On Mon, Dec 04, 2017 at 01:52:19PM +0100, Antoine Pitrou wrote: > > On Mon, 4 Dec 2017 23:16:11 +1100 > > Steven D'Aprano wrote: > > > On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote: > > > > > > > There are definitely advantages. Sorting calls __lt__ for each > > > > comparison (that is, O(n log n) times) while __key__ would only be > > > > called once per item at the start (that is, O(n) times). > > > > > > Passing a key function doesn't magically turn a O(n log n) comparison > > > sort into a O(n) sort. > > > > Where did I say it did? > > See the text from you quoted above. You said there are "definitely > [performance] advantages" by using a key function. You then compare: > > - calling __lt__ O(n log n) times, versus > > - calling the key function O(n) times. > > This is a classic "apples versus oranges" comparison. You compare > *actually sorting the list* with *not sorting the list* and conclude > that they key function provides a performance advantage. At this point, I can only assume you are trolling by twisting my words... even though you later quote the part which explicitly clarifies that I was *not* saying what you claim I did. Why you seem to think that is contributing anything to the discussion rather than derailing it is beyond me. In any case, don't expect further responses from me. Regards Antoine. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Mon, Dec 04, 2017 at 01:52:19PM +0100, Antoine Pitrou wrote: > On Mon, 4 Dec 2017 23:16:11 +1100 > Steven D'Aprano wrote: > > On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote: > > > > > There are definitely advantages. Sorting calls __lt__ for each > > > comparison (that is, O(n log n) times) while __key__ would only be > > > called once per item at the start (that is, O(n) times). > > > > Passing a key function doesn't magically turn a O(n log n) comparison > > sort into a O(n) sort. > > Where did I say it did? See the text from you quoted above. You said there are "definitely [performance] advantages" by using a key function. You then compare: - calling __lt__ O(n log n) times, versus - calling the key function O(n) times. This is a classic "apples versus oranges" comparison. You compare *actually sorting the list* with *not sorting the list* and conclude that they key function provides a performance advantage. Yes, the key function gets called O(n) times. And that's not enough to sort the list, you still have to actually sort, exactly as I said. So where do these performance advantages come from? As I said in another post, the overhead of decorating the list with the key function makes it rather unlikely that this will be faster than just sorting it. It can happen, if key(x).__lt__ is sufficiently faster than x.__lt__, but that would be unusual. > > Once you've called the key function every time, you still have to > > *actually sort*, which will be up to O(n log n) calls to __lt__ on > > whatever __key__ returned. > > Yes... and the whole point is for __key__ to return something which is > very cheap to compare, such that there are O(n) expensive calls to > __key__ and O(n log n) cheap calls to __lt__, rather than O(n log n) > expensive calls to __lt__. Read Chris' post again. The point he was making is that the class might only define __lt__ in order to support sorting, and if we allow it to define a key function instead the class can avoid adding __lt__ at all. There's no requirement or expectation that __lt__ is expensive. If you can order the values using a cheap method and an expensive method, why would you define __lt__ to use the expensive method instead of the cheap method? The point is to avoid defining __lt__ at all, and still support sorting. But even if we define both... what makes you think that x.__lt__ is expensive (if it exists) and key(x).__lt__ is cheap? It might be the other way. If they are different, there's no guarantee about which is cheaper. If they are the same, then one is redundant. > > > If __key__ is inconsistent with __lt__, it is the same error as making > > > __lt__ inconsistent with __gt__. > > > > You seem to be assuming that __key__ is another way of spelling __lt__, > > rather than being a key function. > > It isn't. Right -- you've now clarified your position. Thank you. It wasn't clear from your earlier posts. > > In any case, it isn't an error for __lt__ to be inconsistent with > > __gt__. > [...] > > Not all values have total ordering. > > As usual you should try to understand what people are trying to say > instead of imagining they are incompetent. Instead of getting your nose out of joint and accusing me of "imagining [you] are incompetent", and assuming that I didn't "try to understand what people are trying to say", how about *you* do the same? Don't assume I'm an idiot too stupid to understand your perfectly clear words, rather consider the possibility that maybe I'm reading and responding to you in good faith, but I'm not a mind-reader. If you failed to get your message across, perhaps the fault lies in your post, not my reading comprehension. In any case, whoever is to blame for the misunderstanding, the only one who can do anything about it is the writer. The writer should take responsibility for not being clear enough, rather than blaming the reader. [...] > > Defining a single comparison method is not enough to define the rest. > > How about you stick to the discussion? I'm talking about deriving > comparison methods from __key__, not from another comparison method. > Defining __key__ is definitely enough to define all comparison > methods. Here's a key function I've used: def key(string): return string.strip().casefold() Convert to a key method: def __key__(self): return self.strip().casefold() Is it your idea to define __lt__ and others like this? def __lt__(self, other): # ignoring the possibility of returning NotImplemented return self.__key__() < self.__key__() Fair enough. I'm not convinced that's going to offer definite performance advantages, but like the total_ordering decorator, presumably if we're using this, performance is secondary to convenience. Nor do I think this decorator needs to take an implicit dunder method, when it can take an explicit key function. -- Steve ___ Python-ide
Re: [Python-ideas] a sorting protocol dunder method?
On Mon, 4 Dec 2017 15:50:31 +0200 Serhiy Storchaka wrote: > > >> Yes, it is too special-case. I don't see any advantages in comparison > >> with defining the __lt__ method. > > > > There are definitely advantages. Sorting calls __lt__ for each > > comparison (that is, O(n log n) times) while __key__ would only be > > called once per item at the start (that is, O(n) times). > > It will call __lt__ for each key comparison (the same O(n log n) times), > but *in addition* it will call __key__ O(n) times. You can get the > benefit only when times of calling __key__ is much smaller than the > difference between times of calling item's __lt__ and key's __lt__, and > maybe only for large lists. Sure, that's the point: your non-trivial __key__ method reduces the instance to e.g. a simple tuple or string, and then __lt__ over those keys is cheap. > But why not just pass the key argument when > you sort the large list? For the same reason that you want __lt__ (or __eq__, or __hash__) to be defined on the type, not call it manually every time you want to make a comparison: because it's really a fundamental property of the type and it "feels" wrong to have to pass it explicitly. Also there are library routines which may sort implicitly their inputs (such as pprint, IIRC, though perhaps pprint only sorts after calling str() -- I haven't checked). > If __key__ is consistent with __lt__, then we can just use __lt__, and > don't introduce new special methods. This is ignoring all the other arguments... > The decorator idea LGTM. But it doesn't need the special __key__ method. > Just pass the key function as a decorator argument. I would find it cleaner to express it as a method in the class's body ("__key__" or anything else) rather than have to pass a function object. Also, it's easier to unit-test if it officially exists as a method... > This idea was proposed more than 3.5 years ago. Is there a PyPI package > that implements it? I don't know. I know I reimplemented such a thing for Numba (but of course didn't benefit from automatic sort() support), because I needed fast hashing and equality without implementing the corresponding methods by hand every time (*). It would be a pity to depend on a third-party package just for that. (*) see https://github.com/numba/numba/blob/master/numba/types/abstract.py#L88 Regards Antoine. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
04.12.17 14:25, Steven D'Aprano пише: On Mon, Dec 04, 2017 at 08:45:55AM +0200, Serhiy Storchaka wrote: But the idea of the class decorator looks more sane to me. The purpose of __key__ is to define a key function (not a comparison operator) for classes that aren't orderable and don't have __lt__. If you're going to then go ahead and define __lt__ and the other comparison operators, there's no point to __key__. Right. The only benefit of this decorator is that it could avoid writing a boilerplate code for simple cases. Just add @ordered_by_key(attrgetter('name', 'weight')). __key__ is not needed, just pass the key function as an argument of the decorator. Of course if it can be useful this doesn't mean that it should be included in the stdlib. It could live on PyPI. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
04.12.17 13:06, Antoine Pitrou пише: On Mon, 4 Dec 2017 08:45:55 +0200 Serhiy Storchaka wrote: 04.12.17 01:06, Chris Barker пише: So: has this already been brought up and rejected? https://bugs.python.org/issue20632 Am I imagining the performance benefits? This will add an additional overhead. This will be even slower than passing the key function, since you will need to look up the __key__ method in every item. That is a reasonable objection. However, looking up a tp_XXX slot is very cheap. But introducing a new slot is not easy. This will increase the size of the type object, break binary compatibility. According to Stefan Behnel's researches (https://bugs.python.org/issue31336) the main time of the creation of a new type is spent on initializing slots. This cost will pay every Python program, even if it doesn't use the __key__ method. There are many more common and important methods that don't have the corresponding slot (__length_hint__, __getstate__, __reduce__, __copy__, keys). Yes, it is too special-case. I don't see any advantages in comparison with defining the __lt__ method. There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times). It will call __lt__ for each key comparison (the same O(n log n) times), but *in addition* it will call __key__ O(n) times. You can get the benefit only when times of calling __key__ is much smaller than the difference between times of calling item's __lt__ and key's __lt__, and maybe only for large lists. But why not just pass the key argument when you sort the large list? It will be rather confusing if different methods of sorting produce inconsistent order. If __key__ is inconsistent with __lt__, it is the same error as making __lt__ inconsistent with __gt__. If __key__ is consistent with __lt__, then we can just use __lt__, and don't introduce new special methods. And there could be a decorator that generates all comparison methods from __key__. The decorator idea LGTM. But it doesn't need the special __key__ method. Just pass the key function as a decorator argument. This idea was proposed more than 3.5 years ago. Is there a PyPI package that implements it? ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Sun, Dec 03, 2017 at 06:53:45PM -0800, Bruce Leban wrote: > I think you're arguing against this for the wrong reason. Chris was talking > about custom classes having the *option* of making them sortable by > providing a key method in the class definition. I never imagined that it would be a required method. Of course it is optional. But by adding interpreter support, we're blessing something which I think is a misguided design choice as the One Obvious Way. We're taking something which belongs in the report generator or collection, the knowledge of how to sort a collection of unordered values, and baking it into the values themselves. (Effectively making them ordered!) That's the wrong design. Your report needs to know about your values, your values shouldn't need to know how the report is formatted. Its like saying that you want an Email object, and a SMTP_Server object, but to make it more convenient for the SMTP_Server object, we should give the Email objects themselves a method that knows how to talk to port 25 and send themselves. Then the SMTP_Server just calls email.send() on each method. How convenient. The same applies here. Sure, it is convenient to just call bank_accounts.sort() (to re-use the example given by Carl) and it magically works, but as soon as your report changes and you want the bank accounts sorted according to their account name instead of balance, you have to either provide a key function, or change the __key__ method. Obviously changing the __key__ method will break any other reports that rely on it, so you end up using a key function anyway. I would mind this less if it isn't blessed by the interpreter. There are lots of classes which are excessively coupled to other things. I've written a few of them myself, so I understand the temptation. Sometimes that design might even be justified under "Practicality beats purity". But I don't think this is one of those cases: I don't see this as important enough or common enough to build it into the language as an actual dunder method. If people like Chris' idea, just add a sortkey() method to your class, and then call sorted(collection_of_Spam, key=Spam.sortkey) and it will Just Work. It is explicit, backwards compatible and doesn't need to wait for Python 3.8 or whenever this (mis)feature gets (hypothetically) added. > On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker wrote: > > > Am I imagining the performance benefits? > > > > Maybe. Looking strictly at O(-) cost, there's no difference between a key > function and comparison operators. Sure it might potentially only make O(n) > calls to the key function and O(n log n) calls to compare the keys vs. O(n > log n) calls to the comparator functions but that might not actually be > faster. It is unlikely that calling a key function followed by key comparisons would be faster than just calling the key comparisons. Using a key function is effectively the old DSU idiom: values = [(key(x), x) for x in values] # O(n) values.sort() # O(n log n) values = [t[1] for t in values] # O(n) so you make two extra passes through the list. The only way that could be faster is if key(x).__lt__ is sufficiently cheap compared to x.__lt__ that it saves more than the cost of those two extra passes (plus the overhead from dealing with the extra tuples). You might be thinking of the old Python 1 and early Python 2 cmp argument to sort. The comparator function can end up calling x.__lt__ up to O(n**2) times if I remember correctly, so it is quite expensive. > There certainly are cases where implementing a key function would > be quite slow. The biggest problem with a key function is that it trades off memory for time. If you are constrained by memory, but don't care how slow your sort is, an old-school comparison function might suit you better. But for those cases, functools.cmp_to_key may help. -- Steve ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Mon, 4 Dec 2017 12:19:07 + Paul Moore wrote: > On 4 December 2017 at 11:41, Steven D'Aprano wrote: > > On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote: > >> I think this is an interesting idea, and I don't believe that either > >> performance or "sortable vs comparable" are very relevant. > > > > Performance is always relevant -- while performance shouldn't be the > > sole deciding factor, it should be a factor. > > > > And since the entire use-case for this is sorting versus comparison > > operators, I'm having trouble understanding why you think that sorting > > versus comparison operators is irrelevant. > > I'm not completely clear on what the expectation is (in terms of > "sortable vs comparable") here. It's quite clear if you read what Chris Barker posted originally (and which I agree with). We're talking about deriving comparison methods from a key function *while* making sorting potentially faster, because the costly reduction operation happens O(n) times instead of O(n log n) times. Steven OTOH seems to be inventing controversies just for the sake of posting a rant. Regards Antoine. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Mon, 4 Dec 2017 23:16:11 +1100 Steven D'Aprano wrote: > On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote: > > > There are definitely advantages. Sorting calls __lt__ for each > > comparison (that is, O(n log n) times) while __key__ would only be > > called once per item at the start (that is, O(n) times). > > Passing a key function doesn't magically turn a O(n log n) comparison > sort into a O(n) sort. Where did I say it did? > Once you've called the key function every time, you still have to > *actually sort*, which will be up to O(n log n) calls to __lt__ on > whatever __key__ returned. Yes... and the whole point is for __key__ to return something which is very cheap to compare, such that there are O(n) expensive calls to __key__ and O(n log n) cheap calls to __lt__, rather than O(n log n) expensive calls to __lt__. > > If __key__ is inconsistent with __lt__, it is the same error as making > > __lt__ inconsistent with __gt__. > > You seem to be assuming that __key__ is another way of spelling __lt__, > rather than being a key function. It isn't. It's just supposed to be consistent with it, just like __hash__ is supposed to be consistent with __eq__, and noone reasonable chastises Python because it allows to define __hash__ independently of __eq__. Also please note Serhiy's sentence I was responding to: """It will be rather confusing if different methods of sorting produce inconsistent order.""" > In any case, it isn't an error for __lt__ to be inconsistent with > __gt__. [...] > Not all values have total ordering. As usual you should try to understand what people are trying to say instead of imagining they are incompetent. In the present case, it would definitely be an inconsistency (and a programming error) to have both `a < b` and `b < a` return true. > Defining a single comparison method is not enough to define the rest. How about you stick to the discussion? I'm talking about deriving comparison methods from __key__, not from another comparison method. Defining __key__ is definitely enough to define all comparison methods. Regards Antoine. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Mon, Dec 04, 2017 at 08:45:55AM +0200, Serhiy Storchaka wrote: > But the idea of the class decorator looks more sane to me. The purpose of __key__ is to define a key function (not a comparison operator) for classes that aren't orderable and don't have __lt__. If you're going to then go ahead and define __lt__ and the other comparison operators, there's no point to __key__. -- Steve ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On 4 December 2017 at 11:41, Steven D'Aprano wrote: > On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote: >> I think this is an interesting idea, and I don't believe that either >> performance or "sortable vs comparable" are very relevant. > > Performance is always relevant -- while performance shouldn't be the > sole deciding factor, it should be a factor. > > And since the entire use-case for this is sorting versus comparison > operators, I'm having trouble understanding why you think that sorting > versus comparison operators is irrelevant. I'm not completely clear on what the expectation is (in terms of "sortable vs comparable") here. Clearly if a class has __lt__, it's both sortable and comparable, and that's fine. If it doesn't have __lt__, then the implication is that the class designer doesn't believe it's reasonable for it to be ordered. That's what not having comparison methods *means* (well, excepting the case that the designer didn't think of it, which is probably the case for 99% of my classes ;-)) If we have a __key__ method on a class, then the following becomes true: * We can work out which of 2 instances is bigger/smaller using max/min. * We can compare two items by doing a sort. So while the *intent* may not be to allow comparisons, that's what you've done. As a result, I don't think it's an important consideration to worry about classes that "should be sortable, but shouldn't be orderable". That's basically a contradiction in terms, and will likely only come up in corner cases where for technical reasons you may need your instances to participate in sorting without raising exceptions, but you don't consider them orderable (the NLTK example mentioned above). Conversely, when sorting a key can provide significant performance improvements. A single O(n) pass to compute keys, followed by O(n log(n)) comparisons could be significantly faster, assuming comparing keys is faster than extracting them from the object. So allowing classes to define __key__ could be a performance win over a __lt__ defined as (effectively) def __lt__(self, other): return self.__key__() < other.__key__() Overall, I don't see any problem with the idea, although it's not something I've ever needed myself, and I doubt that in practice it will make *that* much difference. The main practical benefit, I suspect, would be if there were an "Orderable" ABC that auto-generated the comparison methods given either __lt__ or __key__ (I could have sworn there was such an ABC for __lt__ already, but I can't find it in the library ref :-() Paul ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote: > There are definitely advantages. Sorting calls __lt__ for each > comparison (that is, O(n log n) times) while __key__ would only be > called once per item at the start (that is, O(n) times). Passing a key function doesn't magically turn a O(n log n) comparison sort into a O(n) sort. Once you've called the key function every time, you still have to *actually sort*, which will be up to O(n log n) calls to __lt__ on whatever __key__ returned. The key function is just a built-in version of the old "DSU" (Decorate-Sort-Undecorate) idiom: values = [(key(x), x) for x in values] values.sort() values = [t[1] for t in values] If you want this functionality, go right ahead and give your class a sortkey method, and then pass that as the explicit key function to sorted: sorted(collection_of_Spam, key=Spam.sortkey) That is nice and explicit, and the method only gets looked up once. It works in any version of Python, it is fully backwards-compatible, and it requires no support from the interpreter. I still think this is a poor object design, putting the key function in the object being sorted rather than in the report doing the sorting, but so long as this isn't blessed by the language as the One Obvious Way I don't mind so much. > > It will be rather confusing if > > different methods of sorting produce inconsistent order. > > If __key__ is inconsistent with __lt__, it is the same error as making > __lt__ inconsistent with __gt__. You seem to be assuming that __key__ is another way of spelling __lt__, rather than being a key function. If it does the same thing as __lt__, it is a comparison method, not a key function, and the name is horribly misleading. In any case, it isn't an error for __lt__ to be inconsistent with __gt__. py> {1, 2, 3, 4} < {2, 3, 4, 5} False py> {1, 2, 3, 4} > {2, 3, 4, 5} False Not all values have total ordering. > And there could be a decorator that generates all comparison methods > from __key__. Defining a single comparison method is not enough to define the rest. You need __eq__ and one comparison method. (Technically we could use __ne__ and one other, but __eq__ is usual. But why not just define __lt__ and use total_ordering, instead of defining two identical decorators that differ only in the name of the dunder method they use? The purpose of __key__ was supposed to be to eliminate the need to define __lt__. It seems ridiculous to then use __key__ to define __lt__ when the whole point of __key__ is to avoid needing to define __lt__. -- Steve ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote: > I think this is an interesting idea, and I don't believe that either > performance or "sortable vs comparable" are very relevant. Performance is always relevant -- while performance shouldn't be the sole deciding factor, it should be a factor. And since the entire use-case for this is sorting versus comparison operators, I'm having trouble understanding why you think that sorting versus comparison operators is irrelevant. > I doubt there is much performance to gain here, I doubt there is any performance gain -- rather a performance hit is far more likely. > and I think the default sort order for > a class must continue to match its comparison behavior. This proposal changes that: if a class defines __key__ but not __lt__, then the default sort behaviour will be different from the comparison behaviour. If it defines both, it isn't clear which will be used for sorting. Should __lt__ take priority, or __key__? Whichever we choose, somebody is going to be upset and confused by the choice. > Most often when I define equality and comparison dunder methods for a > custom class, I'm effectively just deferring the comparison to some > field or tuple of fields of the object. E.g. > > from functools import total_ordering > > @total_ordering > class BankAccount: > def __init__(self, balance): > self.balance = balance [snip example] This example shows exactly the confusion of concepts I'm talking about. Why should bank accounts be sorted by their balance, instead of by their account number, or account name, or date they were opened? Why should BankAccounts sort differently according to their balance, a quantity which can change after every transaction? What happens when somebody decides that bank accounts default sorting should be by their account name rather than the ever-changing balance? I'm not saying that it never makes sense to sort a bunch of accounts according to their balance. I'm saying that functionality is not part of the account themselves. If it belongs anywhere, it belongs in the collection of accounts. And even that is dubious: I believe that where it really belongs is in the report generator that needs to sort the collection of accounts. > It'd be nice to be able to eliminate an import That's an argument that applies to *literally* everything in the standard library. Should we make everything a built-in? The prerequisites for eliminating the need for an import should be a *lot* higher than just "it would be nice". I disagree with the design of this: it is putting the decision of how to sort uncomparable objects in the wrong place, in the object itself rather than in the report that wants to sort them. -- Steve ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Mon, 4 Dec 2017 08:45:55 +0200 Serhiy Storchaka wrote: > 04.12.17 01:06, Chris Barker пише: > > So: has this already been brought up and rejected? > > https://bugs.python.org/issue20632 > > > Am I imagining the performance benefits? > > This will add an additional overhead. This will be even slower than > passing the key function, since you will need to look up the __key__ > method in every item. That is a reasonable objection. However, looking up a tp_XXX slot is very cheap. > Yes, it is too special-case. I don't see any advantages in comparison > with defining the __lt__ method. There are definitely advantages. Sorting calls __lt__ for each comparison (that is, O(n log n) times) while __key__ would only be called once per item at the start (that is, O(n) times). > It will be rather confusing if > different methods of sorting produce inconsistent order. If __key__ is inconsistent with __lt__, it is the same error as making __lt__ inconsistent with __gt__. And there could be a decorator that generates all comparison methods from __key__. Regards Antoine. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Sun, 3 Dec 2017 15:06:02 -0800 Chris Barker wrote: > I can't believe this hasn't been brought up before, but searching the web, > and python-ideas, and all the PEPs has found nothing (could be my lame > google-fu), so here goes: > > Recent python has moved toward a "key" function for customized sorting: > > list.sort(key=key_fun) > > key is also used (according to > https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in: > > min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby() > > with this fairly broad use, it seems it's becoming a fairly universal > protocol for ordering. > [...] +1 from me. I would also be +1 on an optional class decorator that would generate all the ordering comparison methods (__lt__, etc.) based on the __key__ method definition. Regards Antoine. ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] a sorting protocol dunder method?
On Sun, Dec 3, 2017 at 10:48 PM, Carl Meyer wrote: > It'd be nice to be able to eliminate an import and have the lines of > code and instead write that as: > > class BankAccount: > def __init__(self, balance): > self.balance = balance > > def __sort_key__(self): > return self.balance What if we added a @key_ordering decorator, like @total_ordering but using __key__ to generate the comparisons? I know you'd have to do an import, but usually adding things to the core language requires more of a benefit than that :-). -n -- Nathaniel J. Smith -- https://vorpus.org ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/