Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Greg Ewing

On 12/04/2017 10:01 AM, brent bejot wrote:


I think we can all agree that defining
__key__ and calling "sorted(list_of_attrdicts)" has that syntactic 
sugar that is oh-so-sweet-and-tasty.


Just remember that too much syntactic sugar can give
you cancer of the semicolon.

--
Greg

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Chris Barker
wow! a few time zones (and a day job) really make a difference to taking
part in a discussion :-)

Thanks for all the feedback. From what I can tell, we don't have a
consensus, though It's looking pretty unlikely that this is going to be
adopted (though maybe the decorator idea). But I'm going to go through and
try to summarize what we (I?) have learned, and what does seem to be agreed
upon, or not. If I misinterpret something you said, or take a quote out of
context, please correct it, but trust that I didn't do it on purpose

Also, it's kind of a pain to do a digest like this and properly attribute
everyone, so mostly I'm not going to attribute the quotes...


So: has this already been brought up and rejected?
>

https://bugs.python.org/issue20632

Thanks Serhiy -- I didn't think to look in the Bug DB. This does remind me
that it would be good to have a place (maybe in a mets-PEP?) to put topics
that have been discussed, but didn't get as far as anyone writing a PEP...


An existence proof: in NLTK, an __lt__ method added purely to facilitate
> consistent sorting (in doctests) of structured data objects for which
> comparison operators do not really make conceptual sense:
> https://github.com/nltk/nltk/pull/1902/files#diff-
> 454368f06fd635b1e06c9bb6d65bd19bR689
> Granted, calling min() and max() on collections of these objects would
> not make conceptual sense either. Still, __sort_key__ would have been
> cleaner than __lt__.


So nice to know I'm not the only one that wants (needs) to provide a sort
order be default -- though it doesn't mean it's not a niche issue anyway.

By default, we sort by the inherent order of the values. But if the
> values have no inherent order (they are unordered), we can sort
> unordered items in a collection by providing an appropriate key
> function. Hence why I say they are loosely related.



>  It
> doesn't make sense to put that functionality into the complex numbers
> themselves: complex numbers are unordered, and any order we impose on
> them comes from the collection, not the individual numbers.


Sure -- and that's why we definitely still need a key option for the
sorting functions, and why not all classes should define a sort order. But
for many classes, it does make sense to define a default sort order. Take
something as simple as strings -- they have a default sort order, but a
user very well might want to sort them differently -- say case-insensitive,
or ...

So I think there is consensus here (and no one was proposing otherwise):

 - Defining a sort order is NOT required of all classes
 - The sorting methods definitely need to allow a custom key function.

 the comparison operators
> apply to the values in the collection; the key function applies to the
> collection itself


But many (most) objects DO provide a default sort order, by means of the
comparison methods.  So the idea that objects have a default sort order is
already pretty well accepted :-)

But that assumes that there's only ever one way to order a collection of
> unordered items.


no,. it doesn't -- it implies there is a default way, which is what is done
already for most types. And again, some classes  shouldn't even have a
default.

I'm not sure I understand the motivation to make elements *sortable* but
> not comparable. If an arbitrary order is still useful, I'd think you'd want
> to be able to tell how two particular elements *would* sort by asking a


yeah, I'm not sure about this -- going back to the example of complex
numbers, I like
the idea of providing a default sort order but not make the declaration
that this is how the objects SHOULD be compared -- after all, you can
provide your own key function, but not so easily re-define the
comparisons...


> Is sorting-related functionally too special-case to deserve a protocol?
>

> Yes, it is too special-case. I don't see any advantages in comparison with
> defining the __lt__ method. It will be rather confusing if different
> methods of sorting produce inconsistent order. If the first item of the
> sorted list is not the smallest item of the list.


well, yeah, but not any different than the fact that you can inconsitently
define the comparison operators already.

What I wasn't realizing is that you only need to define __lt__ (and __eq__)
to get sortability. Maybe it would be good to document that more
prominently somewhere (if it's documented at all, other than the source).

But the idea of the class decorator looks more sane to me.


yeah, I'm liking that.

Most often when I define equality and comparison dunder methods for a
> custom class, I'm effectively just deferring the comparison to some
> field or tuple of fields of the object.


Exactly -- which may be an argument for a decorator, rather than a dunder
method -- particularly if performance isn't improved by the dunder method.

What if we added a @key_ordering decorator, like @total_ordering but
> using __key__ to generate the comparisons?


This could be a good idea -- just 

[Python-ideas] Data Classes in Kotlin

2017-12-04 Thread Juancarlo Añez
I thought this might be interesting input for the discussions about "data
classes" in Python:

https://kotlinlang.org/docs/reference/data-classes.html

I think that the use of "data class" as syntax is kind of cool, but what
really matters is the semantics they chose for Kotlin.

-- 
Juancarlo *Añez*
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Chris Angelico
On Tue, Dec 5, 2017 at 8:54 AM, Julien Salort  wrote:
> Le 04/12/2017 à 14:16, Steven D'Aprano a écrit :
>
>> We're taking something which belongs in the report generator or
>> collection, the knowledge of how to sort a collection of unordered
>> values, and baking it into the values themselves. (Effectively making
>> them ordered!)
>
> It is also possible to use this __key__ method for classes for which the
> ordering
> is indeed unambiguously defined, e.g.:
>
> class MyValue:
>
> def __init__(self, value, comment):
> self.value = value
> self.comment = comment
>
> def __key__(self):
> return self.value
>
> Then it is not shocking to define a sorting key.

MyValue = namedtuple('MyValue', ['value', 'comment'])

Job done :)

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Julien Salort

Le 04/12/2017 à 14:16, Steven D'Aprano a écrit :


We're taking something which belongs in the report generator or
collection, the knowledge of how to sort a collection of unordered
values, and baking it into the values themselves. (Effectively making
them ordered!)
It is also possible to use this __key__ method for classes for which the 
ordering

is indeed unambiguously defined, e.g.:

class MyValue:

    def __init__(self, value, comment):
        self.value = value
        self.comment = comment

    def __key__(self):
        return self.value

Then it is not shocking to define a sorting key.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Chris Angelico
On Tue, Dec 5, 2017 at 8:22 AM, Barry  wrote:
>
>
>> On 4 Dec 2017, at 20:12, Antoine Pitrou  wrote:
>>
>> On Mon, 4 Dec 2017 19:37:02 +
>> Barry Scott  wrote:
>>> I wondered what the performance would be and tested the following code:
>>>
>> [...]
>>>
>>> it outputs this for my with python 3.6.0
>>>
>>> 1
>>> key 0.010628s  1 calls
>>> lt 0.053690s 119886 calls
>>>
>>> It seems that providing a key is ~5 times faster the depending on __lt__.
>>> (I even used a short circuit to se if __lt__ could beat key).
>>
>> Thanks for taking the time to write a benchmark.  I'm not surprised
>> by the results (and your __lt__ method isn't even complicated: the gap
>> could be very much wider).  There is more to Python performance than
>> aggregate big-O algorithmic complexity.
>
> I was surprised by the huge difference. I was expecting a closer race.
>
> For the record I think that a __sort_key__ is not a good idea as it is so 
> easy to
> do as I did and define key methods on the class, without the limit of one such
> method.

The numbers here are all fairly small, but they make a lot of sense.
Calls into Python code are potentially quite slow, so there's a LOT of
benefit to be had by calling Python code once per object instead of N
log N times for the comparisons. Increasing the length of the list
will make that difference even more pronounced.

But that's an argument for using a key function, not for having a
__key__ special method.

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Barry


> On 4 Dec 2017, at 20:12, Antoine Pitrou  wrote:
> 
> On Mon, 4 Dec 2017 19:37:02 +
> Barry Scott  wrote:
>> I wondered what the performance would be and tested the following code:
>> 
> [...]
>> 
>> it outputs this for my with python 3.6.0
>> 
>> 1
>> key 0.010628s  1 calls
>> lt 0.053690s 119886 calls
>> 
>> It seems that providing a key is ~5 times faster the depending on __lt__.
>> (I even used a short circuit to se if __lt__ could beat key).
> 
> Thanks for taking the time to write a benchmark.  I'm not surprised
> by the results (and your __lt__ method isn't even complicated: the gap
> could be very much wider).  There is more to Python performance than
> aggregate big-O algorithmic complexity.

I was surprised by the huge difference. I was expecting a closer race.

For the record I think that a __sort_key__ is not a good idea as it is so easy 
to
do as I did and define key methods on the class, without the limit of one such
method.

Barry


> 
> Regards
> 
> Antoine.
> 
> 
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
> 

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Antoine Pitrou
On Mon, 4 Dec 2017 19:37:02 +
Barry Scott  wrote:
> I wondered what the performance would be and tested the following code:
> 
[...]
> 
> it outputs this for my with python 3.6.0
> 
> 1
> key 0.010628s  1 calls
>  lt 0.053690s 119886 calls
> 
> It seems that providing a key is ~5 times faster the depending on __lt__.
> (I even used a short circuit to se if __lt__ could beat key).

Thanks for taking the time to write a benchmark.  I'm not surprised
by the results (and your __lt__ method isn't even complicated: the gap
could be very much wider).  There is more to Python performance than
aggregate big-O algorithmic complexity.

Regards

Antoine.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Barry Scott
I wondered what the performance would be and tested the following code:

#!/usr/bin/env python3
import random
import time

random.seed( hash('Testing Keys') )

lt_calls = 0
key_calls = 0

class MyObject:
def __init__( self, value1, value2 ):
self.value1 = value1
self.value2 = value2

def __lt__(self, other):
global lt_calls
lt_calls +=1 

if self.value1 < other.value1:
return True
else:
return self.value2 < other.value2

def key(self):
global key_calls
key_calls +=1 

return self.value1, self.value2

lt_list = []
random

for value1 in reversed(range(1)):
value2 = value1 - 50

lt_list.append( MyObject( value1, value2 ) )

random.shuffle( lt_list )

key_list = lt_list[:]

print( len(lt_list) )

s = time.time()
key_list.sort( key=MyObject.key )
e = time.time() - s
print( 'key %.6fs %6d calls' % (e, key_calls) )

s = time.time()
lt_list.sort()
e = time.time() - s
print( ' lt %.6fs %6d calls' % (e, lt_calls) )


it outputs this for my with python 3.6.0

1
key 0.010628s  1 calls
 lt 0.053690s 119886 calls

It seems that providing a key is ~5 times faster the depending on __lt__.
(I even used a short circuit to se if __lt__ could beat key).

I often have more then one way to sort an object. Its easy for me to provide a 
set
of key functions that meet the needs of each sort context. I'm not sure what 
extra
value the __sort_key__ would offer over providing a key method as I did.

Barry

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Rhodri James

On 04/12/17 18:01, brent bejot wrote:

I'm +1 on this idea for the most part.

I agree particularly with the idea that it is better OOP for an object to
access it's member variables to create the key than an external container
to do so.


This I'm absolutely fine with.  Key methods are something to encourage. 
The problem that I have is that once you get beyond simple lists of 
number or strings, there often isn't a particularly obvious sort order, 
or rather there are often multiple obvious sort orders and you may want 
any of them.  In fact I'd go so far as to suggest that there _usually_ 
isn't a single obvious sort order for non-trivial classes.


To take a non-Python example, I'm in charge of the reading rota at my 
local church, and I keep a spreadsheet of readers to help me.  Usually I 
sort that list by the date people last read the lesson, so I can quickly 
tell who I should ask next.  When the newsletter asks me for a list of 
readers, though, I sort them alphabetically by surname, which most 
people would think of as the natural sorting order.


--
Rhodri James *-* Kynesim Ltd
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Mark Lawrence

On 04/12/17 18:01, brent bejot wrote:

I'm +1 on this idea for the most part.

I agree particularly with the idea that it is better OOP for an object 
to access it's member variables to create the key than an external 
container to do so.


 > and then sort like this:
 > sorted(list_of_attrdicts, key=AttrDict._id_order)



Isn't this exactly what the operator module's itemgetter and attrgetter 
all ready give you?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Ethan Furman

On 12/04/2017 10:01 AM, brent bejot wrote:


This is certainly a good pattern to use in the current and older versions, but 
I think we can all agree that defining
__key__ and calling "sorted(list_of_attrdicts)" has that syntactic sugar that 
is oh-so-sweet-and-tasty.


Actually, no, we do not all agree.  ;-)

--
~Ethan~
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread brent bejot
I'm +1 on this idea for the most part.

I agree particularly with the idea that it is better OOP for an object to
access it's member variables to create the key than an external container
to do so.

> and then sort like this:
> sorted(list_of_attrdicts, key=AttrDict._id_order)

This is certainly a good pattern to use in the current and older versions,
but I think we can all agree that defining __key__ and calling
"sorted(list_of_attrdicts)" has that syntactic sugar that is
oh-so-sweet-and-tasty.

> This will add an additional overhead. This will be even slower than
passing the key function, since you will need to look up the __key__ method
in every item. And there will be an overhead even in the case when the
__key__ method is not defined.

This, to me, is the only possible negative.  I would be most interested to
see how much of an effect this would have on real-world data that doesn't
have __key__ defined.



I may be new to this community but Steven D'Aprano and Antoine Pitrou, you
guys bicker like my parents before they got a divorce.  I'm pretty sure
you're both veterans and so know how to behave yourselves.  Please set the
tone according to how you'd like us newbies to respond.

-Brent
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] (no subject)

2017-12-04 Thread Antoine Pitrou
On Tue, 5 Dec 2017 02:52:44 +1100
Steven D'Aprano  wrote:
> On Mon, Dec 04, 2017 at 01:52:19PM +0100, Antoine Pitrou wrote:
> > On Mon, 4 Dec 2017 23:16:11 +1100
> > Steven D'Aprano  wrote:  
> > > On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote:
> > >   
> > > > There are definitely advantages.  Sorting calls __lt__ for each
> > > > comparison (that is, O(n log n) times) while __key__ would only be
> > > > called once per item at the start (that is, O(n) times).
> > > 
> > > Passing a key function doesn't magically turn a O(n log n) comparison 
> > > sort into a O(n) sort.  
> > 
> > Where did I say it did?  
> 
> See the text from you quoted above. You said there are "definitely 
> [performance] advantages" by using a key function. You then compare:
> 
> - calling __lt__ O(n log n) times, versus
> 
> - calling the key function O(n) times.
> 
> This is a classic "apples versus oranges" comparison. You compare 
> *actually sorting the list* with *not sorting the list* and conclude 
> that they key function provides a performance advantage.

At this point, I can only assume you are trolling by twisting my
words... even though you later quote the part which explicitly
clarifies that I was *not* saying what you claim I did.

Why you seem to think that is contributing anything to the discussion
rather than derailing it is beyond me.  In any case, don't expect
further responses from me.

Regards

Antoine.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Steven D'Aprano
On Mon, Dec 04, 2017 at 01:52:19PM +0100, Antoine Pitrou wrote:
> On Mon, 4 Dec 2017 23:16:11 +1100
> Steven D'Aprano  wrote:
> > On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote:
> > 
> > > There are definitely advantages.  Sorting calls __lt__ for each
> > > comparison (that is, O(n log n) times) while __key__ would only be
> > > called once per item at the start (that is, O(n) times).  
> > 
> > Passing a key function doesn't magically turn a O(n log n) comparison 
> > sort into a O(n) sort.
> 
> Where did I say it did?

See the text from you quoted above. You said there are "definitely 
[performance] advantages" by using a key function. You then compare:

- calling __lt__ O(n log n) times, versus

- calling the key function O(n) times.

This is a classic "apples versus oranges" comparison. You compare 
*actually sorting the list* with *not sorting the list* and conclude 
that they key function provides a performance advantage.

Yes, the key function gets called O(n) times. And that's not enough to 
sort the list, you still have to actually sort, exactly as I said. So 
where do these performance advantages come from?

As I said in another post, the overhead of decorating the list with the 
key function makes it rather unlikely that this will be faster than just 
sorting it. It can happen, if key(x).__lt__ is sufficiently faster than 
x.__lt__, but that would be unusual.


> > Once you've called the key function every time, you still have to 
> > *actually sort*, which will be up to O(n log n) calls to __lt__ on 
> > whatever __key__ returned.
> 
> Yes... and the whole point is for __key__ to return something which is
> very cheap to compare, such that there are O(n) expensive calls to
> __key__ and O(n log n) cheap calls to __lt__, rather than O(n log n)
> expensive calls to __lt__.

Read Chris' post again. The point he was making is that the class might 
only define __lt__ in order to support sorting, and if we allow it to 
define a key function instead the class can avoid adding __lt__ at all. 
There's no requirement or expectation that __lt__ is expensive.

If you can order the values using a cheap method and an expensive 
method, why would you define __lt__ to use the expensive method instead 
of the cheap method? The point is to avoid defining __lt__ at all, and 
still support sorting.

But even if we define both... what makes you think that x.__lt__ is 
expensive (if it exists) and key(x).__lt__ is cheap? It might be the 
other way. If they are different, there's no guarantee about which is 
cheaper. If they are the same, then one is redundant.


> > > If __key__ is inconsistent with __lt__, it is the same error as making
> > > __lt__ inconsistent with __gt__.  
> > 
> > You seem to be assuming that __key__ is another way of spelling __lt__, 
> > rather than being a key function.
> 
> It isn't.

Right -- you've now clarified your position. Thank you. It wasn't clear 
from your earlier posts.


> > In any case, it isn't an error for __lt__ to be inconsistent with 
> > __gt__.
> [...]
> > Not all values have total ordering.
> 
> As usual you should try to understand what people are trying to say
> instead of imagining they are incompetent.

Instead of getting your nose out of joint and accusing me of "imagining 
[you] are incompetent", and assuming that I didn't "try to understand 
what people are trying to say", how about *you* do the same?

Don't assume I'm an idiot too stupid to understand your perfectly clear 
words, rather consider the possibility that maybe I'm reading and 
responding to you in good faith, but I'm not a mind-reader.

If you failed to get your message across, perhaps the fault lies in your 
post, not my reading comprehension. In any case, whoever is to blame for 
the misunderstanding, the only one who can do anything about it is the 
writer. The writer should take responsibility for not being clear 
enough, rather than blaming the reader.


[...]
> > Defining a single comparison method is not enough to define the rest. 
> 
> How about you stick to the discussion?  I'm talking about deriving
> comparison methods from __key__, not from another comparison method.
> Defining __key__ is definitely enough to define all comparison
> methods.

Here's a key function I've used:

def key(string):
return string.strip().casefold()

Convert to a key method:

def __key__(self):
return self.strip().casefold()


Is it your idea to define __lt__ and others like this?

def __lt__(self, other):
# ignoring the possibility of returning NotImplemented
return self.__key__() < self.__key__()


Fair enough. I'm not convinced that's going to offer definite 
performance advantages, but like the total_ordering decorator, 
presumably if we're using this, performance is secondary to convenience.

Nor do I think this decorator needs to take an implicit dunder method, 
when it can take an explicit key function.


-- 
Steve
___
Python-ide

Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Antoine Pitrou
On Mon, 4 Dec 2017 15:50:31 +0200
Serhiy Storchaka 
wrote:
> 
> >> Yes, it is too special-case. I don't see any advantages in comparison
> >> with defining the __lt__ method.  
> > 
> > There are definitely advantages.  Sorting calls __lt__ for each
> > comparison (that is, O(n log n) times) while __key__ would only be
> > called once per item at the start (that is, O(n) times).  
> 
> It will call __lt__ for each key comparison (the same O(n log n) times), 
> but *in addition* it will call __key__ O(n) times. You can get the 
> benefit only when times of calling __key__ is much smaller than the 
> difference between times of calling item's __lt__ and key's __lt__, and 
> maybe only for large lists.

Sure, that's the point: your non-trivial __key__ method reduces the
instance to e.g. a simple tuple or string, and then __lt__ over those
keys is cheap.

> But why not just pass the key argument when 
> you sort the large list?

For the same reason that you want __lt__ (or __eq__, or __hash__) to be
defined on the type, not call it manually every time you want to make a
comparison: because it's really a fundamental property of the type and
it "feels" wrong to have to pass it explicitly.

Also there are library routines which may sort implicitly their inputs
(such as pprint, IIRC, though perhaps pprint only sorts after calling
str() -- I haven't checked).

> If __key__ is consistent with __lt__, then we can just use __lt__, and 
> don't introduce new special methods.

This is ignoring all the other arguments...

> The decorator idea LGTM. But it doesn't need the special __key__ method. 
> Just pass the key function as a decorator argument.

I would find it cleaner to express it as a method in the class's body
("__key__" or anything else) rather than have to pass a function object.
Also, it's easier to unit-test if it officially exists as a method...

> This idea was proposed more than 3.5 years ago. Is there a PyPI package 
> that implements it?

I don't know.  I know I reimplemented such a thing for Numba (but of
course didn't benefit from automatic sort() support), because I needed
fast hashing and equality without implementing the corresponding
methods by hand every time (*). It would be a pity to depend on a
third-party package just for that.

(*) see
https://github.com/numba/numba/blob/master/numba/types/abstract.py#L88

Regards

Antoine.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Serhiy Storchaka

04.12.17 14:25, Steven D'Aprano пише:

On Mon, Dec 04, 2017 at 08:45:55AM +0200, Serhiy Storchaka wrote:

But the idea of the class decorator looks more sane to me.


The purpose of __key__ is to define a key function (not a comparison
operator) for classes that aren't orderable and don't have __lt__.

If you're going to then go ahead and define __lt__ and the other
comparison operators, there's no point to __key__.


Right. The only benefit of this decorator is that it could avoid writing 
a boilerplate code for simple cases. Just add 
@ordered_by_key(attrgetter('name', 'weight')).


__key__ is not needed, just pass the key function as an argument of the 
decorator.


Of course if it can be useful this doesn't mean that it should be 
included in the stdlib. It could live on PyPI.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Serhiy Storchaka

04.12.17 13:06, Antoine Pitrou пише:

On Mon, 4 Dec 2017 08:45:55 +0200
Serhiy Storchaka 
wrote:

04.12.17 01:06, Chris Barker пише:

So: has this already been brought up and rejected?


https://bugs.python.org/issue20632


Am I imagining the performance benefits?


This will add an additional overhead. This will be even slower than
passing the key function, since you will need to look up the __key__
method in every item.


That is a reasonable objection.  However, looking up a tp_XXX slot is
very cheap.


But introducing a new slot is not easy. This will increase the size of 
the type object, break binary compatibility. According to Stefan 
Behnel's researches (https://bugs.python.org/issue31336) the main time 
of the creation of a new type is spent on initializing slots. This cost 
will pay every Python program, even if it doesn't use the __key__ method.


There are many more common and important methods that don't have the 
corresponding slot (__length_hint__, __getstate__, __reduce__, __copy__, 
keys).



Yes, it is too special-case. I don't see any advantages in comparison
with defining the __lt__ method.


There are definitely advantages.  Sorting calls __lt__ for each
comparison (that is, O(n log n) times) while __key__ would only be
called once per item at the start (that is, O(n) times).


It will call __lt__ for each key comparison (the same O(n log n) times), 
but *in addition* it will call __key__ O(n) times. You can get the 
benefit only when times of calling __key__ is much smaller than the 
difference between times of calling item's __lt__ and key's __lt__, and 
maybe only for large lists. But why not just pass the key argument when 
you sort the large list?



It will be rather confusing if
different methods of sorting produce inconsistent order.


If __key__ is inconsistent with __lt__, it is the same error as making
__lt__ inconsistent with __gt__.


If __key__ is consistent with __lt__, then we can just use __lt__, and 
don't introduce new special methods.



And there could be a decorator that generates all comparison methods
from __key__.


The decorator idea LGTM. But it doesn't need the special __key__ method. 
Just pass the key function as a decorator argument.


This idea was proposed more than 3.5 years ago. Is there a PyPI package 
that implements it?


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Steven D'Aprano
On Sun, Dec 03, 2017 at 06:53:45PM -0800, Bruce Leban wrote:

> I think you're arguing against this for the wrong reason. Chris was talking
> about custom classes having the *option* of making them sortable by
> providing a key method in the class definition.

I never imagined that it would be a required method. Of course it is 
optional. But by adding interpreter support, we're blessing something 
which I think is a misguided design choice as the One Obvious Way.

We're taking something which belongs in the report generator or 
collection, the knowledge of how to sort a collection of unordered 
values, and baking it into the values themselves. (Effectively making 
them ordered!)

That's the wrong design. Your report needs to know about your values, 
your values shouldn't need to know how the report is formatted.

Its like saying that you want an Email object, and a SMTP_Server object, 
but to make it more convenient for the SMTP_Server object, we should 
give the Email objects themselves a method that knows how to talk to 
port 25 and send themselves. Then the SMTP_Server just calls 
email.send() on each method. How convenient.

The same applies here. Sure, it is convenient to just call 
bank_accounts.sort() (to re-use the example given by Carl) and it 
magically works, but as soon as your report changes and you want the 
bank accounts sorted according to their account name instead of balance, 
you have to either provide a key function, or change the __key__ method. 
Obviously changing the __key__ method will break any other reports that 
rely on it, so you end up using a key function anyway.

I would mind this less if it isn't blessed by the interpreter. There are 
lots of classes which are excessively coupled to other things. I've 
written a few of them myself, so I understand the temptation.

Sometimes that design might even be justified under "Practicality beats 
purity". But I don't think this is one of those cases: I don't see this 
as important enough or common enough to build it into the language as an 
actual dunder method.

If people like Chris' idea, just add a sortkey() method to your class, 
and then call sorted(collection_of_Spam, key=Spam.sortkey) and it will 
Just Work. It is explicit, backwards compatible and doesn't need to wait 
for Python 3.8 or whenever this (mis)feature gets (hypothetically) 
added.


> On Sun, Dec 3, 2017 at 3:06 PM, Chris Barker  wrote:
> 
> > Am I imagining the performance benefits?
> >
> 
> Maybe. Looking strictly at O(-) cost, there's no difference between a key
> function and comparison operators. Sure it might potentially only make O(n)
> calls to the key function and O(n log n) calls to compare the keys vs. O(n
> log n) calls to the comparator functions but that might not actually be
> faster.

It is unlikely that calling a key function followed by key comparisons 
would be faster than just calling the key comparisons. Using a key 
function is effectively the old DSU idiom:

values = [(key(x), x) for x in values]  # O(n)
values.sort()   # O(n log n)
values = [t[1] for t in values] # O(n)

so you make two extra passes through the list. The only way that could 
be faster is if key(x).__lt__ is sufficiently cheap compared to x.__lt__ 
that it saves more than the cost of those two extra passes (plus the 
overhead from dealing with the extra tuples).

You might be thinking of the old Python 1 and early Python 2 cmp 
argument to sort. The comparator function can end up calling x.__lt__ up 
to O(n**2) times if I remember correctly, so it is quite expensive.


> There certainly are cases where implementing a key function would
> be quite slow.

The biggest problem with a key function is that it trades off memory for 
time. If you are constrained by memory, but don't care how slow your 
sort is, an old-school comparison function might suit you better. But 
for those cases, functools.cmp_to_key may help.



-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Antoine Pitrou
On Mon, 4 Dec 2017 12:19:07 +
Paul Moore  wrote:
> On 4 December 2017 at 11:41, Steven D'Aprano  wrote:
> > On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote:  
> >> I think this is an interesting idea, and I don't believe that either
> >> performance or "sortable vs comparable" are very relevant.  
> >
> > Performance is always relevant -- while performance shouldn't be the
> > sole deciding factor, it should be a factor.
> >
> > And since the entire use-case for this is sorting versus comparison
> > operators, I'm having trouble understanding why you think that sorting
> > versus comparison operators is irrelevant.  
> 
> I'm not completely clear on what the expectation is (in terms of
> "sortable vs comparable") here.

It's quite clear if you read what Chris Barker posted originally (and
which I agree with).  We're talking about deriving comparison methods
from a key function *while* making sorting potentially faster, because
the costly reduction operation happens O(n) times instead of O(n log n)
times.

Steven OTOH seems to be inventing controversies just for the sake of
posting a rant.

Regards

Antoine.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Antoine Pitrou
On Mon, 4 Dec 2017 23:16:11 +1100
Steven D'Aprano  wrote:
> On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote:
> 
> > There are definitely advantages.  Sorting calls __lt__ for each
> > comparison (that is, O(n log n) times) while __key__ would only be
> > called once per item at the start (that is, O(n) times).  
> 
> Passing a key function doesn't magically turn a O(n log n) comparison 
> sort into a O(n) sort.

Where did I say it did?

> Once you've called the key function every time, you still have to 
> *actually sort*, which will be up to O(n log n) calls to __lt__ on 
> whatever __key__ returned.

Yes... and the whole point is for __key__ to return something which is
very cheap to compare, such that there are O(n) expensive calls to
__key__ and O(n log n) cheap calls to __lt__, rather than O(n log n)
expensive calls to __lt__.

> > If __key__ is inconsistent with __lt__, it is the same error as making
> > __lt__ inconsistent with __gt__.  
> 
> You seem to be assuming that __key__ is another way of spelling __lt__, 
> rather than being a key function.

It isn't.  It's just supposed to be consistent with it, just like
__hash__ is supposed to be consistent with __eq__, and noone reasonable
chastises Python because it allows to define __hash__ independently of
__eq__.

Also please note Serhiy's sentence I was responding to:
"""It will be rather confusing if different methods of sorting produce
inconsistent order."""

> In any case, it isn't an error for __lt__ to be inconsistent with 
> __gt__.
[...]
> Not all values have total ordering.

As usual you should try to understand what people are trying to say
instead of imagining they are incompetent.

In the present case, it would definitely be an inconsistency (and a
programming error) to have both `a < b` and `b < a` return true.

> Defining a single comparison method is not enough to define the rest. 

How about you stick to the discussion?  I'm talking about deriving
comparison methods from __key__, not from another comparison method.
Defining __key__ is definitely enough to define all comparison
methods.

Regards

Antoine.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Steven D'Aprano
On Mon, Dec 04, 2017 at 08:45:55AM +0200, Serhiy Storchaka wrote:

> But the idea of the class decorator looks more sane to me.

The purpose of __key__ is to define a key function (not a comparison 
operator) for classes that aren't orderable and don't have __lt__.

If you're going to then go ahead and define __lt__ and the other 
comparison operators, there's no point to __key__.



-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Paul Moore
On 4 December 2017 at 11:41, Steven D'Aprano  wrote:
> On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote:
>> I think this is an interesting idea, and I don't believe that either
>> performance or "sortable vs comparable" are very relevant.
>
> Performance is always relevant -- while performance shouldn't be the
> sole deciding factor, it should be a factor.
>
> And since the entire use-case for this is sorting versus comparison
> operators, I'm having trouble understanding why you think that sorting
> versus comparison operators is irrelevant.

I'm not completely clear on what the expectation is (in terms of
"sortable vs comparable") here. Clearly if a class has __lt__, it's
both sortable and comparable, and that's fine. If it doesn't have
__lt__, then the implication is that the class designer doesn't
believe it's reasonable for it to be ordered. That's what not having
comparison methods *means* (well, excepting the case that the designer
didn't think of it, which is probably the case for 99% of my classes
;-))

If we have a __key__ method on a class, then the following becomes true:

* We can work out which of 2 instances is bigger/smaller using max/min.
* We can compare two items by doing a sort.

So while the *intent* may not be to allow comparisons, that's what you've done.

As a result, I don't think it's an important consideration to worry
about classes that "should be sortable, but shouldn't be orderable".
That's basically a contradiction in terms, and will likely only come
up in corner cases where for technical reasons you may need your
instances to participate in sorting without raising exceptions, but
you don't consider them orderable (the NLTK example mentioned above).

Conversely, when sorting a key can provide significant performance
improvements. A single O(n) pass to compute keys, followed by O(n
log(n)) comparisons could be significantly faster, assuming comparing
keys is faster than extracting them from the object. So allowing
classes to define __key__ could be a performance win over a __lt__
defined as (effectively)

def __lt__(self, other):
return self.__key__() < other.__key__()

Overall, I don't see any problem with the idea, although it's not
something I've ever needed myself, and I doubt that in practice it
will make *that* much difference. The main practical benefit, I
suspect, would be if there were an "Orderable" ABC that auto-generated
the comparison methods given either __lt__ or __key__ (I could have
sworn there was such an ABC for __lt__ already, but I can't find it in
the library ref :-()

Paul
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Steven D'Aprano
On Mon, Dec 04, 2017 at 12:06:38PM +0100, Antoine Pitrou wrote:

> There are definitely advantages.  Sorting calls __lt__ for each
> comparison (that is, O(n log n) times) while __key__ would only be
> called once per item at the start (that is, O(n) times).

Passing a key function doesn't magically turn a O(n log n) comparison 
sort into a O(n) sort.

Once you've called the key function every time, you still have to 
*actually sort*, which will be up to O(n log n) calls to __lt__ on 
whatever __key__ returned. The key function is just a built-in version 
of the old "DSU" (Decorate-Sort-Undecorate) idiom:


values = [(key(x), x) for x in values]
values.sort()
values = [t[1] for t in values]


If you want this functionality, go right ahead and give your class a 
sortkey method, and then pass that as the explicit key function to 
sorted:

sorted(collection_of_Spam, key=Spam.sortkey)

That is nice and explicit, and the method only gets looked up once. 
It works in any version of Python, it is fully backwards-compatible, 
and it requires no support from the interpreter.

I still think this is a poor object design, putting the key function in 
the object being sorted rather than in the report doing the sorting, but 
so long as this isn't blessed by the language as the One Obvious Way I 
don't mind so much.



> > It will be rather confusing if 
> > different methods of sorting produce inconsistent order.
> 
> If __key__ is inconsistent with __lt__, it is the same error as making
> __lt__ inconsistent with __gt__.

You seem to be assuming that __key__ is another way of spelling __lt__, 
rather than being a key function. If it does the same thing as __lt__, 
it is a comparison method, not a key function, and the name is horribly 
misleading.

In any case, it isn't an error for __lt__ to be inconsistent with 
__gt__.


py> {1, 2, 3, 4} < {2, 3, 4, 5}
False
py> {1, 2, 3, 4} > {2, 3, 4, 5}
False


Not all values have total ordering.


> And there could be a decorator that generates all comparison methods
> from __key__.

Defining a single comparison method is not enough to define the rest. 
You need __eq__ and one comparison method. (Technically we could use 
__ne__ and one other, but __eq__ is usual.

But why not just define __lt__ and use total_ordering, instead of 
defining two identical decorators that differ only in the name of the 
dunder method they use?

The purpose of __key__ was supposed to be to eliminate the need to 
define __lt__. It seems ridiculous to then use __key__ to define __lt__ 
when the whole point of __key__ is to avoid needing to define __lt__.



-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Steven D'Aprano
On Sun, Dec 03, 2017 at 10:48:18PM -0800, Carl Meyer wrote:
> I think this is an interesting idea, and I don't believe that either
> performance or "sortable vs comparable" are very relevant.

Performance is always relevant -- while performance shouldn't be the 
sole deciding factor, it should be a factor.

And since the entire use-case for this is sorting versus comparison 
operators, I'm having trouble understanding why you think that sorting 
versus comparison operators is irrelevant.


> I doubt there is much performance to gain here,

I doubt there is any performance gain -- rather a performance hit is far 
more likely.


> and I think the default sort order for
> a class must continue to match its comparison behavior.

This proposal changes that: if a class defines __key__ but not __lt__, 
then the default sort behaviour will be different from the comparison 
behaviour.

If it defines both, it isn't clear which will be used for sorting. 
Should __lt__ take priority, or __key__? Whichever we choose, somebody 
is going to be upset and confused by the choice.


> Most often when I define equality and comparison dunder methods for a
> custom class, I'm effectively just deferring the comparison to some
> field or tuple of fields of the object. E.g.
> 
> from functools import total_ordering
> 
> @total_ordering
> class BankAccount:
> def __init__(self, balance):
> self.balance = balance

[snip example]

This example shows exactly the confusion of concepts I'm talking about. 
Why should bank accounts be sorted by their balance, instead of by their 
account number, or account name, or date they were opened?

Why should BankAccounts sort differently according to their balance, a 
quantity which can change after every transaction? What happens when 
somebody decides that bank accounts default sorting should be by their 
account name rather than the ever-changing balance?

I'm not saying that it never makes sense to sort a bunch of accounts 
according to their balance. I'm saying that functionality is not part of 
the account themselves. If it belongs anywhere, it belongs in the 
collection of accounts. And even that is dubious: I believe that where 
it really belongs is in the report generator that needs to sort the 
collection of accounts.


> It'd be nice to be able to eliminate an import

That's an argument that applies to *literally* everything in the 
standard library. Should we make everything a built-in?

The prerequisites for eliminating the need for an import should be a 
*lot* higher than just "it would be nice".


I disagree with the design of this: it is putting the decision of how to 
sort uncomparable objects in the wrong place, in the object itself 
rather than in the report that wants to sort them.



-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Antoine Pitrou
On Mon, 4 Dec 2017 08:45:55 +0200
Serhiy Storchaka 
wrote:
> 04.12.17 01:06, Chris Barker пише:
> > So: has this already been brought up and rejected?  
> 
> https://bugs.python.org/issue20632
> 
> > Am I imagining the performance benefits?  
> 
> This will add an additional overhead. This will be even slower than 
> passing the key function, since you will need to look up the __key__ 
> method in every item.

That is a reasonable objection.  However, looking up a tp_XXX slot is
very cheap.

> Yes, it is too special-case. I don't see any advantages in comparison 
> with defining the __lt__ method.

There are definitely advantages.  Sorting calls __lt__ for each
comparison (that is, O(n log n) times) while __key__ would only be
called once per item at the start (that is, O(n) times).

> It will be rather confusing if 
> different methods of sorting produce inconsistent order.

If __key__ is inconsistent with __lt__, it is the same error as making
__lt__ inconsistent with __gt__.

And there could be a decorator that generates all comparison methods
from __key__.

Regards

Antoine.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Antoine Pitrou
On Sun, 3 Dec 2017 15:06:02 -0800
Chris Barker  wrote:
> I can't believe this hasn't been brought up before, but searching the web,
> and python-ideas, and all the PEPs has found nothing (could be my lame
> google-fu), so here goes:
> 
> Recent python has moved toward a "key" function for customized sorting:
> 
> list.sort(key=key_fun)
> 
> key is also used (according to
> https://docs.python.org/3.6/library/functools.html#functools.cmp_to_key) in:
> 
> min(), max(), heapq.nlargest(), heapq.nsmallest(), itertools.groupby()
> 
> with this fairly broad use, it seems it's becoming a fairly universal
> protocol for ordering.
> 
[...]

+1 from me.

I would also be +1 on an optional class decorator that would generate
all the ordering comparison methods (__lt__, etc.) based on the __key__
method definition.

Regards

Antoine.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Nathaniel Smith
On Sun, Dec 3, 2017 at 10:48 PM, Carl Meyer  wrote:
> It'd be nice to be able to eliminate an import and have the lines of
> code and instead write that as:
>
> class BankAccount:
> def __init__(self, balance):
> self.balance = balance
>
> def __sort_key__(self):
> return self.balance

What if we added a @key_ordering decorator, like @total_ordering but
using __key__ to generate the comparisons? I know you'd have to do an
import, but usually adding things to the core language requires more
of a benefit than that :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/