On Sep 20, 2019, at 14:51, Richard Higginbotham <higgi...@gmail.com> wrote:
> 
> Andrew Barnert wrote:

>> What’s the goal here?
> 
> I'm not concerned about micro op, I'm concerned with macro op on large data 
> sets. Right now if someone comes to you with that simple use case of two 
> large lists you have to tell them to convert to sets and then back again. 
> Forget the new programmers who tried the naive way and quit because "python 
> is too slow", they are waiting 20 seconds when we could deliver it to them 
> in 1 second.

But today, you can deliver it to them in 1 second: just give them 
`set(b).intersection(a)` or `set(a) & set(b)` or `sb = set(b)` then `[x for x 
in a if x in sb]` and you’re done. They can easily understand why it works. If 
they want to know why it’s faster, you can easily explain it, and they’ve 
learned something widely useful.
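
As a minimal sketch of those three spellings (the sample lists `a` and `b` are made up for illustration), note that they differ in how they treat order and duplicates:

```python
# Hypothetical sample data; any two lists of hashable values would do.
a = [5, 3, 5, 1, 9, 3]
b = [3, 9, 7, 3]

# Spellings 1 and 2: pure set operations. The result is a set, so
# duplicates collapse and the order of `a` is not preserved.
common_1 = set(b).intersection(a)
common_2 = set(a) & set(b)

# Spelling 3: build the set once, then filter `a` with average-case O(1)
# membership tests. This keeps the order (and the duplicates) of `a`.
sb = set(b)
common_3 = [x for x in a if x in sb]

print(common_1 == common_2 == {3, 9})  # True
print(common_3)                        # [3, 9, 3]
```

The naive `[x for x in a if x in b]` gives the same list as the third spelling, but each `in b` test scans the list, which is where the quadratic slowdown comes from.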

If we add a new function, whether it’s in builtins or itertools or elsewhere, 
then you can give them that function and you’re done; that’s no easier or 
harder. They probably won’t immediately know how it works, but if they ask, 
they’ll learn something useful. And once they get why it works, they probably 
will get why it’s faster. So, overall, the new solution has roughly the same 
benefits as the existing one that’s been there since Python 2.3.

>> I think it’s pretty easy to establish that sets are fast but they’re not 
>> magic, sorting is pretty good but it’s not magic, there are some cases where 
>> step-compare is faster if you have already-sorted lists… and none of this 
>> matters anyway, because in real life there are very few use cases where you 
>> can equally choose between sets and step compare and need to micro-optimize, 
>> but lots of cases where you have _other_ reasons to choose between them.
> There is no step compare option in python now though. You either convert to 
> set or you hope the sun doesn't expire before your program finishes. 

Or you write the step-compare manually, the way C programmers have done for 
decades. Or you find a third-party library or copy and paste code off a Stack 
Overflow answer.
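
For reference, a hand-written step-compare intersection might look roughly like the sketch below. The name `sorted_intersection` is hypothetical, not an existing stdlib function, and the sketch assumes both inputs are already sorted in ascending order:

```python
from itertools import count, islice


def sorted_intersection(xs, ys):
    """Yield values present in both inputs.

    Assumes both inputs are already sorted ascending. Only `<` is used,
    so the values need to be orderable but not hashable.
    """
    it_x, it_y = iter(xs), iter(ys)
    try:
        x, y = next(it_x), next(it_y)
        while True:
            if x < y:
                x = next(it_x)      # x is too small: step the left input
            elif y < x:
                y = next(it_y)      # y is too small: step the right input
            else:
                yield x             # match: emit it and step both inputs
                x, y = next(it_x), next(it_y)
    except StopIteration:
        return                      # either input ran out, so we are done


print(list(sorted_intersection([1, 2, 2, 3, 7], [2, 2, 4, 7])))  # [2, 2, 7]

# Because it is a generator, it also works on lazy (even infinite) inputs.
print(list(islice(sorted_intersection(count(0, 2), count(0, 3)), 3)))  # [0, 6, 12]
```

This version pairs duplicates up one-for-one, which is only one of several reasonable choices; a set-based intersection would collapse them instead.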

Converting to set is easier, and usually the right answer.

The important question is, for those cases where it _isn’t_ the right answer, 
is “write it yourself or find it on SO” good enough?

>> Sets work on non-comparable values; step-compare works on non-hashable 
>> values. Step-compare (if implemented right) works on lazy iterables; sets 
>> can preserve the order of the “master” input. They do different things with 
>> duplicates. Even when performance is the only issue (and it’s not dominated 
>> by the file system anyway, as it would be in your case), the characteristics 
>> of your specific data are going to make a huge difference. Not to mention 
>> that there are often other compelling reasons to have a set or a sorted 
>> list, at which point the choice is made for you.
>>
>> So, this already eliminates the possibility of reimplementing set operations 
>> with step-compare as suggested earlier. But it also already makes a 
>> compelling case for adding step-compare functions to Python, though that 
>> case needs to be made clearly.
> I'm trying to make that case.

No, you’re not. You’re trying to make the case that step-compare is better than 
set in all uses. Which isn’t true. And it isn’t necessary in the first place.

The case that needs to be made is that it’s good for some cases that we can’t 
handle today, and that at least one of those cases matters, and that the 
current state of affairs where people have to go figure out how to do it is not 
good enough.
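
To illustrate the kind of case meant here, the sketch below uses made-up data: lists of lists, which are orderable but not hashable, and complex numbers, which are hashable but not orderable. The helper `step_intersection` is hypothetical and assumes sorted list inputs:

```python
def step_intersection(xs, ys):
    """Intersect two sorted lists using only `<` (no hashing needed)."""
    i = j = 0
    out = []
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            i += 1
        elif ys[j] < xs[i]:
            j += 1
        else:
            out.append(xs[i])
            i += 1
            j += 1
    return out


# Orderable but unhashable values: sets cannot hold them.
rows_a = [[1, 2], [3, 4], [5, 6]]
rows_b = [[3, 4], [5, 6], [7, 8]]
try:
    set(rows_a)
    set_worked = True
except TypeError:
    set_worked = False
print(set_worked)                         # False
print(step_intersection(rows_a, rows_b))  # [[3, 4], [5, 6]]

# Hashable but unorderable values: sets are fine, sorting is not.
zs_a = [1 + 2j, 3 + 4j]
zs_b = [3 + 4j, 5 + 6j]
print(set(zs_a) & set(zs_b))              # {(3+4j)}
try:
    sorted(zs_a)
    sort_worked = True
except TypeError:
    sort_worked = False
print(sort_worked)                        # False
```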

> I've never come across another step and compare.

People do this all over the place. For example, look at the merge step of any 
merge sort: it’s a step-compare union, possibly in-place, possibly n-ary 
instead of binary, but recognizably the same thing.
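
Python's stdlib already exposes that merge step as `heapq.merge`, which lazily merges any number of sorted inputs. As a rough sketch, collapsing runs of equal values with `itertools.groupby` turns it into an n-ary step-compare union (the name `sorted_union` is hypothetical, and the inputs are assumed to be sorted ascending):

```python
import heapq
from itertools import groupby


def sorted_union(*iterables):
    """Yield each distinct value once, in order, from already-sorted inputs."""
    # heapq.merge yields all values in sorted order, duplicates included;
    # groupby then collapses each run of equal values to a single item.
    for value, _run in groupby(heapq.merge(*iterables)):
        yield value


print(list(heapq.merge([1, 3, 5], [2, 3, 6])))           # [1, 2, 3, 3, 5, 6]
print(list(sorted_union([1, 3, 5], [2, 3, 6], [0, 6])))  # [0, 1, 2, 3, 5, 6]
```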

In C++, the stdlib has all of these algorithms, abstracted to work on any 
forward Iterators, in two versions that handle duplicates differently, plus 
in-place versions of some of them.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KLO4MBK4XRPGAKDD6ET43FIYUR6T33P6/