[Python-Dev] Re: Should set objects maintain insertion order too?

Kyle Stanley Mon, 23 Dec 2019 19:02:32 -0800

> To begin with, please compare performance of dict-with-None-values to
that of a set, for operations that can be expressed by both (e.g. both have
update()).


Good idea. It would be useful to have a baseline comparison. I emboldened
the comparisons with a notable difference.

Tested on:
Version: Python 3.8.0
OS: Linux kernel 5.4.5
CPU: Intel i5-4460

Initialization (about the same):
>>> timeit.timeit("set(i for i in range(1000))", number=100_000)
4.878508095000143
>>> timeit.timeit("{i: None for i in range(1000)}", number=100_000)
4.9192055170024105

Membership (about the same):
>>> timeit.timeit("random.randint(0, 999) in s", setup="import random; s =
set(i for i in range(1000))", number=10_000_000)
7.992674231001729
>>> timeit.timeit("random.randint(0, 999) in d", setup="import random; d =
{i: None for i in range(1000)}", number=10_000_000)
8.032404395999038

*Add* (much faster for dicts):
>>> timeit.timeit("s = set(); s.add(0)", number=100_000_000)
13.330938750001224
>>> timeit.timeit("d = {}; d[0] = None", number=100_000_000)
5.788865385999088

*Update* (much faster for updating sets):
>>> timeit.timeit("s.update(other_s)", setup="s = set(); other_s = set(i
for i in range(1000))", number=1_000_000)
6.2428832090008655
>>> timeit.timeit("d.update(other_d)", setup="d = {}; other_d = {i: None
for i in range(1000)}", number=1_000_000)
13.031371458000649

*Create list from keys* (faster for sets):
>>> timeit.timeit("list(s)", setup="s = set(i for i in range(1000))",
number=10_00_000)
5.24364303600305
>>> timeit.timeit("list(d)", setup="d = {i: None for i in range(1000)}",
number=10_00_000)
6.303017043999716

Removal isn't easily comparable, since s.pop(), s.discard(), d.pop(), and
d.popitem() all behave quite differently. Also, I'm sure these benchmarks
could be improved significantly, particularly with the "Add" ones. This
should provide a decent general idea though.

On Mon, Dec 23, 2019 at 8:12 PM Guido van Rossum <gu...@python.org> wrote:

> To begin with, please compare performance of dict-with-None-values to that
> of a set, for operations that can be expressed by both (e.g. both have
> update()).
>
> On Mon, Dec 23, 2019 at 18:08 Kyle Stanley <aeros...@gmail.com> wrote:
>
>> Larry Hastings wrote:
>> > a hypothetical collections.OrderedSet would probably work just fine.
>>
>> I'm also in agreement with adding a collections.OrderedSet. There
>> certainly seems to be a practical use case for a Set that preserves
>> insertion order, but with significant push back to modifying the existing
>> Set implementation (primarily due to performance concerns).
>>
>> If later down the road an improvement to OrderedSet is made that gives it
>> equal or better average performance across the board compared to Set, we
>> can consider changing the default implementation *if* there's
>> significant demand for it.
>>
>> Larry Hastings wrote:
>> > And he'd probably use it too, as that makes the code clearer / easier
>> to read.  If you read code using an OrderedSet, you know what the intent
>> was and what the semantics are.  If you see code using a dict but setting
>> the values to None, you think "that's crazy" and now you have a puzzle to
>> solve.  Let's hope those comments are accurate!
>>
>> This is an excellent point. My first thought if someone were to be using
>> a Dict with all values set to None would be: why not use a Set here? As
>> Larry said, the need for preservation of insertion order would have to be
>> explicitly described in the code comments. Realistically speaking, code
>> comments are not something that can be consistently relied upon, especially
>> if it's part of some boilerplate format that gets replicated or if the
>> container is used in multiple locations. I'd much rather have a dedicated
>> data structure that clearly describes what it does based on the name alone,
>> IMO that's a million times better for readability purposes.
>>
>> Also, this is mostly speculation since I haven't ran any benchmarks for
>> an OrderedSet implementation, but I strongly suspect that OrderedSet will
>> end up having better average performance for add, pop, and update than
>> trying to use a dictionary with None values (assuming it's implemented in C
>> or at least with a C accelerator).
>>
>> Not to mention allowing the ability to update (for both addition and
>> removal) based on intersections and unions across ordered sets, which
>> currently isn't possible to do with dictionaries (at least not directly
>> with |=, &=, -=. and ^=). I'm not sure how much of a practical use case
>> this would have for ordered sets specifically, but it's a nice bonus.
>>
>> On Sat, Dec 21, 2019 at 7:59 AM Larry Hastings <la...@hastings.org>
>> wrote:
>>
>>>
>>> (mixing together messages in the thread, sorry threaded-view readers)
>>>
>>> On 12/19/19 3:15 PM, Tim Peters wrote:
>>>
>>> [Nick]
>>>
>>> I took Larry's request a slightly different way:
>>>
>>> [...]
>>>
>>> Only Larry can answer whether that would meet his perceived need.  My
>>> _guess_ is that he wouldn't know OrderedSet existed, and, even if he
>>> did, he'd use a dict with None values anyway (because it's less hassle
>>> and does everything he wanted).
>>>
>>>
>>> At last, something I can speak knowledgeably about: Larry's use case!
>>> Regarding Larry, I'd say
>>>
>>>    1. his use case was small enough that almost anything maintaining
>>>    order would have worked, even a list,
>>>    2. he often *does have* a pretty good idea what goodies are salted
>>>    away in the Python standard library, and
>>>    3. a hypothetical collections.OrderedSet would probably work just
>>>    fine.  And he'd probably use it too, as that makes the code clearer /
>>>    easier to read.  If you read code using an OrderedSet, you know what the
>>>    intent was and what the semantics are.  If you see code using a dict but
>>>    setting the values to None, you think "that's crazy" and now you have a
>>>    puzzle to solve.  Let's hope those comments are accurate!
>>>
>>>
>>> Also, speaking personally, at least once (maybe twice?) in this thread
>>> folks have asked "would YOU, Mr Larry, really want ordered sets if it meant
>>> sets were slightly slower?"
>>>
>>> The first thing I'd say is, I'm not sure why people should care about
>>> what's best for me.  That's sweet of you!  But you really shouldn't.
>>>
>>> The second thing is, my needs are modest, so the speed hit we're talking
>>> about would likely be statistically insignificant, for me.
>>>
>>> And the third thing is, I don't really put the set() API through much of
>>> a workout anyway.  I build sets, I add and remove items, I iterate over
>>> them, I do the occasional union, and only very rarely do I use anything
>>> else.  Even then, when I write code where I reach for a fancy operation
>>> like intersection or symmetric_difference--what tends to happen is, I have
>>> a rethink and a refactor, and after I simplify my code I don't need the
>>> fancy operations anymore.  I can't remember the last time I used one of
>>> these fancy operations where the code really stuck around for a long time.
>>>
>>> So, personally?  Sure, I'd take that tradeoff.  As already established,
>>> I like that dict objects maintain their order, and I think it'd be swell if
>>> sets maintained their order too.  I suspect the performance hit wouldn't
>>> affect *me* in any meaningful way, and I could benefit from the
>>> order-maintaining semantics.  I bet it'd have other benefits too, like
>>> making regression tests more stable.  And my (admittedly uninformed!)
>>> assumption is, the loss of performance would mostly be in these
>>> sophisticated operations that I never seem to use anyway.
>>>
>>> But I don't mistake my needs for the needs of the Python community at
>>> large.  I'd be mighty interested to read other folks coming forward and
>>> saying, for example, "please don't make set objects any slower!" and
>>> talking us through neat real-world use cases.  Bonus points if you include
>>> mind-boggling numbers and benchmarks!
>>>
>>>
>>> On 12/20/19 5:09 AM, Peter Wang wrote:
>>>
>>> As Larry pointed out earlier, ordered dicts posed a problem for
>>> MicroPython.
>>>
>>> Just a minor correction: that wasn't me, that was Petr Viktorin.
>>>
>>>
>>> Ten days left until we retire 2.7,
>>>
>>>
>>> */arry*
>>> _______________________________________________
>>> Python-Dev mailing list -- python-dev@python.org
>>> To unsubscribe send an email to python-dev-le...@python.org
>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>> Message archived at
>>> https://mail.python.org/archives/list/python-dev@python.org/message/JYSJ55CWUE2FFK2K2Y75EWE7R7M6ZDGG/
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>> _______________________________________________
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/2KEAXZBQP6YJ2EV4YURQZ7NFFRJZQRFH/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> --
> --Guido (mobile)
>

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/A7KJQJBCAIBWPIGHYTEMQTLQCELU55FT/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Should set objects maintain insertion order too?

Reply via email to