On Mon, Dec 23, 2019 at 8:56 PM Kyle Stanley <[email protected]> wrote:
>
> > To begin with, please compare performance of dict-with-None-values to that
> > of a set, for operations that can be expressed by both (e.g. both have
> > update()).
>
> Good idea. It would be useful to have a baseline comparison. I emboldened the
> comparisons with a notable difference.
>
> Tested on:
> Version: Python 3.8.0
> OS: Linux kernel 5.4.5
> CPU: Intel i5-4460
>
> Initialization (about the same):
> >>> timeit.timeit("set(i for i in range(1000))", number=100_000)
> 4.878508095000143
> >>> timeit.timeit("{i: None for i in range(1000)}", number=100_000)
> 4.9192055170024105
>
> Membership (about the same):
> >>> timeit.timeit("random.randint(0, 999) in s", setup="import random; s =
> >>> set(i for i in range(1000))", number=10_000_000)
> 7.992674231001729
> >>> timeit.timeit("random.randint(0, 999) in d", setup="import random; d =
> >>> {i: None for i in range(1000)}", number=10_000_000)
> 8.032404395999038
>
> Add (much faster for dicts):
> >>> timeit.timeit("s = set(); s.add(0)", number=100_000_000)
> 13.330938750001224
> >>> timeit.timeit("d = {}; d[0] = None", number=100_000_000)
> 5.788865385999088
>
> Update (much faster for updating sets):
> >>> timeit.timeit("s.update(other_s)", setup="s = set(); other_s = set(i for
> >>> i in range(1000))", number=1_000_000)
> 6.2428832090008655
> >>> timeit.timeit("d.update(other_d)", setup="d = {}; other_d = {i: None for
> >>> i in range(1000)}", number=1_000_000)
> 13.031371458000649
>
> Create list from keys (faster for sets):
> >>> timeit.timeit("list(s)", setup="s = set(i for i in range(1000))",
> >>> number=10_00_000)
> 5.24364303600305
> >>> timeit.timeit("list(d)", setup="d = {i: None for i in range(1000)}",
> >>> number=10_00_000)
> 6.303017043999716
>
> Removal isn't easily comparable, since s.pop(), s.discard(), d.pop(), and
> d.popitem() all behave quite differently. Also, I'm sure these benchmarks
> could be improved significantly, particularly with the "Add" ones. This
> should provide a decent general idea though.
>
> On Mon, Dec 23, 2019 at 8:12 PM Guido van Rossum <[email protected]> wrote:
>>
>> To begin with, please compare performance of dict-with-None-values to that
>> of a set, for operations that can be expressed by both (e.g. both have
>> update()).
>>
>> On Mon, Dec 23, 2019 at 18:08 Kyle Stanley <[email protected]> wrote:
>>>
>>> Larry Hastings wrote:
>>> > a hypothetical collections.OrderedSet would probably work just fine.
>>>
>>> I'm also in agreement with adding a collections.OrderedSet. There certainly
>>> seems to be a practical use case for a Set that preserves insertion order,
>>> but with significant push back to modifying the existing Set implementation
>>> (primarily due to performance concerns).
>>>
>>> If later down the road an improvement to OrderedSet is made that gives it
>>> equal or better average performance across the board compared to Set, we
>>> can consider changing the default implementation if there's significant
>>> demand for it.
>>>
>>> Larry Hastings wrote:
>>> > And he'd probably use it too, as that makes the code clearer / easier to
>>> > read. If you read code using an OrderedSet, you know what the intent was
>>> > and what the semantics are. If you see code using a dict but setting the
>>> > values to None, you think "that's crazy" and now you have a puzzle to
>>> > solve. Let's hope those comments are accurate!
>>>
>>> This is an excellent point. My first thought if someone were to be using a
>>> Dict with all values set to None would be: why not use a Set here? As Larry
>>> said, the need for preservation of insertion order would have to be
>>> explicitly described in the code comments. Realistically speaking, code
>>> comments are not something that can be consistently relied upon, especially
>>> if it's part of some boilerplate format that gets replicated or if the
>>> container is used in multiple locations. I'd much rather have a dedicated
>>> data structure that clearly describes what it does based on the name alone,
>>> IMO that's a million times better for readability purposes.
>>>
>>> Also, this is mostly speculation since I haven't ran any benchmarks for an
>>> OrderedSet implementation, but I strongly suspect that OrderedSet will end
>>> up having better average performance for add, pop, and update than trying
>>> to use a dictionary with None values (assuming it's implemented in C or at
>>> least with a C accelerator).
>>>
>>> Not to mention allowing the ability to update (for both addition and
>>> removal) based on intersections and unions across ordered sets, which
>>> currently isn't possible to do with dictionaries (at least not directly
>>> with |=, &=, -=. and ^=). I'm not sure how much of a practical use case
>>> this would have for ordered sets specifically, but it's a nice bonus.
>>>
>>> On Sat, Dec 21, 2019 at 7:59 AM Larry Hastings <[email protected]> wrote:
>>>>
>>>>
>>>> (mixing together messages in the thread, sorry threaded-view readers)
>>>>
>>>> On 12/19/19 3:15 PM, Tim Peters wrote:
>>>>
>>>> [Nick]
>>>>
>>>> I took Larry's request a slightly different way:
>>>>
>>>> [...]
>>>>
>>>> Only Larry can answer whether that would meet his perceived need. My
>>>> _guess_ is that he wouldn't know OrderedSet existed, and, even if he
>>>> did, he'd use a dict with None values anyway (because it's less hassle
>>>> and does everything he wanted).
>>>>
>>>>
>>>> At last, something I can speak knowledgeably about: Larry's use case!
>>>> Regarding Larry, I'd say
>>>>
>>>> his use case was small enough that almost anything maintaining order would
>>>> have worked, even a list,
>>>> he often does have a pretty good idea what goodies are salted away in the
>>>> Python standard library, and
>>>> a hypothetical collections.OrderedSet would probably work just fine. And
>>>> he'd probably use it too, as that makes the code clearer / easier to read.
>>>> If you read code using an OrderedSet, you know what the intent was and
>>>> what the semantics are. If you see code using a dict but setting the
>>>> values to None, you think "that's crazy" and now you have a puzzle to
>>>> solve. Let's hope those comments are accurate!
>>>>
>>>>
>>>> Also, speaking personally, at least once (maybe twice?) in this thread
>>>> folks have asked "would YOU, Mr Larry, really want ordered sets if it
>>>> meant sets were slightly slower?"
>>>>
>>>> The first thing I'd say is, I'm not sure why people should care about
>>>> what's best for me. That's sweet of you! But you really shouldn't.
>>>>
>>>> The second thing is, my needs are modest, so the speed hit we're talking
>>>> about would likely be statistically insignificant, for me.
>>>>
>>>> And the third thing is, I don't really put the set() API through much of a
>>>> workout anyway. I build sets, I add and remove items, I iterate over
>>>> them, I do the occasional union, and only very rarely do I use anything
>>>> else. Even then, when I write code where I reach for a fancy operation
>>>> like intersection or symmetric_difference--what tends to happen is, I have
>>>> a rethink and a refactor, and after I simplify my code I don't need the
>>>> fancy operations anymore. I can't remember the last time I used one of
>>>> these fancy operations where the code really stuck around for a long time.
>>>>
>>>> So, personally? Sure, I'd take that tradeoff. As already established, I
>>>> like that dict objects maintain their order, and I think it'd be swell if
>>>> sets maintained their order too. I suspect the performance hit wouldn't
>>>> affect me in any meaningful way, and I could benefit from the
>>>> order-maintaining semantics. I bet it'd have other benefits too, like
>>>> making regression tests more stable. And my (admittedly uninformed!)
>>>> assumption is, the loss of performance would mostly be in these
>>>> sophisticated operations that I never seem to use anyway.
>>>>
>>>> But I don't mistake my needs for the needs of the Python community at
>>>> large. I'd be mighty interested to read other folks coming forward and
>>>> saying, for example, "please don't make set objects any slower!" and
>>>> talking us through neat real-world use cases. Bonus points if you include
>>>> mind-boggling numbers and benchmarks!
>>>>
>>>>
>>>> On 12/20/19 5:09 AM, Peter Wang wrote:
>>>>
>>>> As Larry pointed out earlier, ordered dicts posed a problem for
>>>> MicroPython.
>>>>
>>>> Just a minor correction: that wasn't me, that was Petr Viktorin.
>>>>
>>>>
>>>> Ten days left until we retire 2.7,
>>>>
>>>>
>>>> /arry
>>>>
>>>> _______________________________________________
>>>> Python-Dev mailing list -- [email protected]
>>>> To unsubscribe send an email to [email protected]
>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>> Message archived at
>>>> https://mail.python.org/archives/list/[email protected]/message/JYSJ55CWUE2FFK2K2Y75EWE7R7M6ZDGG/
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>> _______________________________________________
>>> Python-Dev mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>> Message archived at
>>> https://mail.python.org/archives/list/[email protected]/message/2KEAXZBQP6YJ2EV4YURQZ7NFFRJZQRFH/
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>> --
>> --Guido (mobile)
>
> _______________________________________________
> Python-Dev mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/[email protected]/message/A7KJQJBCAIBWPIGHYTEMQTLQCELU55FT/
> Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/3YSQ7JE5N376PK7N3BGO5GLASDET6RQF/
Code of Conduct: http://python.org/psf/codeofconduct/