On Mon, Dec 23, 2019 at 8:56 PM Kyle Stanley <aeros...@gmail.com> wrote:
>
> > To begin with, please compare performance of dict-with-None-values to that 
> > of a set, for operations that can be expressed by both (e.g. both have 
> > update()).
>
> Good idea. It would be useful to have a baseline comparison. I emboldened the 
> comparisons with a notable difference.
>
> Tested on:
> Version: Python 3.8.0
> OS: Linux kernel 5.4.5
> CPU: Intel i5-4460
>
> Initialization (about the same):
> >>> timeit.timeit("set(i for i in range(1000))", number=100_000)
> 4.878508095000143
> >>> timeit.timeit("{i: None for i in range(1000)}", number=100_000)
> 4.9192055170024105
>
> Membership (about the same):
> >>> timeit.timeit("random.randint(0, 999) in s", setup="import random; s = 
> >>> set(i for i in range(1000))", number=10_000_000)
> 7.992674231001729
> >>> timeit.timeit("random.randint(0, 999) in d", setup="import random; d = 
> >>> {i: None for i in range(1000)}", number=10_000_000)
> 8.032404395999038
>
> Add (much faster for dicts):
> >>> timeit.timeit("s = set(); s.add(0)", number=100_000_000)
> 13.330938750001224
> >>> timeit.timeit("d = {}; d[0] = None", number=100_000_000)
> 5.788865385999088
>
> Update (much faster for updating sets):
> >>> timeit.timeit("s.update(other_s)", setup="s = set(); other_s = set(i for 
> >>> i in range(1000))", number=1_000_000)
> 6.2428832090008655
> >>> timeit.timeit("d.update(other_d)", setup="d = {}; other_d = {i: None for 
> >>> i in range(1000)}", number=1_000_000)
> 13.031371458000649
>
> Create list from keys (faster for sets):
> >>> timeit.timeit("list(s)", setup="s = set(i for i in range(1000))", 
> >>> number=10_00_000)
> 5.24364303600305
> >>> timeit.timeit("list(d)", setup="d = {i: None for i in range(1000)}", 
> >>> number=10_00_000)
> 6.303017043999716
>
> Removal isn't easily comparable, since s.pop(), s.discard(), d.pop(), and 
> d.popitem() all behave quite differently. Also, I'm sure these benchmarks 
> could be improved significantly, particularly with the "Add" ones. This 
> should provide a decent general idea though.
>
> On Mon, Dec 23, 2019 at 8:12 PM Guido van Rossum <gu...@python.org> wrote:
>>
>> To begin with, please compare performance of dict-with-None-values to that 
>> of a set, for operations that can be expressed by both (e.g. both have 
>> update()).
>>
>> On Mon, Dec 23, 2019 at 18:08 Kyle Stanley <aeros...@gmail.com> wrote:
>>>
>>> Larry Hastings wrote:
>>> > a hypothetical collections.OrderedSet would probably work just fine.
>>>
>>> I'm also in agreement with adding a collections.OrderedSet. There certainly 
>>> seems to be a practical use case for a Set that preserves insertion order, 
>>> but with significant push back to modifying the existing Set implementation 
>>> (primarily due to performance concerns).
>>>
>>> If later down the road an improvement to OrderedSet is made that gives it 
>>> equal or better average performance across the board compared to Set, we 
>>> can consider changing the default implementation if there's significant 
>>> demand for it.
>>>
>>> Larry Hastings wrote:
>>> > And he'd probably use it too, as that makes the code clearer / easier to 
>>> > read.  If you read code using an OrderedSet, you know what the intent was 
>>> > and what the semantics are.  If you see code using a dict but setting the 
>>> > values to None, you think "that's crazy" and now you have a puzzle to 
>>> > solve.  Let's hope those comments are accurate!
>>>
>>> This is an excellent point. My first thought if someone were to be using a 
>>> Dict with all values set to None would be: why not use a Set here? As Larry 
>>> said, the need for preservation of insertion order would have to be 
>>> explicitly described in the code comments. Realistically speaking, code 
>>> comments are not something that can be consistently relied upon, especially 
>>> if it's part of some boilerplate format that gets replicated or if the 
>>> container is used in multiple locations. I'd much rather have a dedicated 
>>> data structure that clearly describes what it does based on the name alone, 
>>> IMO that's a million times better for readability purposes.
>>>
>>> Also, this is mostly speculation since I haven't ran any benchmarks for an 
>>> OrderedSet implementation, but I strongly suspect that OrderedSet will end 
>>> up having better average performance for add, pop, and update than trying 
>>> to use a dictionary with None values (assuming it's implemented in C or at 
>>> least with a C accelerator).
>>>
>>> Not to mention allowing the ability to update (for both addition and 
>>> removal) based on intersections and unions across ordered sets, which 
>>> currently isn't possible to do with dictionaries (at least not directly 
>>> with |=, &=, -=. and ^=). I'm not sure how much of a practical use case 
>>> this would have for ordered sets specifically, but it's a nice bonus.
>>>
>>> On Sat, Dec 21, 2019 at 7:59 AM Larry Hastings <la...@hastings.org> wrote:
>>>>
>>>>
>>>> (mixing together messages in the thread, sorry threaded-view readers)
>>>>
>>>> On 12/19/19 3:15 PM, Tim Peters wrote:
>>>>
>>>> [Nick]
>>>>
>>>> I took Larry's request a slightly different way:
>>>>
>>>> [...]
>>>>
>>>> Only Larry can answer whether that would meet his perceived need.  My
>>>> _guess_ is that he wouldn't know OrderedSet existed, and, even if he
>>>> did, he'd use a dict with None values anyway (because it's less hassle
>>>> and does everything he wanted).
>>>>
>>>>
>>>> At last, something I can speak knowledgeably about: Larry's use case!  
>>>> Regarding Larry, I'd say
>>>>
>>>> his use case was small enough that almost anything maintaining order would 
>>>> have worked, even a list,
>>>> he often does have a pretty good idea what goodies are salted away in the 
>>>> Python standard library, and
>>>> a hypothetical collections.OrderedSet would probably work just fine.  And 
>>>> he'd probably use it too, as that makes the code clearer / easier to read. 
>>>>  If you read code using an OrderedSet, you know what the intent was and 
>>>> what the semantics are.  If you see code using a dict but setting the 
>>>> values to None, you think "that's crazy" and now you have a puzzle to 
>>>> solve.  Let's hope those comments are accurate!
>>>>
>>>>
>>>> Also, speaking personally, at least once (maybe twice?) in this thread 
>>>> folks have asked "would YOU, Mr Larry, really want ordered sets if it 
>>>> meant sets were slightly slower?"
>>>>
>>>> The first thing I'd say is, I'm not sure why people should care about 
>>>> what's best for me.  That's sweet of you!  But you really shouldn't.
>>>>
>>>> The second thing is, my needs are modest, so the speed hit we're talking 
>>>> about would likely be statistically insignificant, for me.
>>>>
>>>> And the third thing is, I don't really put the set() API through much of a 
>>>> workout anyway.  I build sets, I add and remove items, I iterate over 
>>>> them, I do the occasional union, and only very rarely do I use anything 
>>>> else.  Even then, when I write code where I reach for a fancy operation 
>>>> like intersection or symmetric_difference--what tends to happen is, I have 
>>>> a rethink and a refactor, and after I simplify my code I don't need the 
>>>> fancy operations anymore.  I can't remember the last time I used one of 
>>>> these fancy operations where the code really stuck around for a long time.
>>>>
>>>> So, personally?  Sure, I'd take that tradeoff.  As already established, I 
>>>> like that dict objects maintain their order, and I think it'd be swell if 
>>>> sets maintained their order too.  I suspect the performance hit wouldn't 
>>>> affect me in any meaningful way, and I could benefit from the 
>>>> order-maintaining semantics.  I bet it'd have other benefits too, like 
>>>> making regression tests more stable.  And my (admittedly uninformed!) 
>>>> assumption is, the loss of performance would mostly be in these 
>>>> sophisticated operations that I never seem to use anyway.
>>>>
>>>> But I don't mistake my needs for the needs of the Python community at 
>>>> large.  I'd be mighty interested to read other folks coming forward and 
>>>> saying, for example, "please don't make set objects any slower!" and 
>>>> talking us through neat real-world use cases.  Bonus points if you include 
>>>> mind-boggling numbers and benchmarks!
>>>>
>>>>
>>>> On 12/20/19 5:09 AM, Peter Wang wrote:
>>>>
>>>> As Larry pointed out earlier, ordered dicts posed a problem for 
>>>> MicroPython.
>>>>
>>>> Just a minor correction: that wasn't me, that was Petr Viktorin.
>>>>
>>>>
>>>> Ten days left until we retire 2.7,
>>>>
>>>>
>>>> /arry
>>>>
>>>> _______________________________________________
>>>> Python-Dev mailing list -- python-dev@python.org
>>>> To unsubscribe send an email to python-dev-le...@python.org
>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>> Message archived at 
>>>> https://mail.python.org/archives/list/python-dev@python.org/message/JYSJ55CWUE2FFK2K2Y75EWE7R7M6ZDGG/
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>> _______________________________________________
>>> Python-Dev mailing list -- python-dev@python.org
>>> To unsubscribe send an email to python-dev-le...@python.org
>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>> Message archived at 
>>> https://mail.python.org/archives/list/python-dev@python.org/message/2KEAXZBQP6YJ2EV4YURQZ7NFFRJZQRFH/
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>> --
>> --Guido (mobile)
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/A7KJQJBCAIBWPIGHYTEMQTLQCELU55FT/
> Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3YSQ7JE5N376PK7N3BGO5GLASDET6RQF/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to