On Thu, Jun 25, 2020 at 05:27:16PM +0300, Ben Avrahami wrote:
> Hey all,
> Often I've found this kind of code:
> 
> seen = set()
> for i in iterable:
>   if i in seen:
>     ...  # do something in case of duplicates
>   else:
>     seen.add(i)
>     ... # do something in case of first visit
> 
> This kind of code appears whenever one needs to check for duplicates in
> case of a user-submitted iterable, or when we loop over a recursive
> iteration that may involve cycles (graph search or the like). This code
> could be improved if one could ensure an item is in the set, and get
> whether it was there before in one operation.

I disagree that it would be improved. The existing idiom is simple and 
flexible and perfectly understandable even if you don't known the gory 
details about how sets work.

Most importantly, it matches the way people think about the task:

    # Task: look for duplicates
    if element in seen:
        # it's a duplicate
        ...
    else:
        # never seen before, so remember it
        ...

This idiom works with sets, it works with lists, it works with trees, it 
works with anything that supports membership testing and inserting into 
a collection. It is the natural way to think about the task.

Whereas this:

    # Check for duplicates.
    add element to the collection
    was it already there before you just added it?

is a weird way to think about the problem.

    already_there = seen.add(element)
    if already_there:
        # handle the duplicate case

Who thinks like that? *wink*

Now, I agree that sometimes for the sake of efficiency, we need to do 
things in a weird way. But membership testing in sets is cheap, it's 
not a linear search of a huge list. So the efficiency argument here is 
weak.

If you can demonstrate a non-contrived case where the efficiency 
argument was strong, then I might be persuaded that a new method (not 
`add`) could be justified.

But either way, you also have to decide whether the `add` (or the new 
method) should *unconditionally* insert the element, or only do so if it 
wasn't present. This makes a big difference:

    seen = {2}
    already_there = seen.add(2.0)


At this point, is `seen` the set {2} or {2.0}? Justify why one answer is 
the best answer.


-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/M4BDA4AR555XY666ZNU5A5O4TNYDZLGR/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to