[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Bar Harel
I believe I'm -0 for modifying the existing add() and +0 for a new function.

By the way, a good phrasing of the general mutating/non-mutating convention
opens the built-in types section of the docs:

"Some collection classes are mutable. The methods that add, subtract, or
rearrange their members in place, and don’t return a specific item, never
return the collection instance itself but None."

Note the use of the word item, rather than any return value.

Out of curiosity though, were we to change set.add(), how could we keep
backwards compatibility in the C API?

On the Python level, we might find people do "val = set.add(i) or i", or
even "any(set.add(i) for i in range(10))", which would break. I myself
wrote the former quite often with "list.append" etc.
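
To make the breakage concrete, here is a minimal sketch. ReturningSet and add_and_get are made-up names for this illustration; the subclass only imitates the semantics proposed in this thread (True if the item was already present) and is not an actual implementation of the change.

```python
class ReturningSet(set):
    """Hypothetical set whose add() returns True if the item was already present."""

    def add(self, item):
        already_present = item in self
        super().add(item)
        return already_present


def add_and_get(container, item):
    # Idiom that relies on add() returning None (falsy), so "or" yields item.
    return container.add(item) or item


plain = set()
changed = ReturningSet()

# With the built-in set the idiom always yields the item itself.
plain_results = [add_and_get(plain, 5), add_and_get(plain, 5)]

# With the imitation of the proposal, the second call yields True instead.
changed_results = [add_and_get(changed, 5), add_and_get(changed, 5)]
```

The second call on the hypothetical class returns True rather than 5, which is the kind of silent change in existing code being worried about here.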

On the C API side, the problem might be worse: I wouldn't be surprised if
people use PyCheckExact or pointer comparison on the results of both
PySet_Add and set_add of the set object.

Is there a way to maintain compatibility in the C API that I'm not aware
of?

On Fri, Jun 26, 2020, 3:41 AM Greg Ewing 
wrote:

> On 26/06/20 4:01 am, Steele Farnsworth wrote:
> > My point was only that, as far as I know, all the methods for built in
> > container types that serve only to change what is contained return None,
> > and that this was an intentional design choice, so changing it in one
> > case would have to evoke a larger discussion about what those sorts of
> > methods should return.
>
> The reason for that convention is so that there is no confusion
> about which methods return new objects and which modify the object
> in place.
>
> However, achieving that goal only requires that mutating methods
> don't return 'self'. It doesn't mean they can't return some other
> value that might be useful.
>
> A wider argument against that might be that methods should be
> classifiable into "procedures" (that have side effects but not
> return values) and "functions" (that return values but don't
> have side effects). I'm not sure whether that's considered part
> of the Python design philosophy -- I don't remember seeing much
> discussion about it.
>
> In this particular case, it might be better to add a new method
> rather than redefining the existing 'add' method, because if code
> assuming the new behaviour were run on an older version of Python,
> it would fail in an obscure way.
>
> --
> Greg
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/ZYYUMHMD2SCNOETAPQQHDCET7JUQ5ZNZ/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SAZ5GTS6THVYUYMKU2VNE5RI3IJI3YCC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Allow signal suppression

2020-06-25 Thread Yonatan Zunger via Python-ideas
Oh, that's definitely part of the problem, but that is *far* beyond my
ability to fix. Right now I'm still working on getting its owners to do
things like "could you please log somewhere when you kill jobs, and maybe
even indicate why the job was killed?".

The main time that signals show up in life outside of writing device
drivers and the like is when implementing or interacting with runtime
environments, basically, as they're the mechanism of interruptive
interprocess communication, noncooperative scheduling control, and so on.

Typical horrifying example: One of the systems I do control is the shell
that runs cron jobs (not their scheduling, but their actual execution),
which needs to provide an outer harness that manages getting commands from
the scheduler, integration with all sorts of logging and monitoring
systems, etc., and needs to actually execute the Python code of the real
jobs inside it. It needs to do various kinds of noncooperative scheduling
to those subtasks (timeouts, killing and replacing workers under various
circumstances, etc.) and so runs them in a subprocess. So I get several
layers of signals: incoming ones from the SIGTERM-happy outer runtime
environment (GCP), ones from the outer runner harness to the inner jobs,
and the logic in the inner jobs.

And alas, the logic in some of the inner jobs has to make fundamentally
non-idempotent, state-changing requests over APIs to 3P systems that I
don't control, and which if terminated leave the 3P system in an
indeterminate and undeterminable state. Which means that if the cron job
gets terminated in the middle of that API request, the system ends up in an
unknown state, and whatever you do to get it into a known state will be
wrong (leading to user-visible bad behavior) half the time. And because its
final state can't be determined from its own API, and it can't be invoked
idempotently, you can't even use a 2-phase commit approach to protect that.

But it turns out that signal suppression does actually make this problem go
away enough to be manageable in prod. Except that the code now has to be
changed from single-threaded to multi-threaded for various other reasons,
and so signal suppression by changing the signal handlers and then changing
them back no longer works.
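
For readers who have not seen the "change the handlers and change them back" approach, a minimal sketch follows. The name defer_signals is made up for this illustration. It assumes it is called from the main thread, that the previously installed handlers were set from Python (or are SIG_DFL/SIG_IGN), and it is not race-free, for the reasons discussed further down.

```python
import contextlib
import signal


@contextlib.contextmanager
def defer_signals(signums=(signal.SIGTERM,)):
    """Record the given signals during the block, then replay them afterwards.

    Main thread only: signal.signal() raises ValueError elsewhere.
    """
    pending = []

    def remember(signum, frame):
        # Not race-free: another signal can interrupt this handler.
        pending.append(signum)

    # Swap in the recording handler, remembering what was installed before.
    previous = {signum: signal.signal(signum, remember) for signum in signums}
    try:
        yield pending
    finally:
        # Restore the original handlers, then re-deliver what arrived meanwhile.
        for signum, handler in previous.items():
            signal.signal(signum, handler)
        for signum in list(pending):
            signal.raise_signal(signum)
```

Used as "with defer_signals(): do_the_request()", a SIGTERM arriving inside the block is only acted on once the block has finished. Because installing handlers is restricted to the main thread, this stops being usable as soon as the critical region runs in a worker thread.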


So that's an example of why you might find yourself in such a situation in
userland. And overall, Python's signal handling mechanism is pretty good;
it's *way* nicer than having to deal with it in C, since signal handlers
run in the main thread as more-or-less ordinary Python code, and you don't
have to deal with the equivalent of signal-safety and the like. The
downside of that flexibility, though, is that some tasks like deferring
signals end up being *really hard* in the Python layer, because even
appending the signum to an array isn't atomic enough to guarantee that it
won't be interrupted by another signal.


On Thu, Jun 25, 2020 at 5:43 PM Bernardo Sulzbach <
berna...@bernardosulzbach.com> wrote:

> On Thu, Jun 25, 2020 at 5:09 PM Yonatan Zunger via Python-ideas <
> python-ideas@python.org> wrote:
>
>> Hey everyone,
>>
>> I've been developing code which (alas) needs to operate in a runtime
>> environment which is quite *enthusiastic* about sending SIGTERMs and the
>> like, and where there are critical short sections of code that, if
>> interrupted, are very hard to resume without some user-visible anomaly
>> happening.
>>
>
> I find, for reasons you have already mentioned, having a "suppress all
> signals" something _really_ strange in userland code. But maybe I just have
> never seen a case in which it makes sense. Are you sure that the problem
> isn't "a runtime environment which is quite enthusiastic about sending
> SIGTERMs"?
>


-- 

Yonatan Zunger

Distinguished Engineer and Chief Ethics Officer

He / Him

zun...@humu.com

100 View St, Suite 101

Mountain View, CA 94041

Humu.com · LinkedIn · Twitter

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AZY5OCDWJNRDCGKRURLMRATHTWQPOBQY/


[Python-ideas] Re: Allow signal suppression

2020-06-25 Thread Bernardo Sulzbach
On Thu, Jun 25, 2020 at 5:09 PM Yonatan Zunger via Python-ideas <
python-ideas@python.org> wrote:

> Hey everyone,
>
> I've been developing code which (alas) needs to operate in a runtime
> environment which is quite *enthusiastic* about sending SIGTERMs and the
> like, and where there are critical short sections of code that, if
> interrupted, are very hard to resume without some user-visible anomaly
> happening.
>

I find, for reasons you have already mentioned, having a "suppress all
signals" something _really_ strange in userland code. But maybe I just have
never seen a case in which it makes sense. Are you sure that the problem
isn't "a runtime environment which is quite enthusiastic about sending
SIGTERMs"?
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KJUNFGDEEMNL6IOUAUTOOT6TISO3RMGI/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Greg Ewing

On 26/06/20 4:01 am, Steele Farnsworth wrote:
> My point was only that, as far as I know, all the methods for built in
> container types that serve only to change what is contained return None,
> and that this was an intentional design choice, so changing it in one
> case would have to evoke a larger discussion about what those sorts of
> methods should return.


The reason for that convention is so that there is no confusion
about which methods return new objects and which modify the object
in place.

However, achieving that goal only requires that mutating methods
don't return 'self'. It doesn't mean they can't return some other
value that might be useful.
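
A few existing built-in methods already fit this reading of the convention. The short sketch below only uses standard list and dict methods; the variable names are arbitrary.

```python
stack = [1, 2, 3]
top = stack.pop()                    # mutates stack, returns the removed item

cache = {}
bucket = cache.setdefault("a", [])   # mutates cache, returns the stored value
removed = cache.pop("a")             # mutates cache, returns the removed value

append_result = stack.append(4)      # pure mutator: returns None
```

None of these return the container itself, yet pop() and setdefault() both hand back something useful.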

A wider argument against that might be that methods should be
classifiable into "procedures" (that have side effects but not
return values) and "functions" (that return values but don't
have side effects). I'm not sure whether that's considered part
of the Python design philosophy -- I don't remember seeing much
discussion about it.

In this particular case, it might be better to add a new method
rather than redefining the existing 'add' method, because if code
assuming the new behaviour were run on an older version of Python,
it would fail in an obscure way.
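
A minimal sketch of that obscure failure, assuming the semantics proposed in this thread (add() returning True when the item was already present). The function name find_duplicates is made up for the illustration.

```python
def find_duplicates(items):
    """Written for the *proposed* semantics of set.add()."""
    seen = set()
    duplicates = []
    for item in items:
        # Under the proposal this would be True for repeated items.
        # On existing Python versions add() returns None, so the
        # branch is never taken and no error is raised either.
        if seen.add(item):
            duplicates.append(item)
    return duplicates


result = find_duplicates([1, 2, 1, 3, 2])
```

On current Python this quietly returns an empty list instead of raising, which is what makes the failure hard to spot. A new method name would fail loudly with AttributeError on older versions.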

--
Greg
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZYYUMHMD2SCNOETAPQQHDCET7JUQ5ZNZ/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Greg Ewing

On 26/06/20 2:45 am, Steele Farnsworth wrote:
> Personally, I'd want to see mutator methods return `self` so you can do
> more than one mutation in a statement, but the convention is that all
> the mutator methods return `None`.


I would say the convention is that mutator methods don't return 'self'.
That doesn't preclude them from returning some other useful value.

--
Greg
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HF5GGQ6WZD3RQ2J2H7AHZQPM24RH2YFT/


[Python-ideas] Re: Allow signal suppression

2020-06-25 Thread Yonatan Zunger via Python-ideas
Absolutely, but I figured the natural thing to expose from the C API was a
very minimal function, and then put a context manager in the Python layer.

The actual context manager implementation I would use would be a bit
smarter than a bare set/reset -- it would use an unbounded semaphore and
only unsuppress when the count dropped to zero. That way, multiple threads
can simultaneously have critical regions, and suppression would happen
whenever anyone was suppressing.
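
Roughly, the counting logic could look like the sketch below. It is only an illustration under assumptions: sys.suppress_signals does not exist today, so the suppression function is passed in as a callable, and a lock-protected counter is used here in place of the semaphore mentioned above. SignalSuppressor and critical_region are made-up names.

```python
import threading
from contextlib import contextmanager


class SignalSuppressor:
    """Reference-counted wrapper around a suppress/unsuppress callable."""

    def __init__(self, set_suppressed):
        # set_suppressed stands in for the proposed sys.suppress_signals(bool).
        self._set_suppressed = set_suppressed
        self._lock = threading.Lock()
        self._depth = 0

    @contextmanager
    def critical_region(self):
        with self._lock:
            self._depth += 1
            if self._depth == 1:
                # First region to open: start suppressing.
                self._set_suppressed(True)
        try:
            yield
        finally:
            with self._lock:
                self._depth -= 1
                if self._depth == 0:
                    # Last region to close: stop suppressing.
                    self._set_suppressed(False)
```

Any thread can then write "with suppressor.critical_region(): ..." and suppression stays on for as long as at least one region is open.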

On Thu, Jun 25, 2020 at 1:54 PM MRAB  wrote:

> On 2020-06-25 21:05, Yonatan Zunger via Python-ideas wrote:
> > Hey everyone,
> >
> > I've been developing code which (alas) needs to operate in a runtime
> > environment which is quite /enthusiastic/ about sending SIGTERMs and the
> > like, and where there are critical short sections of code that, if
> > interrupted, are very hard to resume without some user-visible anomaly
> > happening. This means getting to know the signal handling logic far too
> > well. In particular, it means that preventing signals in a "dangerous
> > window" is very difficult in the current language: while you can change
> > the signal handlers to "suppressing" handlers and then restore them from
> > the main thread, if you have potentially critical regions running in any
> > non-main thread, there's no good way for them to tell the main thread to
> > change the handlers... except by sending the main thread a signal. That
> > requires the "suppressing" handler to have nontrivial logic in it,
> > /but/ Python signal handlers are reentrant: signals are not suppressed
> > /during a signal handler/, and so hilarity ensues.
> >
> > Digging through all of this, though, there seems to be one interesting
> > thing that could be done in CPython in particular, and so I have a
> > possibly-crazy proposal for a runtime-specific extension.
> >
> > *Proposal: Add (as an optional part of the spec, runtimes may choose to
> > implement if they wish) sys.suppress_signals(bool).* Calling this
> > function with a value of True would defer all signal handling until it
> > was called again with a value of False.
> >
> > The reason this is potentially insane is that suppressing signals does
> > all sorts of things: e.g., it prevents keyboard interrupts, blocks
> > ENOPIPE errors if you try to write to a pipe with a terminated peer,
> > stops child processes from reporting their exit to the parent, screws
> > with profiling timers, and so on. This would be a proper footgun if
> > misused, the sort of thing you should only activate if you actually
> > understand POSIX signals in depth.
> >
> > The implementation in CPython would be surprisingly simple: simply store
> > the boolean value in an atomic int, and add a second check at the start
> > of _PyErr_CheckSignalsTstate
> > <https://github.com/python/cpython/blob/master/Modules/signalmodule.c#L1693>,
> > prior to clearing is_tripped.
> >
> > How insane does this idea sound to people?
> >
> Wouldn't it be tidier as a context manager?
>
> with sys.suppress_signals():
>  ...
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/EWDLTCA3JYWRIFG2TTSYLW7WONGUEVDR/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/F6UFRA4N35RE2KFGGHPSXEGLMZFGVW4A/


[Python-ideas] Re: Allow signal suppression

2020-06-25 Thread MRAB

On 2020-06-25 21:05, Yonatan Zunger via Python-ideas wrote:

> Hey everyone,
>
> I've been developing code which (alas) needs to operate in a runtime
> environment which is quite /enthusiastic/ about sending SIGTERMs and the
> like, and where there are critical short sections of code that, if
> interrupted, are very hard to resume without some user-visible anomaly
> happening. This means getting to know the signal handling logic far too
> well. In particular, it means that preventing signals in a "dangerous
> window" is very difficult in the current language: while you can change
> the signal handlers to "suppressing" handlers and then restore them from
> the main thread, if you have potentially critical regions running in any
> non-main thread, there's no good way for them to tell the main thread to
> change the handlers... except by sending the main thread a signal. That
> requires the "suppressing" handler to have nontrivial logic in it,
> /but/ Python signal handlers are reentrant: signals are not suppressed
> /during a signal handler/, and so hilarity ensues.
>
> Digging through all of this, though, there seems to be one interesting
> thing that could be done in CPython in particular, and so I have a
> possibly-crazy proposal for a runtime-specific extension.
>
> *Proposal: Add (as an optional part of the spec, runtimes may choose to
> implement if they wish) sys.suppress_signals(bool).* Calling this
> function with a value of True would defer all signal handling until it
> was called again with a value of False.
>
> The reason this is potentially insane is that suppressing signals does
> all sorts of things: e.g., it prevents keyboard interrupts, blocks
> ENOPIPE errors if you try to write to a pipe with a terminated peer,
> stops child processes from reporting their exit to the parent, screws
> with profiling timers, and so on. This would be a proper footgun if
> misused, the sort of thing you should only activate if you actually
> understand POSIX signals in depth.
>
> The implementation in CPython would be surprisingly simple: simply store
> the boolean value in an atomic int, and add a second check at the start
> of _PyErr_CheckSignalsTstate
> <https://github.com/python/cpython/blob/master/Modules/signalmodule.c#L1693>,
> prior to clearing is_tripped.
>
> How insane does this idea sound to people?

Wouldn't it be tidier as a context manager?

with sys.suppress_signals():
    ...
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EWDLTCA3JYWRIFG2TTSYLW7WONGUEVDR/


[Python-ideas] Allow signal suppression

2020-06-25 Thread Yonatan Zunger via Python-ideas
Hey everyone,

I've been developing code which (alas) needs to operate in a runtime
environment which is quite *enthusiastic* about sending SIGTERMs and the
like, and where there are critical short sections of code that, if
interrupted, are very hard to resume without some user-visible anomaly
happening. This means getting to know the signal handling logic far too
well. In particular, it means that preventing signals in a "dangerous
window" is very difficult in the current language: while you can change the
signal handlers to "suppressing" handlers and then restore them from the
main thread, if you have potentially critical regions running in any
non-main thread, there's no good way for them to tell the main thread to
change the handlers... except by sending the main thread a signal. That
requires the "suppressing" handler to have nontrivial logic in it, *but* Python
signal handlers are reentrant: signals are not suppressed *during a signal
handler*, and so hilarity ensues.
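
The main-thread restriction is easy to see in a small sketch. In CPython, signal.signal() raises ValueError when called from any thread other than the main one; the function and variable names below are arbitrary.

```python
import signal
import threading

errors = []


def try_to_install_handler():
    # A worker thread attempting to change a handler for its own critical region.
    try:
        signal.signal(signal.SIGTERM, signal.SIG_IGN)
    except ValueError as exc:
        errors.append(exc)


worker = threading.Thread(target=try_to_install_handler)
worker.start()
worker.join()
```

After this runs, errors holds the ValueError raised in the worker, so a non-main thread has no direct way to switch handlers itself.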

Digging through all of this, though, there seems to be one interesting
thing that could be done in CPython in particular, and so I have a
possibly-crazy proposal for a runtime-specific extension.

*Proposal: Add (as an optional part of the spec, runtimes may choose to
implement if they wish) sys.suppress_signals(bool).* Calling this function
with a value of True would defer all signal handling until it was called
again with a value of False.

The reason this is potentially insane is that suppressing signals does all
sorts of things: e.g., it prevents keyboard interrupts, blocks ENOPIPE
errors if you try to write to a pipe with a terminated peer, stops child
processes from reporting their exit to the parent, screws with profiling
timers, and so on. This would be a proper footgun if misused, the sort
of thing you should only activate if you actually understand POSIX signals
in depth.

The implementation in CPython would be surprisingly simple: simply store
the boolean value in an atomic int, and add a second check at the start of
_PyErr_CheckSignalsTstate
<https://github.com/python/cpython/blob/master/Modules/signalmodule.c#L1693>,
prior to clearing is_tripped.
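
As a rough mental model of that change, here is a toy sketch in Python rather than C. ToySignalState and its attributes are made up for this illustration and do not mirror CPython's actual data structures; only the ordering of the checks is the point.

```python
class ToySignalState:
    """Toy model of the proposed check; not CPython's real implementation."""

    def __init__(self):
        self.is_tripped = False   # set when any signal has arrived
        self.suppressed = False   # the proposed flag (an atomic int in C)
        self.pending = []         # signal numbers waiting for their handlers
        self.handled = []         # stands in for running Python-level handlers

    def trip(self, signum):
        # Models the C-level handler: record the signal and set the flag.
        self.pending.append(signum)
        self.is_tripped = True

    def check_signals(self):
        # Models the periodic check made by the interpreter loop.
        if not self.is_tripped:
            return
        if self.suppressed:
            # Proposed extra check: return before clearing is_tripped,
            # so the signal is picked up by a later check instead.
            return
        self.is_tripped = False
        while self.pending:
            self.handled.append(self.pending.pop(0))
```

Because the early return happens before is_tripped is cleared, nothing is lost while suppression is on; the first check after unsuppressing runs the deferred handlers.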

How insane does this idea sound to people?

Yonatan

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AHWGT5XLC6QI73PCBWPXN4CEBJE5D42F/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Jonathan Fine
Hi Ben

You've definitely got a point. When you do
seen.add(key)
it's sometimes important to know if this changes 'seen'.

Here's a one-liner that does what you want:
key in seen or seen.add(key)

It's a bit obscure, and perhaps a bit too much
https://en.wikipedia.org/wiki/Code_golf. So here's a worked example.

>>> seen = set()
>>> 0 in seen or seen.add(0)
>>> 0 in seen or seen.add(0)
True

The first time '0 in seen' is False, so the OR part 'seen.add(0)' is
executed. As you and others have said, seen.add(key) always returns None.
(The REPL does not echo the expression value if it is None.)

The second time '0 in seen' is True, so the OR part is not done, and the
value of the expression is True.

If you want, in your own code you can subclass set to provide the behaviour
you think is missing. Or just write a utility function. I think that would
in practice be better.

>>> def in_or_add(s, key):
...     return key in s or s.add(key) or False

>>> seen = set()
>>> in_or_add(seen, 0)
False
>>> in_or_add(seen, 0)
True

An aside. Your proposal is:

> value. set.add would return True if the item was already in the set prior
> to insertion, and False otherwise.


This allows you to simplify your code example. But I find the semantics a
bit odd. If set.add(key) were to return a boolean, I'd expect it to be True
if the item genuinely was added. But your semantics are the other way
around.
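
For comparison, the opposite semantics (True when the item genuinely was added) can be sketched as a utility function too. The name add_if_missing is made up for this illustration; it compares the set's size before and after, so the item is only looked up once.

```python
def add_if_missing(s, key):
    """Add key to s; return True only if s actually changed."""
    size_before = len(s)
    s.add(key)
    return len(s) != size_before


seen = set()
first = add_if_missing(seen, 0)    # 0 was not there, so it is added
second = add_if_missing(seen, 0)   # 0 is already there, nothing changes
```

Here first is True and second is False, the reverse of in_or_add.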

My proposal of in_or_add and your revision of set.add both have the same
semantics. I think that is clear. But they have different names. If enough
people use a utility function such as in_or_add(s, key), then that makes a
good case for adding it as a set method.

Two final points. While processing the 'not key in seen' branch, an
exception might occur. Sometimes it's better to add the key to 'seen' once
all the associated processing is complete. (An exception that leaves the
system in an inconsistent state can cause problems later.) The second point
is that I'm not confident about 'in_or_add' as a name for the method.
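
A small sketch of the first point, with made-up names (process_once, handle): the key is recorded only after the processing has succeeded, so a failure leaves 'seen' unchanged and the item can be retried.

```python
def process_once(seen, key, handle):
    """Run handle(key) unless key was already processed; return True if it ran."""
    if key in seen:
        return False
    handle(key)      # may raise; if it does, key is not recorded as seen
    seen.add(key)    # only reached when the processing completed
    return True
```

With a combined check-and-add call the key would already be in 'seen' when the exception occurs, and a retry would wrongly treat it as done.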

Once again, thank you for your interesting idea.
-- 
Jonathan
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AIZUO34NMWKH5GRZSVQ4BNF66OILU5VI/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Brett Cannon
On Thu, Jun 25, 2020 at 10:10 AM Ben Avrahami 
wrote:

> On Thu, Jun 25, 2020 at 7:55 PM Christopher Barker 
> wrote:
>
>> On Thu, Jun 25, 2020 at 9:02 AM Steele Farnsworth 
>> wrote:
>> indeed -- and that is pretty darn baked in to Python, so I don't think
>> it's going to change.
>>
>
30 years of history beats out 8 years since the last discussion of this. 


>
>>
> Except this convention doesn't hold for dict.setvalue (which I did
> misspell, sorry), or for dict.pop.
>

I don't know what you mean by setvalue(), but I'm assuming you meant
setdefault(). If that's true, then the guideline isn't "all mutating
methods must return None", it's "*unless* the method is specifically
designed to have a return value based on the purpose of the method, the
method should return None".


> Both these methods fundamentally mutate the collection, but they also
> return a value (which could be retrieved by a non-mutating method), that
> somewhat pertains to the operation performed.
>

Both pop() and setdefault() are meant to return something, they just also
happen to mutate things. You're wanting the reverse and that isn't how
Python has chosen to go.

Same with people who bring up wanting a fluent API where 'self' is always
returned. It's a very specific decision that for methods that do nothing
but mutation they return None to signify that you're not getting back a new
object, but something that mutated in-place. This is why list.sort(), for
instance, returns None; you are not getting a new copy of the list sorted,
you actually changed the original object which you should have named with a
variable. That way you have to access the object by the name you already
gave it as a reminder it's the same object.
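
The list.sort() case next to its non-mutating counterpart, as a short illustration (variable names are arbitrary):

```python
scores = [3, 1, 2]

ordered_copy = sorted(scores)   # returns a new list; scores is untouched so far
result = scores.sort()          # sorts scores in place and returns None
```

After this, ordered_copy and scores are equal but distinct lists, and result is None.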

Plus Python isn't big on the crazy one-liners where you might want to do
an inline call of add() to simply save yourself one line of code. Explicit
is better than implicit, and that can mean needing to simply do something
in two lines instead of one.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XBUPWZ3ZPEQZ233TF6CRYPMGXQ7XSJ7Y/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Ben Avrahami
On Thu, Jun 25, 2020 at 7:55 PM Christopher Barker 
wrote:

> On Thu, Jun 25, 2020 at 9:02 AM Steele Farnsworth 
> wrote:
> indeed -- and that is pretty darn baked in to Python, so I don't think
> it's going to change.
>
>
Except this convention doesn't hold for dict.setvalue (which I did
misspell, sorry), or for dict.pop. Both these methods fundamentally mutate
the collection, but they also return a value (which could be retrieved by a
non-mutating method), that somewhat pertains to the operation performed.


> Note: I'm not sure your example with setdefault is correct:
>

Fair enough, I misspelled setdefault and messed up the example, what I
meant was:
seen = {}
for k, v in iterable_of_tuples:
    if seen.setdefault(k, v) is not v:
        ...  # duplicate key
    else:
        ...  # new key
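
A runnable version of the same idea, with one caveat worth noting: the test is by identity, so a duplicate key that arrives with the very same value object goes undetected. The function name duplicate_keys is made up for this sketch.

```python
def duplicate_keys(pairs):
    """Return keys seen more than once, using the setdefault identity test."""
    seen = {}
    duplicates = []
    for k, v in pairs:
        # setdefault returns v itself only when k was not in the dict yet.
        if seen.setdefault(k, v) is not v:
            duplicates.append(k)
    return duplicates


first, second = object(), object()

# Distinct value objects: the duplicate key is reported.
distinct_values = duplicate_keys([("a", first), ("a", second)])

# Same value object twice: the identity test cannot tell the calls apart.
shared_value = duplicate_keys([("a", first), ("a", first)])
```

So the dict-based version works when every value is a fresh object, and can miss duplicates otherwise.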
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PSQRU7QZHFFTFA45DQJKQF4RLFIHSGMF/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Christopher Barker
On Thu, Jun 25, 2020 at 9:02 AM Steele Farnsworth 
wrote:

> My point was only that, as far as I know, all the methods for built in
> container types that serve only to change what is contained return None,
> and that this was an intentional design choice, so changing it in one case
> would have to evoke a larger discussion about what those sorts of methods
> should return.
>

indeed -- and that is pretty darn baked in to Python, so I don't think it's
going to change.

Note: I'm not sure your example with setdefault is correct:

seen = {}
for i in iterable:
    if seen.set_default(i, some_value) is not None:
        ...  # do something in case of duplicates
    else:
        ...  # do something in case of first visit

A) you spelled setdefault() wrong -- no underscore

B) seen.setdefault(i, some_value) will return some_value if the key is not
there, and whatever value is already stored if it is (in this case,
starting with an empty dict, it will always be some_value).

Running this code:

some_value = "sentinel"

iterable = [3, 2, 4, 2, 3]

seen = {}
for i in iterable:
    if seen.setdefault(i, some_value) is not None:
        # do something in case of duplicates
        print(f'{i} was already in there')
    else:
        # do something in case of first visit
        print(f'{i} was not already there')

results in:

In [9]: run in_set.py

3 was already in there
2 was already in there
4 was already in there
2 was already in there
3 was already in there

so not working.

But you can make it work if you reset the value:

some_value = "sentinel"

iterable = [3, 2, 4, 2, 3]

seen = {}
for i in iterable:
    if seen.setdefault(i, None) is not None:
        # do something in case of duplicates
        print(f'{i} was already in there')
    else:
        # do something in case of first visit
        print(f'{i} was not already there')
        seen[i] = some_value

In [11]: run in_set.py

3 was not already there
2 was not already there
4 was not already there
2 was already in there
3 was already in there

But this is a bit klunky as well, not really any better than the set
version.




However, for the case at hand, adding a method similar to the
dict.setdefault() would be a reasonable thing to do. I'm not sure what to
call it, or what the API should be, but maybe:

class my_set(set):

    def add_if_not_there(self, item):
        if item in self:
            return True
        else:
            self.add(item)
            return False

seen = my_set()

for i in iterable:
    if seen.add_if_not_there(i):
        print(f'{i} was already in there')
    else:
        print(f'{i} was not already there')

However, while dict.setdefault does clean up and clarify otherwise somewhat
ugly code, I'm not sure this is that much better than:

for i in iterable:
    if i in seen:
        print(f'{i} was already in there')
    else:
        seen.add(i)
        print(f'{i} was not already there')

But feel free to make the case :-)

Note that setdefault is in the MutableMapping ABC, so there could be some
debate about whether to add this new method to the MutableSet ABC.
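
If it did go into the ABC, it could presumably be a mixin method written purely in terms of __contains__ and add, the way setdefault is for MutableMapping. A hedged sketch: ReportingSetMixin and ListBackedSet are made-up names, the method name just reuses add_if_not_there from above, and nothing here is an actual proposal for collections.abc.

```python
from collections.abc import MutableSet


class ReportingSetMixin:
    """Hypothetical mixin built only on __contains__ and add()."""

    def add_if_not_there(self, item):
        if item in self:
            return True
        self.add(item)
        return False


class ListBackedSet(ReportingSetMixin, MutableSet):
    """Deliberately simple MutableSet, used only to exercise the mixin."""

    def __init__(self, items=()):
        self._items = []
        for item in items:
            self.add(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        if item not in self._items:
            self._items.append(item)

    def discard(self, item):
        if item in self._items:
            self._items.remove(item)
```

Any class that implements the five abstract MutableSet methods would get add_if_not_there for free this way, at the cost of two membership checks per call.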

-CHB




>
> I wouldn't be opposed to that discussion happening and for any changes
> that are made to happen within 3.x because I doubt that very much code that
> currently exists depends on these methods returning None or even use what
> they return at all.
>
> On Thu, Jun 25, 2020, 10:28 AM Ben Avrahami 
> wrote:
>
>> Hey all,
>> Often I've found this kind of code:
>>
>> seen = set()
>> for i in iterable:
>>     if i in seen:
>>         ...  # do something in case of duplicates
>>     else:
>>         seen.add(i)
>>         ...  # do something in case of first visit
>>
>> This kind of code appears whenever one needs to check for duplicates in
>> case of a user-submitted iterable, or when we loop over a recursive
>> iteration that may involve cycles (graph search or the like). This code
>> could be improved if one could ensure an item is in the set, and get
>> whether it was there before in one operation. This may seem overly
>> specific, but dicts do do this:
>>
>> seen = {}
>> for i in iterable:
>>     if seen.set_default(i, some_value) is not None:
>>         ...  # do something in case of duplicates
>>     else:
>>         ...  # do something in case of first visit
>>
>> I think the set type would benefit greatly from its add method having a
>> return value. set.add would return True if the item was already in the set
>> prior to insertion, and False otherwise.
>>
>> Looking at the Cpython code, the set_add_entry already detects existing
>> entries, adding a return value would require no additional complexity.
>>
>> Any thoughts?
>> ___
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/6WYNYNG5J5HBD3PA7PW75RP4PMLOMH4C/
>> Code of Conduct: 

[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Ben Avrahami
On Thu, Jun 25, 2020 at 6:44 PM Alex Hall  wrote:

> Previous discussions on this:
>
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/ASHOHN32BQPBVPIGBZQRS24XHXFMB6XZ/
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/K5SS62AB5DFFZIJ7ASKPLB2P3XGSYFPC/
> (seems like part of the above discussion that got separated)
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/CKF2GFI3HKQAAYRCMMRPTFMONG3UGO4T/#CKF2GFI3HKQAAYRCMMRPTFMONG3UGO4T
>

The latest of these discussions is over 8 years old, and I think it
deserves bringing up again.

The convention that "methods that modify self return None" is useful, but
dict's equivalent method, dict.setdefault, is a modifying method that also
returns a pertinent (and often useful) value, even though it too could be
split into two operations (__getitem__, then __setitem__ if the key is
missing). The same is true of dict.pop. (BTW, I think that set.discard
should also return a boolean, but that discussion can wait.)
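
To make the comparison concrete, here is a small illustration of the
current behaviour: dict.setdefault and dict.pop mutate the dict and return
a value, while set.add and set.discard mutate the set and return None. The
variable names are made up for the example.

```python
# dict methods that mutate and still return something useful
d = {"a": 1}
existing = d.setdefault("a", 99)  # key present: dict unchanged, stored value returned
inserted = d.setdefault("b", 2)   # key missing: 2 is inserted and returned
popped = d.pop("a")               # key removed, its value returned

# set methods that mutate and return None (current behaviour)
s = {1, 2}
add_result = s.add(3)
discard_result = s.discard(1)

print(existing, inserted, popped)   # 1 2 1
print(add_result, discard_result)   # None None
```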

Indeed, adding a return value to set.add would not improve runtime
performance, but I believe it would both improve readability and make code
more fluent to write.

As for teaching, I don't think this additional bit of complexity would be
too difficult to learn. "set.add ensures that an item is in the set and
returns whether it was already included" seems pretty straightforward to
me. Not to mention that anyone coming from C#, Java, C++, and many other
languages would already be familiar with this concept.

Overall, my gripe is that set currently lacks the capabilities of a dict
whose values are all None, so the simplest implementation of a set in
Python is more powerful than the standard library's version.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UHCVO5PAS7DJXSGLMYEW72BRCE3RZUYT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Steele Farnsworth
My point was only that, as far as I know, all the methods for built-in
container types that serve only to change what is contained return None,
and that this was an intentional design choice, so changing it in one case
would have to prompt a larger discussion about what those sorts of methods
should return.

I wouldn't be opposed to that discussion happening and for any changes that
are made to happen within 3.x because I doubt that very much code that
currently exists depends on these methods returning None or even uses what
they return at all.

On Thu, Jun 25, 2020, 10:28 AM Ben Avrahami  wrote:

> Hey all,
> Often I've found this kind of code:
>
> seen = set()
> for i in iterable:
>   if i in seen:
>     ...  # do something in case of duplicates
>   else:
>     seen.add(i)
>     ...  # do something in case of first visit
>
> This kind of code appears whenever one needs to check for duplicates in
> case of a user-submitted iterable, or when we loop over a recursive
> iteration that may involve cycles (graph search or the like). This code
> could be improved if one could ensure an item is in the set, and get
> whether it was there before in one operation. This may seem overly
> specific, but dicts do do this:
>
> seen = {}
> for i in iterable:
>   if seen.setdefault(i, some_value) is not None:
>     ...  # do something in case of duplicates
>   else:
>     ...  # do something in case of first visit
>
> I think the set type would benefit greatly from its add method having a
> return value. set.add would return True if the item was already in the set
> prior to insertion, and False otherwise.
>
> Looking at the CPython code, set_add_entry already detects existing
> entries, so adding a return value would require no additional complexity.
>
> Any thoughts?
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/ILNANLAGZR3S6VBMK7FJXUZZUMKGKJOV/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Alex Hall
Previous discussions on this:

https://mail.python.org/archives/list/python-ideas@python.org/thread/ASHOHN32BQPBVPIGBZQRS24XHXFMB6XZ/
https://mail.python.org/archives/list/python-ideas@python.org/thread/K5SS62AB5DFFZIJ7ASKPLB2P3XGSYFPC/
(seems like part of the above discussion that got separated)
https://mail.python.org/archives/list/python-ideas@python.org/thread/CKF2GFI3HKQAAYRCMMRPTFMONG3UGO4T/#CKF2GFI3HKQAAYRCMMRPTFMONG3UGO4T

On Thu, Jun 25, 2020 at 4:30 PM Ben Avrahami  wrote:

> Hey all,
> Often I've found this kind of code:
>
> seen = set()
> for i in iterable:
>   if i in seen:
>     ...  # do something in case of duplicates
>   else:
>     seen.add(i)
>     ...  # do something in case of first visit
>
> This kind of code appears whenever one needs to check for duplicates in
> case of a user-submitted iterable, or when we loop over a recursive
> iteration that may involve cycles (graph search or the like). This code
> could be improved if one could ensure an item is in the set, and get
> whether it was there before in one operation. This may seem overly
> specific, but dicts do do this:
>
> seen = {}
> for i in iterable:
>   if seen.setdefault(i, some_value) is not None:
>     ...  # do something in case of duplicates
>   else:
>     ...  # do something in case of first visit
>
> I think the set type would benefit greatly from its add method having a
> return value. set.add would return True if the item was already in the set
> prior to insertion, and False otherwise.
>
> Looking at the CPython code, set_add_entry already detects existing
> entries, so adding a return value would require no additional complexity.
>
> Any thoughts?
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/Y7TV4ZQOXE3BEIEDU7STKST4EL5GXUQW/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Steele Farnsworth
Personally, I'd want to see mutator methods return `self` so you can do
more than one mutation in a statement, but the convention is that all the
mutator methods return `None`.
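
As an illustration of that trade-off, the sketch below contrasts the
built-in behaviour with a hypothetical subclass whose add returns self.
ChainableSet is an invented name for this example, not anything in the
standard library.

```python
class ChainableSet(set):
    """Hypothetical set whose add() returns self so calls can be chained."""

    def add(self, item):
        super().add(item)
        return self


builtin_result = set().add(1)           # built-in convention: returns None
chained = ChainableSet().add(1).add(2)  # chaining works only with the subclass

print(builtin_result)   # None
print(sorted(chained))  # [1, 2]
```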

On Thu, Jun 25, 2020, 10:28 AM Ben Avrahami  wrote:

> Hey all,
> Often I've found this kind of code:
>
> seen = set()
> for i in iterable:
>   if i in seen:
>     ...  # do something in case of duplicates
>   else:
>     seen.add(i)
>     ...  # do something in case of first visit
>
> This kind of code appears whenever one needs to check for duplicates in
> case of a user-submitted iterable, or when we loop over a recursive
> iteration that may involve cycles (graph search or the like). This code
> could be improved if one could ensure an item is in the set, and get
> whether it was there before in one operation. This may seem overly
> specific, but dicts do do this:
>
> seen = {}
> for i in iterable:
>   if seen.setdefault(i, some_value) is not None:
>     ...  # do something in case of duplicates
>   else:
>     ...  # do something in case of first visit
>
> I think the set type would benefit greatly from its add method having a
> return value. set.add would return True if the item was already in the set
> prior to insertion, and False otherwise.
>
> Looking at the CPython code, set_add_entry already detects existing
> entries, so adding a return value would require no additional complexity.
>
> Any thoughts?
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/GQ3GYFCYMQSXRM2OPJ6ZRAKS4IYNETMG/


[Python-ideas] giving set.add a return value

2020-06-25 Thread Ben Avrahami
Hey all,
Often I've found this kind of code:

seen = set()
for i in iterable:
  if i in seen:
    ...  # do something in case of duplicates
  else:
    seen.add(i)
    ...  # do something in case of first visit

This kind of code appears whenever one needs to check for duplicates in
case of a user-submitted iterable, or when we loop over a recursive
iteration that may involve cycles (graph search or the like). This code
could be improved if one could ensure an item is in the set, and get
whether it was there before in one operation. This may seem overly
specific, but dicts do do this:

seen = {}
for i in iterable:
  if seen.setdefault(i, some_value) is not None:
    ...  # do something in case of duplicates
  else:
    ...  # do something in case of first visit
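
One caveat with the dict version: when the default is a fixed value,
dict.setdefault returns that same value whether or not the key was already
present, so the return value alone does not separate the two cases. A
minimal sketch of one way around that, assuming a fresh marker object is
created for every call (the function name and the marker trick are
illustrative and not part of the original message):

```python
def split_first_and_repeat(iterable):
    """Return (first_visits, duplicates) with one dict operation per item."""
    seen = {}
    first_visits, duplicates = [], []
    for i in iterable:
        marker = object()  # unique per iteration
        if seen.setdefault(i, marker) is marker:
            first_visits.append(i)  # our marker was stored: the key was new
        else:
            duplicates.append(i)    # an older marker came back: key existed
    return first_visits, duplicates


print(split_first_and_repeat([1, 2, 1, 3, 2]))  # ([1, 2, 3], [1, 2])
```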

I think the set type would benefit greatly from its add method having a
return value. set.add would return True if the item was already in the set
prior to insertion, and False otherwise.

Looking at the CPython code, set_add_entry already detects existing
entries, so adding a return value would require no additional complexity.
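
For illustration only, the proposed return value can be approximated today
with a subclass. ReportingSet and find_duplicates are hypothetical names,
and this sketch infers the answer by comparing the set's length before and
after the insert rather than by touching anything inside CPython:

```python
class ReportingSet(set):
    """Hypothetical set whose add() reports whether the item was already present."""

    def add(self, item):
        size_before = len(self)
        super().add(item)
        # Unchanged length means the item was already in the set.
        return len(self) == size_before


def find_duplicates(iterable):
    """Collect every item that had been seen earlier in the iteration."""
    seen = ReportingSet()
    return [i for i in iterable if seen.add(i)]


print(find_duplicates("abca"))  # ['a']
```

With a return value like this, the membership test and the insert in the
first loop above collapse into a single call.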

Any thoughts?
___
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/6WYNYNG5J5HBD3PA7PW75RP4PMLOMH4C/