Re: Non-deterministic set ordering

2022-05-15 Thread MRAB

On 2022-05-16 04:20, Rob Cliffe via Python-list wrote:



On 16/05/2022 04:13, Dan Stromberg wrote:


On Sun, May 15, 2022 at 8:01 PM Rob Cliffe via Python-list wrote:


I was shocked to discover that when repeatedly running the following
program (condensed from a "real" program) under Python 3.8.3

for p in { ('x','y'), ('y','x') }:
    print(p)

the output was sometimes

('y', 'x')
('x', 'y')

and sometimes

('x', 'y')
('y', 'x')

Can anyone explain why running identical code should result in
traversing a set in a different order?


Sets are defined as unordered so that they can be hashed internally to 
give O(1) operations for many tasks.


It wouldn't be unreasonable for sets to use a fixed-but-arbitrary
ordering for a given group of set operations, but being unpredictable
deters developers from mistakenly assuming they are ordered.


If you need order, you should use a tuple, list, or something like 
https://grantjenks.com/docs/sortedcontainers/sortedset.html

Thanks, I can work round this behaviour.
But I'm curious: where does the variability come from?  Is it deliberate
(as your answer seems to imply)?  AFAIK the same code within the *same
run* of a program does produce identical results.


Basically, Python uses hash randomisation in order to protect it against 
denial-of-service attacks. (Search for "PYTHONHASHSEED" in the docs.)


It also applied to dicts (the code for sets was based on that for 
dicts), but dicts now remember their insertion order.
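
A minimal sketch of the effect described above, not taken from the original
message: it runs the two-element loop in fresh interpreters with
PYTHONHASHSEED set explicitly. The helper name run_with_seed is made up for
illustration. With the same seed the order should repeat from run to run;
with different seeds it may or may not change.

```python
import os
import subprocess
import sys

# The program from the original question, run in a child interpreter.
SNIPPET = "for p in {('x', 'y'), ('y', 'x')}:\n    print(p)\n"


def run_with_seed(seed):
    """Run SNIPPET with PYTHONHASHSEED=seed and return its output lines.

    run_with_seed is a hypothetical helper for this illustration only.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Same seed twice, then a few other seeds for comparison.
    for seed in (0, 0, 1, 2, 3):
        print(seed, run_with_seed(seed))
```

Leaving PYTHONHASHSEED unset (or setting it to "random") gives each process
its own seed, which matches the run-to-run variation reported in the
question.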

--
https://mail.python.org/mailman/listinfo/python-list


Re: Non-deterministic set ordering

2022-05-15 Thread Rob Cliffe via Python-list

Thanks, Paul.  Question answered!
Rob Cliffe

On 16/05/2022 04:36, Paul Bryan wrote:

This may explain it:
https://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions

On Mon, 2022-05-16 at 04:20 +0100, Rob Cliffe via Python-list wrote:



On 16/05/2022 04:13, Dan Stromberg wrote:


On Sun, May 15, 2022 at 8:01 PM Rob Cliffe via Python-list wrote:

    I was shocked to discover that when repeatedly running the following
    program (condensed from a "real" program) under Python 3.8.3

    for p in { ('x','y'), ('y','x') }:
        print(p)

    the output was sometimes

    ('y', 'x')
    ('x', 'y')

    and sometimes

    ('x', 'y')
    ('y', 'x')

    Can anyone explain why running identical code should result in
    traversing a set in a different order?


Sets are defined as unordered so that they can be hashed internally to
give O(1) operations for many tasks.

It wouldn't be unreasonable for sets to use a fixed-but-arbitrary
ordering for a given group of set operations, but being unpredictable
deters developers from mistakenly assuming they are ordered.

If you need order, you should use a tuple, list, or something like
https://grantjenks.com/docs/sortedcontainers/sortedset.html

Thanks, I can work round this behaviour.
But I'm curious: where does the variability come from?  Is it deliberate
(as your answer seems to imply)?  AFAIK the same code within the *same
run* of a program does produce identical results.
Best wishes
Rob Cliffe





Re: Non-deterministic set ordering

2022-05-15 Thread Paul Bryan
This may explain it:
https://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions

On Mon, 2022-05-16 at 04:20 +0100, Rob Cliffe via Python-list wrote:
> 
> 
> On 16/05/2022 04:13, Dan Stromberg wrote:
> > 
> > On Sun, May 15, 2022 at 8:01 PM Rob Cliffe via Python-list wrote:
> > 
> >     I was shocked to discover that when repeatedly running the following
> >     program (condensed from a "real" program) under Python 3.8.3
> > 
> >     for p in { ('x','y'), ('y','x') }:
> >         print(p)
> > 
> >     the output was sometimes
> > 
> >     ('y', 'x')
> >     ('x', 'y')
> > 
> >     and sometimes
> > 
> >     ('x', 'y')
> >     ('y', 'x')
> > 
> >     Can anyone explain why running identical code should result in
> >     traversing a set in a different order?
> > 
> > 
> > Sets are defined as unordered so that they can be hashed internally to
> > give O(1) operations for many tasks.
> > 
> > It wouldn't be unreasonable for sets to use a fixed-but-arbitrary
> > ordering for a given group of set operations, but being unpredictable
> > deters developers from mistakenly assuming they are ordered.
> > 
> > If you need order, you should use a tuple, list, or something like 
> > https://grantjenks.com/docs/sortedcontainers/sortedset.html
> Thanks, I can work round this behaviour.
> But I'm curious: where does the variability come from?  Is it deliberate
> (as your answer seems to imply)?  AFAIK the same code within the *same
> run* of a program does produce identical results.
> Best wishes
> Rob Cliffe



Re: Non-deterministic set ordering

2022-05-15 Thread Rob Cliffe via Python-list



On 16/05/2022 04:13, Dan Stromberg wrote:


On Sun, May 15, 2022 at 8:01 PM Rob Cliffe via Python-list wrote:


I was shocked to discover that when repeatedly running the following
program (condensed from a "real" program) under Python 3.8.3

for p in { ('x','y'), ('y','x') }:
    print(p)

the output was sometimes

('y', 'x')
('x', 'y')

and sometimes

('x', 'y')
('y', 'x')

Can anyone explain why running identical code should result in
traversing a set in a different order?


Sets are defined as unordered so that they can be hashed internally to 
give O(1) operations for many tasks.


It wouldn't be unreasonable for sets to use a fixed-but-arbitrary
ordering for a given group of set operations, but being unpredictable
deters developers from mistakenly assuming they are ordered.


If you need order, you should use a tuple, list, or something like 
https://grantjenks.com/docs/sortedcontainers/sortedset.html

Thanks, I can work round this behaviour.
But I'm curious: where does the variability come from?  Is it deliberate 
(as your answer seems to imply)?  AFAIK the same code within the *same 
run* of a program does produce identical results.

Best wishes
Rob Cliffe


Re: Non-deterministic set ordering

2022-05-15 Thread Dan Stromberg
On Sun, May 15, 2022 at 8:01 PM Rob Cliffe via Python-list <
python-list@python.org> wrote:

> I was shocked to discover that when repeatedly running the following
> program (condensed from a "real" program) under Python 3.8.3
>
> for p in { ('x','y'), ('y','x') }:
>     print(p)
>
> the output was sometimes
>
> ('y', 'x')
> ('x', 'y')
>
> and sometimes
>
> ('x', 'y')
> ('y', 'x')
>
> Can anyone explain why running identical code should result in
> traversing a set in a different order?
>

Sets are defined as unordered so that they can be hashed internally to give
O(1) operations for many tasks.

It wouldn't be unreasonable for sets to use a fixed-but-arbitrary ordering
for a given group of set operations, but being unpredictable deters
developers from mistakenly assuming they are ordered.

If you need order, you should use a tuple, list, or something like
https://grantjenks.com/docs/sortedcontainers/sortedset.html
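
As a rough illustration of the standard-library options (the variable names
are invented for this example, and it does not use the sortedcontainers
package linked above): sorted() gives a defined order for a set's elements,
and dict.fromkeys() removes duplicates while keeping first-seen order.

```python
pairs = {('x', 'y'), ('y', 'x')}

# sorted() builds a new list ordered by value, so the result does not
# depend on how the set happens to be laid out internally.
in_sorted_order = sorted(pairs)
for p in in_sorted_order:
    print(p)

# If first-seen order matters more than sorted order, a dict can stand in
# for a set: dicts keep insertion order (guaranteed since Python 3.7).
seen = [('y', 'x'), ('x', 'y'), ('y', 'x')]
unique_in_order = list(dict.fromkeys(seen))
print(unique_in_order)
```

Which of the two fits depends on whether the program wants a canonical
order or the order in which items arrived.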


Re: Changing calling sequence

2022-05-15 Thread Greg Ewing

On 16/05/22 1:20 am, 2qdxy4rzwzuui...@potatochowder.com wrote:

IMO,
classmethods were/are a bad idea (yes, I'm probably in the minority
around here, but someone has to be).


I don't think class methods are a bad idea per se, but having
them visible through instances seems unnecessary and confusing.
I suspect that wasn't a deliberate design decision, but just a
side effect of using a single class dict for both class and
instance things.
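
A small sketch of the behaviour in question; the class and method names are
hypothetical. A class method can be called through an instance even though
it is stored only in the class's namespace, next to the ordinary methods.

```python
class Example:
    """Hypothetical class used only to illustrate attribute lookup."""

    @classmethod
    def make(cls):
        # Alternate way to obtain an object; receives the class, not self.
        return cls()

    def describe(self):
        return "an Example instance"


obj = Example()

# The class method is reachable through the instance as well as the class.
via_instance = obj.make()
via_class = Example.make()

# Both kinds of method live in the class namespace; the instance's own
# namespace holds neither of them.
print("make" in vars(Example), "describe" in vars(Example))
print("make" in vars(obj), "describe" in vars(obj))
```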

--
Greg


Non-deterministic set ordering

2022-05-15 Thread Rob Cliffe via Python-list
I was shocked to discover that when repeatedly running the following 
program (condensed from a "real" program) under Python 3.8.3


for p in { ('x','y'), ('y','x') }:
    print(p)

the output was sometimes

('y', 'x')
('x', 'y')

and sometimes

('x', 'y')
('y', 'x')

Can anyone explain why running identical code should result in 
traversing a set in a different order?

Thanks
Rob Cliffe


Re: Changing calling sequence

2022-05-15 Thread 2QdxY4RzWzUUiLuE
On 2022-05-15 at 14:44:09 +1000,
Chris Angelico wrote:

> On Sun, 15 May 2022 at 14:27, dn wrote:
> >
> > On 15/05/2022 11.34, 2qdxy4rzwzuui...@potatochowder.com wrote:
> > > On 2022-05-15 at 10:22:15 +1200,
> > > dn wrote:
> > >
> > >> That said, a function which starts with a list of ifs-buts-and-maybes*
> > >> which are only there to ascertain which set of arguments have been
> > >> provided by the calling-routine; obscures the purpose/responsibility
> > >> of the function and decreases its readability (perhaps not by much,
> > >> but varying by situation).
> > >
> > > Agreed.
> > >
> > >> Accordingly, if the function is actually a method, recommend following
> > >> @Stefan's approach, ie multiple-constructors. Although, this too can
> > >> result in lower readability.
> > >
> > > (Having proposed that approach myself (and having used it over the
> > > decades for functions, methods, procedures, constructors, ...), I also
> > > agree.)
> > >
> > > Assuming good names,¹ how can this lead to lower readability?  I guess
> > > if there's too many of them, or programmers have to start wondering
> > > which one to use?  Or is this in the same generally obfuscating category
> > > as the ifs-buts-and-maybes at the start of a function?
> > >
> > > ¹ and properly invalidated caches
> >
> > Allow me to extend the term "readability" to include "comprehension".
> > Then add the statistical expectation that a class has only __init__().

Aha.  In that light, yeah, in general, the more stuff there is, the
harder it is to get your head around it.  And even if I document the
class (or the module), no one makes the time to read (let alone
comprehend) the document, which *should* clarify all those things that
are hard to discern from the code itself.

> > Thus, assuming this is the first time (or, ... for a while) that the
> > class is being employed, one has to read much further to realise that
> > there are choices of constructor.
> 
> Yeah. I would generally say, though, that any classmethod should be
> looked at as a potential alternate constructor, or at least an
> alternate way to obtain objects (eg preconstructed objects with
> commonly-used configuration - imagine a SecuritySettings class with a
> classmethod to get different defaults).

I think opening up the class and sifting through its classmethods to
find the factory functions is what dn is talking about.  Such a design
also means that once I have a SecuritySettings object, its (the
instance's) methods include both instance and class level methods.  IMO,
classmethods were/are a bad idea (yes, I'm probably in the minority
around here, but someone has to be).  The first person to scream "but
discoverability" will be severely beaten with a soft cushion.

> > Borrowing from the earlier example:
> >
> > >   This would be quite pythonic. For example, "datetime.date"
> > >   has .fromtimestamp(timestamp), .fromordinal(ordinal),
> > >   .fromisoformat(date_string), ...
> >
> > Please remember that this is only relevant if the function is actually a
> > module - which sense does not appear from the OP (IMHO).

Note that datetime.date is a class, not a module.
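
For what it's worth, a quick check along these lines (the variable names are
just for illustration) shows datetime.date being a class whose classmethods
act as alternate constructors:

```python
import datetime
import inspect

# datetime is the module; datetime.date is a class inside it.
print(inspect.ismodule(datetime), inspect.isclass(datetime.date))

# The regular constructor and two of the alternate constructors quoted
# above all produce equal date objects.
d1 = datetime.date(2022, 5, 15)
d2 = datetime.date.fromisoformat("2022-05-15")
d3 = datetime.date.fromordinal(d1.toordinal())
print(d1 == d2 == d3)
```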

> > The alternatives' names are well differentiated and (apparently#)
> > appropriately named*.

[...]

> > Continuing the 'have to read further' criticism (above), it could
> > equally-well be applied to my preference for keyword-arguments, in that
> > I've suggested defining four parameters but the user will only call the
> > function with either three or one argument(s). Could this be described
> > as potentially-confusing?

Potentially.  :-)

In a well designed *library*, common keywords across multiple functions
provide consistency, which is generally good.  Even a bit of redundancy
can be good for the same reason.

OTOH, when there's only one function, and it has a pile of keyword
parameters that can only be used in certain combinations, then it
definitely can be harder to read/understand/use than separate functions
with simpler interfaces.
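
A hedged sketch of the two shapes being compared. The function names and
the year/month/day-versus-date split are assumptions made up for this
example; the thread does not show the real function.

```python
import datetime


def describe(year=None, month=None, day=None, *, date=None):
    """One entry point with switchable argument forms (hypothetical)."""
    if date is not None:
        if (year, month, day) != (None, None, None):
            raise TypeError("pass either year/month/day or date, not both")
        return describe_date(date)
    if None in (year, month, day):
        raise TypeError("year, month and day are all required")
    return describe_ymd(year, month, day)


def describe_ymd(year, month, day):
    """Separate function for the three-argument form (hypothetical)."""
    return describe_date(datetime.date(year, month, day))


def describe_date(date):
    """Separate function for the one-argument form (hypothetical)."""
    return date.isoformat()


print(describe(2022, 5, 15))
print(describe(date=datetime.date(2022, 5, 15)))
print(describe_ymd(2022, 5, 15))
```

The single function needs the checks at the top just to work out which
form it was given; the two separate functions need none.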

> Yes, definitely. Personally, I'd split it into two, one that takes the
> existing three arguments (preferably with the same name, for
> compatibility), and one with a different name that takes just the one
> arg. That could be a small wrapper that calls the original, or the
> original could become a wrapper that calls the new one, or the main
> body could be refactored into a helper that they both call. It all
> depends what makes the most sense internally, because that's not part
> of the API at that point.
> 
> But it does depend on how the callers operate. Sometimes it's easier
> to have a single function with switchable argument forms, other times
> it's cleaner to separate them.

"Easier" and "cleaner" are very often orthogonal.  ;-)  (Rich Hickey
(creator of Clojure) talks a lot about the difference between "easy" and
"simple."  Arguments for and against Unix often involve similar terms.)

And "easier" or "cleaner" for whom?  The person writing the