Re: Why operations between dict views return a set and not a frozenset?

2022-01-16 Thread Marco Sulla
Thank you a lot for letting me understand :)

On Tue, 11 Jan 2022 at 22:09, Peter J. Holzer  wrote:

> On 2022-01-11 19:49:20 +0100, Marco Sulla wrote:
> > I think this is what you mean:
> >
> > >>> dis.dis("for _ in {1, 2}: pass")
> >   1   0 SETUP_LOOP  12 (to 14)
> >   2 LOAD_CONST   3 (frozenset({1, 2}))
> >   4 GET_ITER
> > >>6 FOR_ITER 4 (to 12)
> >   8 STORE_NAME   0 (_)
> >  10 JUMP_ABSOLUTE6
> > >>   12 POP_BLOCK
> > >>   14 LOAD_CONST   2 (None)
> >  16 RETURN_VALUE
> > >>> a = {1, 2}
> > >>> dis.dis("for _ in a: pass")
> >   1   0 SETUP_LOOP  12 (to 14)
> >   2 LOAD_NAME0 (a)
> >   4 GET_ITER
> > >>6 FOR_ITER 4 (to 12)
> >   8 STORE_NAME   1 (_)
> >  10 JUMP_ABSOLUTE6
> > >>   12 POP_BLOCK
> > >>   14 LOAD_CONST   0 (None)
> >  16 RETURN_VALUE
>
> I think you have omitted the part that Chris was hinting at.
>
> >>> dis.dis("a = {1, 2};\nfor _ in a: pass")
>   1   0 LOAD_CONST   0 (1)
>   2 LOAD_CONST   1 (2)
>   4 BUILD_SET2
>   6 STORE_NAME   0 (a)
>
>   2   8 LOAD_NAME0 (a)
>  10 GET_ITER
> >>   12 FOR_ITER 4 (to 18)
>  14 STORE_NAME   1 (_)
>  16 JUMP_ABSOLUTE   12
> >>   18 LOAD_CONST   2 (None)
>  20 RETURN_VALUE
>
> Now compare
>
>   2 LOAD_CONST   3 (frozenset({1, 2}))
>
> with
>
>   1   0 LOAD_CONST   0 (1)
>   2 LOAD_CONST   1 (2)
>   4 BUILD_SET2
>
> and you see the difference between using a frozenset as a constant and
> building a set at runtime.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-11 Thread Peter J. Holzer
On 2022-01-11 19:49:20 +0100, Marco Sulla wrote:
> I think this is what you mean:
> 
> >>> dis.dis("for _ in {1, 2}: pass")
>   1   0 SETUP_LOOP  12 (to 14)
>   2 LOAD_CONST   3 (frozenset({1, 2}))
>   4 GET_ITER
> >>6 FOR_ITER 4 (to 12)
>   8 STORE_NAME   0 (_)
>  10 JUMP_ABSOLUTE6
> >>   12 POP_BLOCK
> >>   14 LOAD_CONST   2 (None)
>  16 RETURN_VALUE
> >>> a = {1, 2}
> >>> dis.dis("for _ in a: pass")
>   1   0 SETUP_LOOP  12 (to 14)
>   2 LOAD_NAME0 (a)
>   4 GET_ITER
> >>6 FOR_ITER 4 (to 12)
>   8 STORE_NAME   1 (_)
>  10 JUMP_ABSOLUTE6
> >>   12 POP_BLOCK
> >>   14 LOAD_CONST   0 (None)
>  16 RETURN_VALUE

I think you have omitted the part that Chris was hinting at.

>>> dis.dis("a = {1, 2};\nfor _ in a: pass")
  1   0 LOAD_CONST   0 (1)
  2 LOAD_CONST   1 (2)
  4 BUILD_SET2
  6 STORE_NAME   0 (a)

  2   8 LOAD_NAME0 (a)
 10 GET_ITER
>>   12 FOR_ITER 4 (to 18)
 14 STORE_NAME   1 (_)
 16 JUMP_ABSOLUTE   12
>>   18 LOAD_CONST   2 (None)
 20 RETURN_VALUE

Now compare

  2 LOAD_CONST   3 (frozenset({1, 2}))

with

  1   0 LOAD_CONST   0 (1)
  2 LOAD_CONST   1 (2)
  4 BUILD_SET2

and you see the difference between using a frozenset as a constant and
building a set at runtime.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-11 Thread Chris Angelico
On Wed, Jan 12, 2022 at 5:49 AM Marco Sulla
 wrote:
>
> Ok... so I suppose, since you're inviting me to use dis and look at the 
> bytecode, that are you talking about constants in assembly, so const in C? 
> Sorry for the confusion, I'm not so skilled in C and I know nearly nothing 
> about assembly. Furthermore I never look at the bytecode of any language 
> before, so I simply didn't understand you.
>

No, I'm talking about constants in Python.

> I think this is what you mean:
>
> >>> dis.dis("for _ in {1, 2}: pass")
>   1   0 SETUP_LOOP  12 (to 14)
>   2 LOAD_CONST   3 (frozenset({1, 2}))

This is a constant.

>   4 GET_ITER
> >>6 FOR_ITER 4 (to 12)
>   8 STORE_NAME   0 (_)
>  10 JUMP_ABSOLUTE6
> >>   12 POP_BLOCK
> >>   14 LOAD_CONST   2 (None)

This is a constant.

>  16 RETURN_VALUE
> >>> a = {1, 2}
> >>> dis.dis("for _ in a: pass")
>   1   0 SETUP_LOOP  12 (to 14)
>   2 LOAD_NAME0 (a)
>   4 GET_ITER
> >>6 FOR_ITER 4 (to 12)
>   8 STORE_NAME   1 (_)
>  10 JUMP_ABSOLUTE6
> >>   12 POP_BLOCK
> >>   14 LOAD_CONST   0 (None)

This is a constant.

>  16 RETURN_VALUE
>

Try the same thing with other code and see whether you can see a
difference. In other words, *play around with dis.dis*.

If you're trying to hack on the internals of the CPython dictionary
implementation and do not understand how Python bytecode is executed,
you are doomed to make many MANY errors of judgment about performance.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-11 Thread Marco Sulla
Ok... so I suppose, since you're inviting me to use dis and look at the
bytecode, that are you talking about constants in assembly, so const in C?
Sorry for the confusion, I'm not so skilled in C and I know nearly nothing
about assembly. Furthermore I never look at the bytecode of any language
before, so I simply didn't understand you.

I think this is what you mean:

>>> dis.dis("for _ in {1, 2}: pass")
  1   0 SETUP_LOOP  12 (to 14)
  2 LOAD_CONST   3 (frozenset({1, 2}))
  4 GET_ITER
>>6 FOR_ITER 4 (to 12)
  8 STORE_NAME   0 (_)
 10 JUMP_ABSOLUTE6
>>   12 POP_BLOCK
>>   14 LOAD_CONST   2 (None)
 16 RETURN_VALUE
>>> a = {1, 2}
>>> dis.dis("for _ in a: pass")
  1   0 SETUP_LOOP  12 (to 14)
  2 LOAD_NAME0 (a)
  4 GET_ITER
>>6 FOR_ITER 4 (to 12)
  8 STORE_NAME   1 (_)
 10 JUMP_ABSOLUTE6
>>   12 POP_BLOCK
>>   14 LOAD_CONST   0 (None)
 16 RETURN_VALUE


On Tue, 11 Jan 2022 at 01:05, Chris Angelico  wrote:

> On Tue, Jan 11, 2022 at 10:26 AM Marco Sulla
>  wrote:
> >
> > On Wed, 5 Jan 2022 at 23:02, Chris Angelico  wrote:
> > >
> > > On Thu, Jan 6, 2022 at 8:01 AM Marco Sulla <
> marco.sulla.pyt...@gmail.com> wrote:
> > > >
> > > > On Wed, 5 Jan 2022 at 14:16, Chris Angelico 
> wrote:
> > > > > That's an entirely invisible optimization, but it's more than just
> > > > > "frozenset is faster than set". It's that a frozenset or tuple can
> be
> > > > > stored as a function's constants, which is a massive difference.
> > > >
> > > > Can you explain this?
> > >
> > > Play around with dis.dis and timeit.
> >
> > ? I don't understand. You're talking about function constants. What
> > are they? I can't dig deep into something if I can't know what it is.
> > Maybe are you talking about function default values for parameters?
>
> No, I'm talking about constants. Every function has them.
>
> > Of course. You can use a proxy and slow down almost everything much
> > more. Or you can simply create a version of the mutable object with
> > fewer methods, as more or less frozenset is. I checked the
> > implementation, no fast iteration is implemented. I do not understand
> > why in `for x in {1, 2, 3}` the set is substituted by a frozenset.
>
> Constants. Like I said, play around with dis.dis, and explore what's
> already happening. A set can't be a constant, a frozenset can be.
> Constants are way faster than building from scratch.
>
> Explore. Play around. I'm not going to try to explain everything in detail.
>
> If you're delving into the details of the C implementation of the
> dictionary, I would have expected you'd already be familiar with the
> way that functions behave.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-10 Thread Chris Angelico
On Tue, Jan 11, 2022 at 10:26 AM Marco Sulla
 wrote:
>
> On Wed, 5 Jan 2022 at 23:02, Chris Angelico  wrote:
> >
> > On Thu, Jan 6, 2022 at 8:01 AM Marco Sulla  
> > wrote:
> > >
> > > On Wed, 5 Jan 2022 at 14:16, Chris Angelico  wrote:
> > > > That's an entirely invisible optimization, but it's more than just
> > > > "frozenset is faster than set". It's that a frozenset or tuple can be
> > > > stored as a function's constants, which is a massive difference.
> > >
> > > Can you explain this?
> >
> > Play around with dis.dis and timeit.
>
> ? I don't understand. You're talking about function constants. What
> are they? I can't dig deep into something if I can't know what it is.
> Maybe are you talking about function default values for parameters?

No, I'm talking about constants. Every function has them.

> Of course. You can use a proxy and slow down almost everything much
> more. Or you can simply create a version of the mutable object with
> fewer methods, as more or less frozenset is. I checked the
> implementation, no fast iteration is implemented. I do not understand
> why in `for x in {1, 2, 3}` the set is substituted by a frozenset.

Constants. Like I said, play around with dis.dis, and explore what's
already happening. A set can't be a constant, a frozenset can be.
Constants are way faster than building from scratch.

Explore. Play around. I'm not going to try to explain everything in detail.

If you're delving into the details of the C implementation of the
dictionary, I would have expected you'd already be familiar with the
way that functions behave.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-10 Thread Marco Sulla
On Wed, 5 Jan 2022 at 23:02, Chris Angelico  wrote:
>
> On Thu, Jan 6, 2022 at 8:01 AM Marco Sulla  
> wrote:
> >
> > On Wed, 5 Jan 2022 at 14:16, Chris Angelico  wrote:
> > > That's an entirely invisible optimization, but it's more than just
> > > "frozenset is faster than set". It's that a frozenset or tuple can be
> > > stored as a function's constants, which is a massive difference.
> >
> > Can you explain this?
>
> Play around with dis.dis and timeit.

? I don't understand. You're talking about function constants. What
are they? I can't dig deep into something if I can't know what it is.
Maybe are you talking about function default values for parameters?

> > > Function positional arguments aren't interchangeable, so it makes
> > > sense to have them as a tuple.
> >
> > You are wrong, since kwarg is a dict. Indeed I proposed to use
> > frozendict for kwargs, and Guido said that it's a pity that this will
> > break a lot of existing Python code :D, since the fact that args is
> > _immutable_ and kwargs not always bothered him.
>
> Excuse me? I mentioned kwargs in the part that you removed from the
> quote, and the part you're quoting explicitly says "positional
> arguments".

Ok, I quote also the other part:

> (Function *keyword* arguments, on the other hand, are different; as
> long as the mapping from keys to values is maintained, you can remove
> some of them and pass the rest on, without fundamentally changing
> their meaning.)

First of all, I repeat, Guido said (more or less) that in a perfect
world, kwargs are immutable. Or maybe I did not understand what he
said, maybe he said that in a perfect world also args are mutable. But
I suppose it's more probable the first hypothesis :D

Secondly, you can also get the args from a function, transform it in a
list, change something and pass it unpacked to another function. You
will not change the meaning of the tuple, since, well, you copied it
in another mutable object. The original object is untouched.

I perfectly agree that, in the majority of cases, returning an
immutable vs a mutable are a matter of... sense? Meaning? Ok, I
perfectly agree. But IMHO there are many cases in which immutable
objects are used for a matter of speed, and I bet that args is one of
them.

> > Anyway, I'm starting to think that neither set nor frozenset are good
> > for dict items:
> >
> > (venv_3_10) marco@buzz:~$ python
> > Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
> > [GCC 10.1.1 20200718] on linux
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> a = {1: 2}
> > >>> b = {3: []}
> > >>> a | b
> > {1: 2, 3: []}
> > >>> a.items() | b.items()
> > Traceback (most recent call last):
> >   File "", line 1, in 
> > TypeError: unhashable type: 'list'
> > >>>
>
> Well yes. Only dict keys can be considered to be set-like.

This is not true. It's at least from Python 3.6, and I think also
before, that almost the full Set API was added to both keys and items
view.
Items indeed are a sort of set, in a mathematical sense, since any
pair (key, value) is unique, even if value is mutable.

> I don't
> know WHAT you think you're trying to do here, but if you ever thought
> of set operations on dict values, you may want to completely rethink
> what you're doing.

set ops on values? Never said that :) I said that currently you can
operate on item views with set operators. This is a fact.

I also said that, since py sets accept only hashable objects, maybe
another ad-hoc object should be used for the result of the items
operations.
But maybe the change isn't worth the additional trouble. Indeed I
didn't know about the new set methods and operations on dict views
until I explored dictobject.c

> Performance is not an automatic result of immutability. That simply
> isn't how it works.

Of course. You can use a proxy and slow down almost everything much
more. Or you can simply create a version of the mutable object with
fewer methods, as more or less frozenset is. I checked the
implementation, no fast iteration is implemented. I do not understand
why in `for x in {1, 2, 3}` the set is substituted by a frozenset.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-05 Thread Chris Angelico
On Thu, Jan 6, 2022 at 8:01 AM Marco Sulla  wrote:
>
> On Wed, 5 Jan 2022 at 14:16, Chris Angelico  wrote:
> > That's an entirely invisible optimization, but it's more than just
> > "frozenset is faster than set". It's that a frozenset or tuple can be
> > stored as a function's constants, which is a massive difference.
>
> Can you explain this?

Play around with dis.dis and timeit.

> > Function positional arguments aren't interchangeable, so it makes
> > sense to have them as a tuple.
>
> You are wrong, since kwarg is a dict. Indeed I proposed to use
> frozendict for kwargs, and Guido said that it's a pity that this will
> break a lot of existing Python code :D, since the fact that args is
> _immutable_ and kwargs not always bothered him.

Excuse me? I mentioned kwargs in the part that you removed from the
quote, and the part you're quoting explicitly says "positional
arguments".

> Anyway, I'm starting to think that neither set nor frozenset are good
> for dict items:
>
> (venv_3_10) marco@buzz:~$ python
> Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
> [GCC 10.1.1 20200718] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> a = {1: 2}
> >>> b = {3: []}
> >>> a | b
> {1: 2, 3: []}
> >>> a.items() | b.items()
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: unhashable type: 'list'
> >>>

Well yes. Only dict keys can be considered to be set-like. I don't
know WHAT you think you're trying to do here, but if you ever thought
of set operations on dict values, you may want to completely rethink
what you're doing.

Performance is not an automatic result of immutability. That simply
isn't how it works.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-05 Thread Marco Sulla
On Wed, 5 Jan 2022 at 14:16, Chris Angelico  wrote:
> That's an entirely invisible optimization, but it's more than just
> "frozenset is faster than set". It's that a frozenset or tuple can be
> stored as a function's constants, which is a massive difference.

Can you explain this?

> In fact, the two data types are virtually identical in performance once 
> created [...]

This is really strange, since in theory frozenset should not have to
check if itself is mutated during the iteration, on each cycle. So the
speed should be noticeable faster. Maybe frozenset was not optimised,
because the use case is really little and will add potentially useless
C code? Furthermore, more the code, more the memory consumption and
less the speed. I have to check setobject.c.

> Function positional arguments aren't interchangeable, so it makes
> sense to have them as a tuple.

You are wrong, since kwarg is a dict. Indeed I proposed to use
frozendict for kwargs, and Guido said that it's a pity that this will
break a lot of existing Python code :D, since the fact that args is
_immutable_ and kwargs not always bothered him.

Anyway, I'm starting to think that neither set nor frozenset are good
for dict items:

(venv_3_10) marco@buzz:~$ python
Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
[GCC 10.1.1 20200718] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {1: 2}
>>> b = {3: []}
>>> a | b
{1: 2, 3: []}
>>> a.items() | b.items()
Traceback (most recent call last):
  File "", line 1, in 
TypeError: unhashable type: 'list'
>>>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-05 Thread Antoon Pardon




Op 4/01/2022 om 19:27 schreef Marco Sulla:

$ python
Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
[GCC 10.1.1 20200718] on linux
Type "help", "copyright", "credits" or "license" for more information.

a = {1:2}
c = {1:2, 3:4}
c.keys() - a.keys()

{3}

Why not frozenset({3})?


My 2 cents worths: Because dictviews mutate with the directory. That makes
them more like set than like frozenset. So operations on them produce a set.

--
Antoon Pardon.
--
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-05 Thread Chris Angelico
On Thu, Jan 6, 2022 at 12:05 AM Marco Sulla
 wrote:
>
> On Wed, 5 Jan 2022 at 00:54, Chris Angelico  wrote:
> > That's because a tuple is the correct data type when returning two
> > distinct items. It's not a list that has two elements in it; it's a
> > tuple of (key, value). Immutability is irrelevant.
>
> Immutability is irrelevant, speed no. A tuple is faster than a list
> and more compact. Also frozenset is faster than set. Indeed CPython
> optimises internally a
>
> for x in {1, 2, 3}
>
> transforming the set in a frozenset for a matter of speed. That's why
> tuple is usually preferred. I expected the same for frozenset

That's an entirely invisible optimization, but it's more than just
"frozenset is faster than set". It's that a frozenset or tuple can be
stored as a function's constants, which is a massive difference.

In fact, the two data types are virtually identical in performance once created:

rosuav@sikorsky:~$ python3 -m timeit -s "stuff = {1,2,3}" "for x in stuff: pass"
500 loops, best of 5: 46.2 nsec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "stuff = frozenset({1,2,3})"
"for x in stuff: pass"
500 loops, best of 5: 46.7 nsec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "stuff = set(range(1))"
"for x in stuff: pass"
5000 loops, best of 5: 82.1 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "stuff =
frozenset(range(1))" "for x in stuff: pass"
5000 loops, best of 5: 81.3 usec per loop

Mutability is irrelevant, and so is the speed of the data type. Having
set operations on keys views return frozensets wouldn't improve
anything.

> > Got any examples of variable-length sequences?
>
> function positional args are tuples, for example.
>
> > Usually a tuple is a
> > structure, not just a sequence.
>
> eh? Are you talking about the underlying C code?

No, I'm talking about purpose. In general, a list contains a sequence
of things whose order matters but which can be removed from - you can
remove one item from your shopping list and the rest still mean what
they were. A tuple has a specific set of things in a specific order,
like Cartesian coordinates. You can't just remove the x coordinate
from an (x,y,z) tuple without fundamentally changing what it is.

Function positional arguments aren't interchangeable, so it makes
sense to have them as a tuple. Removing the first argument would
redefine what all the others mean, so a tuple is correct - it's not
just a list that's been made immutable for performance's sake.
(Function *keyword* arguments, on the other hand, are different; as
long as the mapping from keys to values is maintained, you can remove
some of them and pass the rest on, without fundamentally changing
their meaning.)

Do you have any examples of actually variable-length sequences that
are tuples for speed?

Measure before claiming a speed difference.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-05 Thread Marco Sulla
On Wed, 5 Jan 2022 at 00:54, Chris Angelico  wrote:
> That's because a tuple is the correct data type when returning two
> distinct items. It's not a list that has two elements in it; it's a
> tuple of (key, value). Immutability is irrelevant.

Immutability is irrelevant, speed no. A tuple is faster than a list
and more compact. Also frozenset is faster than set. Indeed CPython
optimises internally a

for x in {1, 2, 3}

transforming the set in a frozenset for a matter of speed. That's why
tuple is usually preferred. I expected the same for frozenset

> Got any examples of variable-length sequences?

function positional args are tuples, for example.

> Usually a tuple is a
> structure, not just a sequence.

eh? Are you talking about the underlying C code?

> If something is just returning a
> sequence, it'll most often return a dedicated sequence type (like
> range in Py3) or a list (like lots of things in Py2).

Python 2 is now obsolete, I don't think is relevant for the discussion.

About your sentence, yes, usually a dedicated view, sequence or
generator is returned, but tuples too are really much used. A list is
returned very sporadically, for what I remember.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-04 Thread 2QdxY4RzWzUUiLuE
On 2022-01-05 at 08:30:30 +1100,
Cameron Simpson  wrote:

> On 04Jan2022 21:03, Marco Sulla  wrote:
> >On Tue, 4 Jan 2022 at 19:38, Chris Angelico  wrote:
> >> [...] should the keys view be considered
> >> frozen or not? Remember the set of keys can change (when the
> >> underlying dict changes).
> >
> >Well, also the items can change, but they are returned as tuples with
> >2 elements.
> >
> >It seems to me that the stdlib, when something should return a
> >sequence, prefers to return a tuple. So I expected the same preference
> >for frozenset over set.
> >
> >> It's not difficult to construct a frozenset from a set.
> >
> >This sentence has the commutative property :)
> 
> Indeed.
> 
> But speaking for myself, I may well want to perform additional work on 
> the object returned. Making a copy of it for tht purpose seems very 
> wasteful (imagine the set is quite large). A modifiable version can be 
> used immediately with no time or space cost. And it can be left alone if 
> it is to be unchanged.  If I got a frozenset back I would inherently 
> have to copy it to do "modifying work".

Unless the additional work is to use it as a dictionary key, or to add
it to an existing [necessarily mutable!] set.  Then again, that's not
work *on* the object returned, that's additional work *with* the object
returned.

We could go around and around on this all day.  :-)

Python began with the premise of mutability.  IIRC, many attempts at
immutable class instances (for use in sets or as dictionary keys) have
run into the same issues Marco Sulla is having.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-04 Thread Chris Angelico
On Wed, Jan 5, 2022 at 7:04 AM Marco Sulla  wrote:
>
> On Tue, 4 Jan 2022 at 19:38, Chris Angelico  wrote:
> > [...] should the keys view be considered
> > frozen or not? Remember the set of keys can change (when the
> > underlying dict changes).
>
> Well, also the items can change, but they are returned as tuples with
> 2 elements.

That's because a tuple is the correct data type when returning two
distinct items. It's not a list that has two elements in it; it's a
tuple of (key, value). Immutability is irrelevant.

> It seems to me that the stdlib, when something should return a
> sequence, prefers to return a tuple. So I expected the same preference
> for frozenset over set.

Got any examples of variable-length sequences? Usually a tuple is a
structure, not just a sequence. If something is just returning a
sequence, it'll most often return a dedicated sequence type (like
range in Py3) or a list (like lots of things in Py2).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-04 Thread Cameron Simpson
On 04Jan2022 21:03, Marco Sulla  wrote:
>On Tue, 4 Jan 2022 at 19:38, Chris Angelico  wrote:
>> [...] should the keys view be considered
>> frozen or not? Remember the set of keys can change (when the
>> underlying dict changes).
>
>Well, also the items can change, but they are returned as tuples with
>2 elements.
>
>It seems to me that the stdlib, when something should return a
>sequence, prefers to return a tuple. So I expected the same preference
>for frozenset over set.
>
>> It's not difficult to construct a frozenset from a set.
>
>This sentence has the commutative property :)

Indeed.

But speaking for myself, I may well want to perform additional work on 
the object returned. Making a copy of it for tht purpose seems very 
wasteful (imagine the set is quite large). A modifiable version can be 
used immediately with no time or space cost. And it can be left alone if 
it is to be unchanged.  If I got a frozenset back I would inherently 
have to copy it to do "modifying work".

So I prefer getting a modifiable object back.

Cheers,
Cameron Simpson 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-04 Thread Marco Sulla
On Tue, 4 Jan 2022 at 19:38, Chris Angelico  wrote:
> [...] should the keys view be considered
> frozen or not? Remember the set of keys can change (when the
> underlying dict changes).

Well, also the items can change, but they are returned as tuples with
2 elements.

It seems to me that the stdlib, when something should return a
sequence, prefers to return a tuple. So I expected the same preference
for frozenset over set.

> It's not difficult to construct a frozenset from a set.

This sentence has the commutative property :)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why operations between dict views return a set and not a frozenset?

2022-01-04 Thread Chris Angelico
On Wed, Jan 5, 2022 at 5:29 AM Marco Sulla  wrote:
>
> $ python
> Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
> [GCC 10.1.1 20200718] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> a = {1:2}
> >>> c = {1:2, 3:4}
> >>> c.keys() - a.keys()
> {3}
> >>>
>

Let's start with this.

>>> a = {1}
>>> c = {1, 3}
>>> c - a
{3}

Do you agree that this should be a set?

If so, then the next question is: should the keys view be considered
frozen or not? Remember the set of keys can change (when the
underlying dict changes).

It's not difficult to construct a frozenset from a set.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Why operations between dict views return a set and not a frozenset?

2022-01-04 Thread Marco Sulla
$ python
Python 3.10.0 (heads/3.10-dirty:f6e8b80d20, Nov 18 2021, 19:16:18)
[GCC 10.1.1 20200718] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {1:2}
>>> c = {1:2, 3:4}
>>> c.keys() - a.keys()
{3}
>>>


Why not frozenset({3})?
-- 
https://mail.python.org/mailman/listinfo/python-list


Return and set

2011-07-19 Thread Billy Mays
I have a method getToken() which checks to see if a value is set, and if 
so, return it.  However, it doesn't feel pythonic to me:


def getToken(self):
if self.tok:
t = self.tok
self.tok = None
return t
# ...


Is there a way to trim the 'if' block to reset self.tok upon return?

--
Bill
--
http://mail.python.org/mailman/listinfo/python-list


Re: Return and set

2011-07-19 Thread Paul Woolcock
I don't know if this is better (or more pythonic), but it works for me on
python 2.7.

 class MyKlass(object):
... def __init__(self, tok):
... self.tok = tok
... def gettoken(self):
... t, self.tok = self.tok or None, None
... return t

On Tue, Jul 19, 2011 at 9:23 AM, Billy Mays 
81282ed9a88799d21e77957df2d84bd6514d9...@myhashismyemail.com wrote:

 I have a method getToken() which checks to see if a value is set, and if
 so, return it.  However, it doesn't feel pythonic to me:

 def getToken(self):
if self.tok:
t = self.tok
self.tok = None
return t
# ...


 Is there a way to trim the 'if' block to reset self.tok upon return?

 --
 Bill
 --
 http://mail.python.org/**mailman/listinfo/python-listhttp://mail.python.org/mailman/listinfo/python-list




-- 

Paul Woolcock
pwool...@gmail.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Return and set

2011-07-19 Thread Ben Finney
Billy Mays
81282ed9a88799d21e77957df2d84bd6514d9...@myhashismyemail.com writes:

 I have a method getToken() which checks to see if a value is set, and
 if so, return it. However, it doesn't feel pythonic to me:

Clearly that's because the function name is not Pythonic :-)

I'll assume the name is a PEP-8 compatible ‘get_token’.

 def getToken(self):
 if self.tok:
 t = self.tok
 self.tok = None
 return t
 # ...

Are you testing ‘self.tok’ in a boolean context because you don't care
whether it it might be ‘’ or ‘0’ or ‘0.0’ or ‘[]’ or ‘False’ or lots
of other things that evaluate false in a boolean context?

If you want to test whether it is any value other than ‘None’, that's
not the way to do it. Instead, use ‘if self.token is not None’.

But I don't see why you test it at all, in that case, since you're
immediately setting it to ‘None’ afterward.

Also, the function name is quite misleading; the implication for a
function named ‘get_foo’ is that it is a non-destructive read. I would
expect the name of this function to indicate what's going on much more
explicitly.


My suggestion::

def get_and_reset_token(self):
result = self.token
self.token = None
return result

-- 
 \  “I stayed up all night playing poker with tarot cards. I got a |
  `\  full house and four people died.” —Steven Wright |
_o__)  |
Ben Finney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Return and set

2011-07-19 Thread Billy Mays

On 07/19/2011 09:43 AM, Ben Finney wrote:

Billy Mays
81282ed9a88799d21e77957df2d84bd6514d9...@myhashismyemail.com  writes:


I have a method getToken() which checks to see if a value is set, and
if so, return it. However, it doesn't feel pythonic to me:


Clearly that's because the function name is not Pythonic :-)

I'll assume the name is a PEP-8 compatible ‘get_token’.


def getToken(self):
 if self.tok:
 t = self.tok
 self.tok = None
 return t
 # ...


Are you testing ‘self.tok’ in a boolean context because you don't care
whether it it might be ‘’ or ‘0’ or ‘0.0’ or ‘[]’ or ‘False’ or lots
of other things that evaluate false in a boolean context?

If you want to test whether it is any value other than ‘None’, that's
not the way to do it. Instead, use ‘if self.token is not None’.

But I don't see why you test it at all, in that case, since you're
immediately setting it to ‘None’ afterward.

Also, the function name is quite misleading; the implication for a
function named ‘get_foo’ is that it is a non-destructive read. I would
expect the name of this function to indicate what's going on much more
explicitly.


My suggestion::

 def get_and_reset_token(self):
 result = self.token
 self.token = None
 return result



This function is used in a file parser.  There are two methods, 
getToken() and peekToken().  getToken pops a token from the file, while 
peekToken keeps the token, but still returns it.


Code:

def getToken(self):
if self.tok:
t = self.tok
self.tok = None
return t
try:
t = self.gen.next()
except StopIteration:
return NULL
else:
return t

def peekToken(self):
if not self.tok:
self.tok = self.getToken()
return self.tok

NULL is an enumerated value I have defined above.  The idea is for 
peekToken to reuse getToken, but to keep the token still around.





--
http://mail.python.org/mailman/listinfo/python-list


Re: Return and set

2011-07-19 Thread Terry Reedy

On 7/19/2011 9:52 AM, Billy Mays wrote:

On 07/19/2011 09:43 AM, Ben Finney wrote:

Billy Mays
81282ed9a88799d21e77957df2d84bd6514d9...@myhashismyemail.com writes:


I have a method getToken() which checks to see if a value is set, and
if so, return it. However, it doesn't feel pythonic to me:


Clearly that's because the function name is not Pythonic :-)

I'll assume the name is a PEP-8 compatible ‘get_token’.


def getToken(self):
if self.tok:
t = self.tok
self.tok = None
return t
# ...


Are you testing ‘self.tok’ in a boolean context because you don't care
whether it it might be ‘’ or ‘0’ or ‘0.0’ or ‘[]’ or ‘False’ or lots
of other things that evaluate false in a boolean context?

If you want to test whether it is any value other than ‘None’, that's
not the way to do it. Instead, use ‘if self.token is not None’.

But I don't see why you test it at all, in that case, since you're
immediately setting it to ‘None’ afterward.

Also, the function name is quite misleading; the implication for a
function named ‘get_foo’ is that it is a non-destructive read. I would
expect the name of this function to indicate what's going on much more
explicitly.


My suggestion::

def get_and_reset_token(self):
result = self.token
self.token = None
return result



This function is used in a file parser. There are two methods,
getToken() and peekToken(). getToken pops a token from the file, while
peekToken keeps the token, but still returns it.

Code:

def getToken(self):
if self.tok:
t = self.tok
self.tok = None
return t
try:
t = self.gen.next()
except StopIteration:
return NULL
else:
return t

def peekToken(self):
if not self.tok:
self.tok = self.getToken()
return self.tok


You did not answer Ben's question about the allowed values of self.tok 
and whether you really want to clobber all 'false' values. The proper 
code depends on that answer.



NULL is an enumerated value I have defined above. The idea is for
peekToken to reuse getToken, but to keep the token still around.


I think about reversing and have getToken use peekToken and then reset. 
But that depends on the exact logic which depends on the specs. I would 
more likely have just one function with a reset parameter defaulted to 
the more common value.


--
Terry Jan Reedy


--
http://mail.python.org/mailman/listinfo/python-list


Re: Return and set

2011-07-19 Thread Micah
That sounds artificially backwards; why not let getToken() reuse peekToken()?

def peek(self):
if self.tok is None:
try:
self.tok = self.gen.next()
except StopIteration:
self.tok = NULL
return self.tok

def pop(self):
token = self.peek()
self.tok = None
return token
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Return and set

2011-07-19 Thread Billy Mays

On 07/19/2011 01:00 PM, Micah wrote:

That sounds artificially backwards; why not let getToken() reuse peekToken()?

def peek(self):
 if self.tok is None:
 try:
 self.tok = self.gen.next()
 except StopIteration:
 self.tok = NULL
 return self.tok

def pop(self):
 token = self.peek()
 self.tok = None
 return token


I actually like this way better, thanks!

--
Bill
--
http://mail.python.org/mailman/listinfo/python-list


Re: Return and set

2011-07-19 Thread Billy Mays

On 07/19/2011 01:02 PM, Terry Reedy wrote:


You did not answer Ben's question about the allowed values of self.tok
and whether you really want to clobber all 'false' values. The proper
code depends on that answer.


NULL is an enumerated value I have defined above. The idea is for
peekToken to reuse getToken, but to keep the token still around.


I think about reversing and have getToken use peekToken and then reset.
But that depends on the exact logic which depends on the specs. I would
more likely have just one function with a reset parameter defaulted to
the more common value.



self.gen is a generator that gets filters single characters from a file. 
 Values that come from self.gen.next() will always be string values 
since the file generator closes on EOF.  I can be sure that I will 
either get a string from self.gen.next() or catch an exception so its 
okay to use NULL (which evaluates to 0).


You are correct it makes more sense to use peekToken() to do the lifting 
and getToken() to reuse the token.  However, I am not sure what you mean 
by reset.


--
Bill
--
http://mail.python.org/mailman/listinfo/python-list


Re: Return and set

2011-07-19 Thread Terry Reedy

On 7/19/2011 1:00 PM, Micah wrote:

That sounds artificially backwards; why not let getToken() reuse peekToken()?

def peek(self):
 if self.tok is None:
 try:
 self.tok = self.gen.next()


If this is changed (as intended for the iteration protocol) to

 self.tok = next(self.gen)

then this code (and OP's version) should run as is on both Py2.6/7 and Py3.


 except StopIteration:
 self.tok = NULL
 return self.tok

def pop(self):
 token = self.peek()
 self.tok = None
 return token



--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-04 Thread Antoon Pardon
On 2006-07-01, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 There's a few good reasons.
 1 - golden handcuffs.  Breaking old code is bad 90% of the time
 2 - creating a set MAY be slower.

 Python's sets seem to imply to that they will always be a hash map.  in
 this case, some creative hash map mapping could allow one to create a
 set without calculating hash codes (make the set hashmap have the same
 dimentions and rules as the dictionary one).
 If there was intent to allow Python implementations to use trees for
 the set, then a list is far faster to create (O(n) time instead of
 O(nlogn)).

 3 - using a set is sometimes slower (just as using a list is sometimes
 slower)
 I can't speak for your code, but this is the most common use of keys in
 my coding:

 # d is some dictionary
 keys = d.keys()
 keys.sort()
 for k in keys:
   #blah

Wouldn't you be better of with a tree instead of dictionary? Maybe
there are other reasons to prefer a dict, but have a look at:

  http://www.pardon-sleeuwaegen.be/antoon/avltree.html

Suppose t is a tree implementing the same mapping a your dictionary
d above, the code would be:

  # t is some tree
  for k in t:
#blah

And the keys will be treated in order.


If you try it, let me know what you think of it.

-- 
Antoon Pardon
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python3000 in mind]

2006-07-03 Thread [EMAIL PROTECTED]
Yes, this is what he's saying.  Its not broken, just a bit different.
 After all, you dont have a problem with:
lst = [1, 2, 3]
ptr = lst
lst.append(4) # this changes ptr

And a view of the dictionary is orders faster than creating a copy of
it (which is required to keep k0 from changing in your example).  If
you're LUCKY, copying a dictionary is O(n), and i would expect it to
take longer because hashmaps usually have a lot of empty space in them
(anyone with python specific experiance know if its diffent for python
hashmaps?), and each empty space takes time to iterate across.  So a
dictionary is over O(n), while a view takes O(1) time.  considering how
often this operation is used, it makes sense to do this optimization

And if you really want an independant list, you can always say
k0 = list(d.keys())

Paul Rubin wrote:

 Wait a sec, you're saying

   k0 = d.keys()
   d['whee'] = 'parrot'# this can change k0???
 
 That sounds broken.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python3000 in mind]

2006-07-03 Thread Paul Rubin
[EMAIL PROTECTED] [EMAIL PROTECTED] writes:
 And a view of the dictionary is orders faster than creating a copy of
 it (which is required to keep k0 from changing in your example).  If
 you're LUCKY, copying a dictionary is O(n), 

There are ways to do it so you don't have to copy, but the dict would
then no longer be a straightforward hash table.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python3000 in mind]

2006-07-03 Thread [EMAIL PROTECTED]
Ooh, can you point me to them?  This sounds very interesting.

The only way I can think of doing it is to have some fun with pointers
and not make the final copy until a change is made to the table.  I'd
love to read about an algoritm which can get around this!  I feel so
behind in the times,  I still prefer the simpliciy of the redblacktree
sometimes =p
Paul Rubin wrote:
 There are ways to do it so you don't have to copy, but the dict would
 then no longer be a straightforward hash table.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [withPython3000 in mind]

2006-07-03 Thread [EMAIL PROTECTED]
You bring up a good point.  the for x in d: notation is a relativly
new construction, for x in d.keys() is much older.  Some of the value
of d.keys() goes away because we have this new construction, but
there's some reasons to keep it around:
1. Consitency.  You can get the values, you can get the (key, value)
pairs.  it'd be odd not to be able to get the keys
2. Choices.  if d.keys() is a FAST operation, then you can then use
that to create any structure you want.  For example, if you want a set,
  set(d.keys()) only requires you to create a set.  If d.keys() created
an independant list, python would first need to create a list, then
create a set.
Paul Rubin wrote:
 Delaney, Timothy (Tim) [EMAIL PROTECTED] writes:
  If you want an independent data set, you have to take a snapshot. For
  the above, that's doing:
 
  k0 = list(d.keys())

 I don't understand.  Why have .keys() at all, if it doesn't get you
 an independent data set?  If all you want is to iterate through the
 dict, you can already do that:
 
   for k in d: 

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-03 Thread Piet van Oostrum
 Paddy [EMAIL PROTECTED] (P) wrote:

P [EMAIL PROTECTED] wrote:
 This has been bothering me for a while. Just want to find out if it
 just me or perhaps others have thought of this too: Why shouldn't the
 keyset of a dictionary be represented as a set instead of a list?

P I think the order of the items returned by keys() and values() are
P related. I decided on a short  empirical test:

yes, it is documented that their order is related. In fact 
d.items() == zip(d.keys(), d.values())
This wouldn't work with sets instead of lists.
-- 
Piet van Oostrum [EMAIL PROTECTED]
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Dictionary .keys() and .values() should return a set[withPython3000 in mind]

2006-07-03 Thread Delaney, Timothy (Tim)
Paul Rubin wrote:

 Delaney, Timothy (Tim) [EMAIL PROTECTED] writes:
 If you want an independent data set, you have to take a snapshot.
 For the above, that's doing: 
 
 k0 = list(d.keys())
 
 I don't understand.  Why have .keys() at all, if it doesn't get you
 an independent data set?  If all you want is to iterate through the
 dict, you can already do that:
 
   for k in d: 

Several reasons.

d.keys()
d.values()
d.items()

You can only have iter(d) equivalent to iter(*one* of the above).

Also, d.keys() can provide an indexable view i.e. you can index it like
the list currently returned from d.keys(). For most use cases, you can
treat it exactly like you can treat the list returned now - just don't
try adding to it or (probably) removing from it.

Tim Delaney
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Dictionary .keys() and .values() should return a set [withPython3000 in mind]

2006-07-03 Thread Delaney, Timothy (Tim)
Paul Rubin wrote:

 [EMAIL PROTECTED] [EMAIL PROTECTED] writes:
 And a view of the dictionary is orders faster than creating a copy
 of it (which is required to keep k0 from changing in your example). 
 If you're LUCKY, copying a dictionary is O(n),
 
 There are ways to do it so you don't have to copy, but the dict would
 then no longer be a straightforward hash table.

Plus copy-on-write isn't something you want in general - it's *much*
slower since it needs to copy the internal data every time it's
modified. The vast majority of uses of data structures does not involve
concurrent modification.

Tim Delaney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [withPython3000 in mind]

2006-07-03 Thread Scott David Daniels
Delaney, Timothy (Tim) wrote:
 Plus copy-on-write isn't something you want in general - it's *much*
 slower since it needs to copy the internal data every time it's
 modified. The vast majority of uses of data structures does not involve
 concurrent modification.

But copy-on-write in the presence of reference counts doesn't do the
mad useless-copy stuff.

--Scott David Daniels
[EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Dictionary .keys() and .values() should return a set[withPython3000 in mind]

2006-07-03 Thread Delaney, Timothy (Tim)
Scott David Daniels wrote:

 Delaney, Timothy (Tim) wrote:
 Plus copy-on-write isn't something you want in general - it's *much*
 slower since it needs to copy the internal data every time it's
 modified. The vast majority of uses of data structures does not
 involve concurrent modification.
 
 But copy-on-write in the presence of reference counts doesn't do the
 mad useless-copy stuff.

Good point - if there's only one reference, there's no need for copying
- this is definitely something that could be taken advantage of.

Tim Delaney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-02 Thread Paul McGuire
[EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 This has been bothering me for a while. Just want to find out if it
 just me or perhaps others have thought of this too: Why shouldn't the
 keyset of a dictionary be represented as a set instead of a list?

I think this is an interesting suggestion.  Of course, the current situation
is as much a product of historical progression as anything: lists and dicts
pre-date sets, so the collection of keys had to be returned as a list.
Since lists also carry some concept of order to them, the documentation for
the list returned by dict.keys() went out of its way to say that the order
of the elements in the dict.keys() list had no bearing on the dict, the
insertion order of entries, or anything else, that the order of keys was
purely arbitrary.

In fact, there is not a little irony to this proposal, since it seems it was
just a few months ago that c.l.py had just about weekly requests for how to
create an ordered dict, with various ideas of how a dict should be
ordered, but most intended to preserve the order of insertion of items into
the dict.  And now here we have just about the opposite proposal - dicts
should not only *not* be ordered, they should revel in their disorderliness.

I liked the example in which the OP (of this thread) wanted to compare the
keys from two different dicts, for equality of keys.  Since the keys()
method returns a set of indeterminate order, we can't simply perform
dictA.keys() == dictB.keys().  But how often does this really happen?  In
practice, I think the keys of a dict, when this collection is used at all,
are usually sorted and then iterated over, usually to prettyprint the keys
and values in the dict.  Internally, this set of items shouldn't even exist
as a separate data structure - the dict's keys are merely labels on nodes in
some sort of hash tree.

Now that we really do have sets in Python, if one were to design dicts all
over again, it does seem like set would be a better choice than list for the
type returned by the keys() method.  To preserve compatibility, would a
second method, keyset(), do the trick?  The OP used this term himself,
referring to the keyset of the dictionary.  Still given the few cases
where I would access this as a true set, using set(dictA.keys()) doesn't
seem to be that big a hardship, and its obviousness will probably undercut
any push for a separate keyset() method.

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-02 Thread Paddy

[EMAIL PROTECTED] wrote:
 This has been bothering me for a while. Just want to find out if it
 just me or perhaps others have thought of this too: Why shouldn't the
 keyset of a dictionary be represented as a set instead of a list?

I think the order of the items returned by keys() and values() are
related. I decided on a short  empirical test:

 import random
 n=50
 d = dict((i,random.randint(0,n-1)) for i in range(n))
 k,v = d.keys(), d.values()
 [d[k[i]] == v[i]  for i in range(n)]
[True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True]
 ## for larger n try
 # False in [d[k[i]] == v[i]  for i in range(n)]

The order of keys() and the order of values() are related, so a change
to returning sets would also loose this functionality.

Mind you, Never rely on that implied ordering. Always use items().

And so *if* sets were returned for keys() and values() then it should
for items() too.

- Paddy.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-02 Thread bearophileHUGS
Paddy:
 Mind you, Never rely on that implied ordering. Always use items().

Using dict.items() is probably better, but the manual says:

If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are 
called with no intervening modifications to the dictionary, the lists will 
directly correspond. This allows the creation of (value, key) pairs using 
zip(): pairs = zip(a.values(), a.keys()). The same relationship holds for 
the iterkeys() and itervalues() methods:

Is this going to change?


dict.keyset() seems nice, but you usually don't want to make a too much
big API.
Keeping APIs small is very important in Python, otherwise you need the
manual to write code.
I think a better solution to solve such key set problems is to optimize
Python itself, so Python computes set(dict) really fast (it can just
copies the hash of the dict).

Bye,
bearophile

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-02 Thread Simon Forman
Nick Vatamaniuc wrote:
 Robert Kern wrote:
  [EMAIL PROTECTED] wrote:
   The same thing goes for the values(). Here most people will argue that
...
 
  This part is pretty much a non-starter. Not all Python objects are hashable.
...
  Also, I may need keys to map to different objects that happen to be equal.
 
  --
  Robert Kern
 


So, values() can't return a set because of (at least) the two reasons
given above.  And since, as Scott David Daniels pointed out, dicts
support the iterator protocol, you can ask for a set of the keys easily
enough if you want it:

 d = dict(a=1, b=2, c=3)
 set(d)
set(['a', 'c', 'b'])

So, IMHO, there's not much point to having keys() return a set.


Peace,
~Simon

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python3000 in mind]

2006-07-02 Thread Terry Reedy
The meaning of dict.keys, etc, will not change for the 2.x series.  For 
3.0, I believe that Guido already intends that .keys() no longer return a 
separate list.  For one thing, a major, if not the main use, of the method 
is for iteration, as in 'for keys in d.keys():'.  For this, creating and 
them discarding a separate list is inefficient and, in a sense, silly.

One option is to make .keys be what .iterkeys is today.  Another, more 
recent, is to make .keys return a new iterable dict view object, details 
not yet worked out.  Either way, one would make an independent set or list 
with set(d.keys()) or list(d.keys)).

Terry Jan Reedy



-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Dictionary .keys() and .values() should return a set [with Python3000 in mind]

2006-07-02 Thread Delaney, Timothy (Tim)
[EMAIL PROTECTED] wrote:

 This has been bothering me for a while. Just want to find out if it
 just me or perhaps others have thought of this too: Why shouldn't the
 keyset of a dictionary be represented as a set instead of a list?

There has been much discussion of this on the Python-3000 mailing list.
Suggestings included making keys(), etc return an iterator, and getting
rid of iterkeys(), etc.

The eventual consensus was that keys(), etc should return views on the
dictionary. These views would be re-iterable and will have basically the
same behaviour as the lists returned from keys(), etc. However, such a
view could have O(1) __contains__ behaviour, and would not incur the
overhead of creating a list and populating it - they would be backed by
the dictionary. Of course, this introduces the issue of concurrent
modification, but you can always get an independent copy by calling
list(dict.keys()).

Tim Delaney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python3000 in mind]

2006-07-02 Thread Paul Rubin
Delaney, Timothy (Tim) [EMAIL PROTECTED] writes:
 The eventual consensus was that keys(), etc should return views on the
 dictionary. These views would be re-iterable and will have basically the
 same behaviour as the lists returned from keys(), etc. However, such a
 view could have O(1) __contains__ behaviour, and would not incur the
 overhead of creating a list and populating it - they would be backed by
 the dictionary. Of course, this introduces the issue of concurrent
 modification, but you can always get an independent copy by calling
 list(dict.keys()).

Wait a sec, you're saying

  k0 = d.keys()
  d['whee'] = 'parrot'# this can change k0???

That sounds broken.
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Dictionary .keys() and .values() should return a set [withPython3000 in mind]

2006-07-02 Thread Delaney, Timothy (Tim)
Paul Rubin wrote:

 Delaney, Timothy (Tim) [EMAIL PROTECTED] writes:
 The eventual consensus was that keys(), etc should return views on
 the dictionary. These views would be re-iterable and will have
 basically the same behaviour as the lists returned from keys(), etc.
 However, such a view could have O(1) __contains__ behaviour, and
 would not incur the overhead of creating a list and populating it -
 they would be backed by the dictionary. Of course, this introduces
 the issue of concurrent modification, but you can always get an
 independent copy by calling list(dict.keys()).
 
 Wait a sec, you're saying
 
   k0 = d.keys()
   d['whee'] = 'parrot'# this can change k0???
 
 That sounds broken.

That's the problem with views. You either take a snapshot of the current
state, or you deal with the backing store changing. Java's solution is
to raise a ConcurrentModificationException when it determines this has
happened (with lots of weasel words about no guarantees).

If you want an independent data set, you have to take a snapshot. For
the above, that's doing:

k0 = list(d.keys())

or

k0 = set(d.keys())

depending on what behaviour you want.

Tim Delaney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [withPython3000 in mind]

2006-07-02 Thread Paul Rubin
Delaney, Timothy (Tim) [EMAIL PROTECTED] writes:
 If you want an independent data set, you have to take a snapshot. For
 the above, that's doing:
 
 k0 = list(d.keys())

I don't understand.  Why have .keys() at all, if it doesn't get you
an independent data set?  If all you want is to iterate through the
dict, you can already do that:

  for k in d: 
-- 
http://mail.python.org/mailman/listinfo/python-list


Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-01 Thread [EMAIL PROTECTED]
This has been bothering me for a while. Just want to find out if it
just me or perhaps others have thought of this too: Why shouldn't the
keyset of a dictionary be represented as a set instead of a list? I
know that sets were introduced a lot later and lists/dictionaries were
used instead but I think the only correct way now is for the
dictionary keys and values to be sets. Presently {1:0,2:0,3:0}.keys()
will produce [1,2,3] but it could also produce [3,1,2] or [3,2,1] on a
different machine architecture. The Python documentation states that:

Keys and values are listed in an _arbitrary_(my emphasis) order which
is non-random, varies across Python implementations, and depends on the
dictionary's history of insertions and deletions.


So on the same machine it will be the case that:  {1:0, 2:0,
3:0}.keys() == {3:0, 2:0, 1:0}.keys() is True. But if there are 2 lists
of keys of the same dictionary, one is pickled on another machine, with
a different arbitrary non-random ordering, then the keys will not be
equal to each other. It seems like the problem could be solved by
returning a set instead.

The same thing goes for the values(). Here most people will argue that
values are not necessarily unique, so a list is more appropriate, but
in fact the values are unique it is just that more than one key could
map to the same value.  What is 'seen' in dictionary is the mapping
rule.  Also the values are not ordered and should not be indexable --
they should be a set. Just as  the keys(), one ordered list of values
from a dictionary on one machine will not necessarily be equal to
another list of values an on another machine, while in fact, they
should be.

On a more fundamental and general level, a dictionary is actually an
explicit function, also called a 'map'. A set of keys, traditionally
called a 'domain' are mapped to a set of values, traditionally called a
'range'. This mapping produces at most a surjective function (i.e. two
or more keys can map to same value and all the values are mapped to by
some keys).  Notice that the traditional counterparts to keys and
values are sets and not just lists. This seems like theory babble, but
in the case of Python staying 'true' to the theory is usually a
GoodThing(tm).

I love Python primarily because it does something in only one, correct,
and reasonable way. The same principle should probably apply to Python
itself including to its built-in data structures.

Of course the compatibilty will be broken. Any code relying on a
certain ordering of keys() or values() returned would need to be
updated. One could argue though that such code was not entirely correct
in the first place to asssume that, and should be fixed anyway.

Obviously this fix should not be in Python 2.X but perhaps would be
worth considering for Python 3000. What does everyone think?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-01 Thread [EMAIL PROTECTED]
There's a few good reasons.
1 - golden handcuffs.  Breaking old code is bad 90% of the time
2 - creating a set MAY be slower.

Python's sets seem to imply to that they will always be a hash map.  in
this case, some creative hash map mapping could allow one to create a
set without calculating hash codes (make the set hashmap have the same
dimentions and rules as the dictionary one).
If there was intent to allow Python implementations to use trees for
the set, then a list is far faster to create (O(n) time instead of
O(nlogn)).

3 - using a set is sometimes slower (just as using a list is sometimes
slower)
I can't speak for your code, but this is the most common use of keys in
my coding:
# d is some dictionary
keys = d.keys()
keys.sort()
for k in keys:
  #blah

sets cannot be sorted, while lists can.  If keys() returned a set, then
I'd have to turn it into a list every time.

There's potential to add views to python (a key view of a dictionary
being a frozenset containing the dictionary's keys, which is updated
whenever the dictionary updates), but I think thats annother topic
which is out of the scope of your origional idea.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-01 Thread Scott David Daniels
[EMAIL PROTECTED] wrote:
 I can't speak for your code, but this is the most common use of keys in
 my coding:
 # d is some dictionary
 keys = d.keys()
 keys.sort()
 for k in keys:
   #blah

This you can rewrite quite effectively as:

 for k in sorted(d):
 #blah

--Scott David Daniels
[EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-01 Thread Nick Vatamaniuc
 1 - golden handcuffs.  Breaking old code is bad 90% of the time
I agree with you on  that, mostly for code that counted on list methods
of result of keys() - like you later show with sort.   But there is a
side note:  old code that assumed a particular ordering of the keys or
values is broken anyway.  So even if ks={1:0,2:0,3:0}.keys() returns
[1,2,3] on my machine I should not do something like
'my_savings_account + ks[0]' That code should be fixed anyway, since on
a different machine it might produce different values for ks[0].

 2 - creating a set MAY be slower.
Creating a set from the dictionary's keys should not be a lot slower
because the keys are already unique, there is no need to check each key
against the other keys just return them as a set.  I assume this is
what you meant by make the set hashmap have the same dimensions and
rules as the dictionary one.  Perhaps a dictionary would internally
just copy its  keys to the set and return it rather than construct as
set from scratch (with duplication checks and all).

3 - using a set is sometimes slower
Again, depending how it is used.  In your case you argue that you
usually sort the keys anyway so a list is more convinient.  But
different use cases can call for differnent operations on the keys()
after they have been retrieved. What if someone wants to do an
intersection to find common keys with another dictionary, then a set
would be more appropriate. The intent of the set() type was to not temp
anyone into assuming an ordering of keys() just because a list is
indexable. And eliminate the need for a long footnote in the
documentation of the dict type that talks about 'an arbitrary
non-random ordering' - it takes while just to grasp what that means...

In general I believe that a small performance penalty is acceptable in
order to have a correct and consistent data type, especially for Python
i.e. I might not argue the same for Perl or C.

-Nick V.

[EMAIL PROTECTED] wrote:
 There's a few good reasons.
 1 - golden handcuffs.  Breaking old code is bad 90% of the time
 2 - creating a set MAY be slower.

 Python's sets seem to imply to that they will always be a hash map.  in
 this case, some creative hash map mapping could allow one to create a
 set without calculating hash codes (make the set hashmap have the same
 dimentions and rules as the dictionary one).
 If there was intent to allow Python implementations to use trees for
 the set, then a list is far faster to create (O(n) time instead of
 O(nlogn)).

 3 - using a set is sometimes slower (just as using a list is sometimes
 slower)
 I can't speak for your code, but this is the most common use of keys in
 my coding:
 # d is some dictionary
 keys = d.keys()
 keys.sort()
 for k in keys:
   #blah

 sets cannot be sorted, while lists can.  If keys() returned a set, then
 I'd have to turn it into a list every time.

 There's potential to add views to python (a key view of a dictionary
 being a frozenset containing the dictionary's keys, which is updated
 whenever the dictionary updates), but I think thats annother topic
 which is out of the scope of your origional idea.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-01 Thread Robert Kern
[EMAIL PROTECTED] wrote:
 The same thing goes for the values(). Here most people will argue that
 values are not necessarily unique, so a list is more appropriate, but
 in fact the values are unique it is just that more than one key could
 map to the same value.  What is 'seen' in dictionary is the mapping
 rule.  Also the values are not ordered and should not be indexable --
 they should be a set. Just as  the keys(), one ordered list of values
 from a dictionary on one machine will not necessarily be equal to
 another list of values an on another machine, while in fact, they
 should be.

This part is pretty much a non-starter. Not all Python objects are hashable.

In [1]: d = {}

In [2]: for i in range(1, 10):
...: d[i] = range(i)
...:
...:

In [3]: set(d.values())
---
exceptions.TypeError Traceback (most recent 
call 
last)

/Users/kern/ipython console

TypeError: list objects are unhashable


Also, I may need keys to map to different objects that happen to be equal.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth.
   -- Umberto Eco

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-01 Thread John Machin
On 2/07/2006 6:01 AM, [EMAIL PROTECTED] wrote:
 This has been bothering me for a while. 
[snip]

Summary of OP's post: d.keys() and d.values() should return sets in 
Python 3.0.

Observations:
(1) My code [some of which dates back to Python 1.5.1] uses about 12 x 
d.items() and 11 x d.keys() for each 1 x d.values()
(2) Most cases of d.X() don't need/want the untrammelled list and could 
be replaced by d.iterX(). Example: max_frequency = max(tally.values())

Opinion: d.X() could be deprecated, but I'd rather see a 
consciousness-raising for the d.iterX() methods, and for the construct
 for key in d:

Cheers,
John


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-01 Thread Roy Smith
Nick Vatamaniuc [EMAIL PROTECTED] wrote:
 But there is a side note:  old code that assumed a particular ordering 
 of the keys or values is broken anyway.

From a testing point of view, it would be interesting if there was a flag 
which said, Deliberately change everything which isn't guaranteed to be a 
specific way.  So, for example, dict.keys() would return a list in reverse 
order of how it normally does it (even if it cost more to do it that way).  
An alternate hash key generator would be slipped into place.  Floating 
point math would get a little noise added to the least significant bits.  
And so on.  Might be interesting to see what sorts of bugs that would shake 
out from user code.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

2006-07-01 Thread Nick Vatamaniuc
You are correct I should have thought of that. I still think the keys()
method should return a set() though.

Robert Kern wrote:
 [EMAIL PROTECTED] wrote:
  The same thing goes for the values(). Here most people will argue that
  values are not necessarily unique, so a list is more appropriate, but
  in fact the values are unique it is just that more than one key could
  map to the same value.  What is 'seen' in dictionary is the mapping
  rule.  Also the values are not ordered and should not be indexable --
  they should be a set. Just as  the keys(), one ordered list of values
  from a dictionary on one machine will not necessarily be equal to
  another list of values an on another machine, while in fact, they
  should be.

 This part is pretty much a non-starter. Not all Python objects are hashable.

 In [1]: d = {}

 In [2]: for i in range(1, 10):
 ...: d[i] = range(i)
 ...:
 ...:

 In [3]: set(d.values())
 ---
 exceptions.TypeError Traceback (most recent 
 call
 last)

 /Users/kern/ipython console

 TypeError: list objects are unhashable


 Also, I may need keys to map to different objects that happen to be equal.

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless enigma
   that is made terrible by our own mad attempt to interpret it as though it 
 had
   an underlying truth.
-- Umberto Eco

-- 
http://mail.python.org/mailman/listinfo/python-list