Re: [Tutor] Why do sets use 6 times as much memory as lists?

2013-05-07 Thread Dave Angel

On 05/07/2013 01:10 PM, Bjorn Madsen wrote:

>>> import sys
>>> L=[x for x in range(10000)]
>>> sys.getsizeof(L)
43816
>>> L={x for x in range(10000)}
>>> sys.getsizeof(L)
262260
?
kind regards,
Bjorn


Just curious:  what did you expect?  Sets have a different purpose, 
basically to be able to do an 'in' operation instantly, while lists have 
to scan every item in the list.
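A rough way to see that difference (just a sketch; exact timings depend on your
machine and Python version):

import timeit

setup = "data = list(range(10000)); s = set(data)"

# membership test for an item near the end: the list scans, the set hashes
list_time = timeit.timeit("9999 in data", setup=setup, number=10000)
set_time = timeit.timeit("9999 in s", setup=setup, number=10000)

print(list_time, set_time)  # the set lookup is typically orders of magnitude faster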


I figure the real ratio to be more like 75% larger, once you count the objects 
themselves and not just the container, since the 10,000 int objects are going 
to take up maybe 24 bytes each.


--
DaveA


Re: [Tutor] Why do sets use 6 times as much memory as lists?

2013-05-07 Thread Oscar Benjamin
On 7 May 2013 18:10, Bjorn Madsen bjorn.h.mad...@googlemail.com wrote:
 >>> import sys
 >>> L=[x for x in range(10000)]
 >>> sys.getsizeof(L)
 43816
 >>> L={x for x in range(10000)}
 >>> sys.getsizeof(L)
 262260

Firstly, these results may vary depending on operating system,
processor architecture and build options so this won't always be
exactly the same. I do get the same numbers as you, however, on
standard python 2.7 on 32-bit Windows.

So why do sets use extra memory? Under the hood a list is an array
which stores a pointer in each slot with no gaps between the pointers.

A set uses a hash-table which is an array that stores pointers at
randomish positions and only fills about half of its spaces. This
causes it to need twice as much memory for all the gaps in its array.
On top of that Python sets use a resizable hash table and to make
resizing efficient the hash of each object is stored in addition to
its pointer. This essentially requires a whole extra array so it
doubles your storage requirements again. With these two I would expect
a set to be something like 4 times bigger so the 6 times bigger that
you report seems reasonable to me (I may have missed something else in
this).
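A back-of-envelope sketch of that estimate, ignoring per-object headers and
over-allocation (the pointer size here is an assumption for a 32-bit build):

N = 10000   # number of elements
P = 4       # bytes per pointer on a 32-bit build (8 on 64-bit)

list_estimate = N * P                # one pointer per element, no gaps
set_estimate = (2 * N) * (P + P)     # ~half-full table; each slot holds a hash plus a pointer

print(set_estimate / float(list_estimate))  # roughly 4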


Oscar


Re: [Tutor] Why do sets use 6 times as much memory as lists?

2013-05-07 Thread Bjorn Madsen
Hi, Thanks for the quick response.

Being curious I actually expected something like this:
>>> L={x:None for x in range(10000)}
>>> sys.getsizeof(L)
196660


That is why I wondered why 6 times is a lot, given that a dict can do the
same at 3/4 of the mem-footprint.
Just got surprised about the overhead, as I thought sets were more lightweight
than dictionaries.

The additional overhead must have something to do with set-specific operations,
I imagine, such as (a quick sketch of these follows below):
set1 - set2
set1 | set2
set1 & set2
set1 <= set2
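For reference, a quick sketch of those operations on two small made-up example
sets:

set1 = {1, 2, 3}
set2 = {2, 3, 4}

print(set1 - set2)    # difference: {1}
print(set1 | set2)    # union: {1, 2, 3, 4}
print(set1 & set2)    # intersection: {2, 3}
print(set1 <= set2)   # subset test: False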


Thanks anyway!



On 7 May 2013 18:53, Oscar Benjamin oscar.j.benja...@gmail.com wrote:

 On 7 May 2013 18:10, Bjorn Madsen bjorn.h.mad...@googlemail.com wrote:
  >>> import sys
  >>> L=[x for x in range(10000)]
  >>> sys.getsizeof(L)
  43816
  >>> L={x for x in range(10000)}
  >>> sys.getsizeof(L)
  262260

 Firstly, these results may vary depending on operating system,
 processor architecture and build options so this won't always be
 exactly the same. I do get the same numbers as you, however, on
 standard python 2.7 on 32-bit Windows.

 So why do sets use extra memory? Under the hood a list is an array
 which stores a pointer in each slot with no gaps between the pointers.

 A set uses a hash-table which is an array that stores pointers at
 randomish positions and only fills about half of its spaces. This
 causes it to need twice as much memory for all the gaps in its array.
 On top of that Python sets use a resizable hash table and to make
 resizing efficient the hash of each object is stored in addition to
 its pointer. This essentially requires a whole extra array so it
 doubles your storage requirements again. With these two I would expect
 a set to be something like 4 times bigger so the 6 times bigger that
 you report seems reasonable to me (I may have missed something else in
 this).


 Oscar



Re: [Tutor] Why do sets use 6 times as much memory as lists?

2013-05-07 Thread Oscar Benjamin
On 7 May 2013 19:09, Bjorn Madsen bjorn.h.mad...@googlemail.com wrote:
 Hi, Thanks for the quick response.

 Being curious I actually expected something like this:
 >>> L={x:None for x in range(10000)}
 >>> sys.getsizeof(L)
 196660


 That is why I wondered why 6 times is a lot given that a dict can do the
 same at 3/4 of the mem-footprint.
 Just got surprised about the overhead, as I thought sets were more lightweight
 than dictionaries.

I think that the 3/4 is pretty much coincidental: sets and dicts resize
periodically depending on how they are used.
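One way to see that (just a sketch; exact numbers depend on the interpreter):
two dicts with the same final contents can report different sizes depending on
how they were built, since CPython doesn't shrink the table when keys are
deleted.

import sys

d1 = {x: None for x in range(10000)}

d2 = {x: None for x in range(20000)}
for x in range(10000, 20000):
    del d2[x]   # the table stays at the larger size

print(d1 == d2)                               # True: identical contents
print(sys.getsizeof(d1), sys.getsizeof(d2))   # d2 typically reports a larger size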

 additional overhead must have something to do with set-specific operations,
 I imagine such as:
 set1 - set2
 set1 | set2
 set1 & set2
 set1 <= set2
 

I don't think so. Here are the results on a 64-bit Linux machine (Python 2.7):
>>> import sys
>>> sys.getsizeof([x for x in range(10000)])
87632
>>> sys.getsizeof({x for x in range(10000)})
524520
>>> sys.getsizeof({x: None for x in range(10000)})
786712

So in this case the set ends up being 2/3 the size of the dict. As I
said before these results will not be consistent from one computer to
another.


Oscar


Re: [Tutor] Why do sets use 6 times as much memory as lists?

2013-05-07 Thread Steven D'Aprano

On 08/05/13 03:10, Bjorn Madsen wrote:

>>> import sys
>>> L=[x for x in range(10000)]
>>> sys.getsizeof(L)
43816
>>> L={x for x in range(10000)}
>>> sys.getsizeof(L)
262260
?



Both sets and lists may be over-allocated. I expect that lists created with a 
list comprehension may not be over-allocated, but if you append a single item 
to one, Python *may* resize it and grab a batch of spare slots. If you don't 
over-allocate, then *every* append becomes slow:

[a, b, c, d] append e:

copy the list, plus one extra slot:

[a, b, c, d]
[a, b, c, d, UNUSED]

insert e:

[a, b, c, d]
[a, b, c, d, e]

delete the original:

[a, b, c, d, e]


Instead of adding just one extra slot, Python over-allocates by an amount 
roughly proportional to the current size (CPython's lists grow by very roughly 
an extra 12.5%), so that *nearly always* your list has spare slots to append 
into instantly. This makes appending an amortized constant-time operation, on 
average, regardless of how big your list is, and a big list never uses much 
more memory than it actually needs. That's a good trade-off of space versus 
time (memory versus speed).
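A small sketch of that in action (the exact sizes are CPython implementation
details and will vary):

import sys

L = []
last = sys.getsizeof(L)
print(len(L), last)
for i in range(50):
    L.append(i)
    size = sys.getsizeof(L)
    if size != last:
        # the reported size only jumps occasionally, because each resize
        # grabs a batch of spare slots for future appends
        print(len(L), size)
        last = size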


Sets, and dicts, are also over-allocated, because of the nature of how hash tables work. 
They also trade off space for time. The hash table is a big array of slots, some of which 
are flagged as empty, some of which have an item in them:

[EMPTY, EMPTY, b, EMPTY, d, c, EMPTY, EMPTY, EMPTY, a, EMPTY, e]

When you do a lookup on (say) c, Python calculates a hash of c to get the index 
to look at. Say it gets index 5: it looks at index 5 and sees that c is there. This 
means it only needs to inspect one slot instead of up to 12.

The more empty slots, the better the hash table will perform. If you would like 
more details on how hash tables work, you can google it, or just ask here, but 
the short answer is that in order to get extremely fast almost constant-time 
insertion, deletion and lookup of items in dicts and sets, they trade off some 
memory for speed.
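A toy version of that index calculation (not CPython's actual probing logic,
just the basic idea):

table_size = 12   # the example table above has 12 slots

def slot_for(item):
    # reduce the item's hash to an index within the table
    return hash(item) % table_size

# with few collisions, a lookup inspects one slot instead of scanning all 12
for item in ("a", "b", "c", "d", "e"):
    print(item, slot_for(item))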


--
Steven


Re: [Tutor] Why do sets use 6 times as much memory as lists?

2013-05-07 Thread eryksun
On Tue, May 7, 2013 at 2:09 PM, Bjorn Madsen
bjorn.h.mad...@googlemail.com wrote:

 Being curious I actually expected something like this:
 >>> L={x:None for x in range(10000)}
 >>> sys.getsizeof(L)
 196660

 That is why I wondered why 6 times is a lot given that a dict can do the
 same at 3/4 of the mem-footprint. Just got surprised about the overhead,
 as I thought sets were more lightweight than dictionaries.

Minus the basic object size, for the same size table a set should be
about 2/3 the size of a dict.

I suppose you're using Python 3.3. To reduce memory usage, the dict
type in 3.3 allows splitting tables to share keys/hashes among
instances:

http://www.python.org/dev/peps/pep-0412

That's not directly related to the size difference here, but at the
same time it was decided to also cut the initial growth rate to save
memory. In previous versions when the table is 2/3 full it gets
quadrupled in size, which slows to doubling after 50,000 active keys.
In 3.3 a dict always uses doubling.

So in 3.2 a fresh dict or set with 10,000 entries will have a table
sized 32768, while in 3.3 the table is sized 16384 for a dict and
32768 for a set. Keep in mind that as keys are added and removed the
table sizes will diverge, even for same number of active keys. Dummy
keys left in the table count toward the table utilization, but get
purged by a resize.
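As a rough cross-check against Oscar's 64-bit 2.7 numbers earlier in the thread
(assuming a set entry is hash + key, 16 bytes, and a dict entry is
hash + key + value, 24 bytes; the small remainder is the fixed object header):

slots = 32768   # table size for 10,000 fresh entries on 2.7/3.2

print(slots * 16)   # 524288 -- close to the reported 524520 for the set
print(slots * 24)   # 786432 -- close to the reported 786712 for the dict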


Re: [Tutor] Why do sets use 6 times as much memory as lists?

2013-05-07 Thread eryksun
On Tue, May 7, 2013 at 9:30 PM, eryksun eryk...@gmail.com wrote:

 That is why I wondered why 6 times is a lot given that a dict can do the
 same at 3/4 of the mem-footprint.

I hope it's clear that 3/4 here comes from 1/2 * 3/2. In other words
the dict table has 1/2 the number of entries, and each entry is 3/2
times the size, accounting for the pointer to the value.
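Checked against the numbers from the original post (a rough sketch; the
remainders are small fixed overheads):

set_size = 262260    # reported set size
dict_size = 196660   # reported dict size

print(dict_size / float(set_size))   # ~0.75, i.e. 1/2 the entries * 3/2 the entry size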


Re: [Tutor] Why do sets use 6 times as much memory as lists?

2013-05-07 Thread wesley chun
Following Oscar's comment, it's also an O(n) vs. O(1) tradeoff.

--wesley


On Tue, May 7, 2013 at 12:09 PM, Oscar Benjamin
oscar.j.benja...@gmail.comwrote:

 On 7 May 2013 19:09, Bjorn Madsen bjorn.h.mad...@googlemail.com wrote:
  Hi, Thanks for the quick response.
 
  Being curious I actually expected something like this:
   >>> L={x:None for x in range(10000)}
   >>> sys.getsizeof(L)
   196660
 
 
  That is why I wondered why 6 times is a lot given that a dict can do the
  same at 3/4 of the mem-footprint.
  Just got surprised about the overhead, as I thought sets were more lightweight
  than dictionaries.

 I think that the 3/4 is pretty much coincidental: sets and dicts resize
 periodically depending on how they are used.

  additional overhead must have something to do with set-specific
 operations,
  I imagine such as:
  set1 - set2
  set1 | set2
  set1 & set2
  set1 <= set2
  

 I don't think so. Here are the results on a 64-bit Linux machine (Python
 2.7):
 >>> import sys
 >>> sys.getsizeof([x for x in range(10000)])
 87632
 >>> sys.getsizeof({x for x in range(10000)})
 524520
 >>> sys.getsizeof({x: None for x in range(10000)})
 786712

 So in this case the set ends up being 2/3 the size of the dict. As I
 said before these results will not be consistent from one computer to
 another.


 Oscar




-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
A computer never does what you want... only what you tell it.
+wesley chun : wescpy at gmail : @wescpy
Python training & consulting : http://CyberwebConsulting.com
Core Python books : http://CorePython.com
Python blog: http://wescpy.blogspot.com