Re: [Tutor] Why do sets use 6 times as much memory as lists?
On 05/07/2013 01:10 PM, Bjorn Madsen wrote:
> >>> import sys
> >>> L = [x for x in range(10000)]
> >>> sys.getsizeof(L)
> 43816
> >>> L = {x for x in range(10000)}
> >>> sys.getsizeof(L)
> 262260
>
> ?
>
> kind regards,
> Bjorn

Just curious: what did you expect? Sets have a different purpose, basically to be able to do an 'in' operation almost instantly, while lists have to scan every item. I figured the set would be more like 75% larger, since the 10,000 objects themselves are going to take up maybe 24 bytes each.

-- DaveA
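[To see Dave's point about 'in' concretely, here is a minimal timing sketch; the exact numbers are machine- and version-dependent:]

    import timeit

    setup = "data_list = list(range(10000)); data_set = set(data_list)"
    # Worst case for the list: the target is the last element, so the
    # whole list is scanned on every membership test.
    print(timeit.timeit("9999 in data_list", setup=setup, number=1000))
    # The set computes one hash and probes only a handful of slots.
    print(timeit.timeit("9999 in data_set", setup=setup, number=1000))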
Re: [Tutor] Why do sets use 6 times as much memory as lists?
On 7 May 2013 18:10, Bjorn Madsen <bjorn.h.mad...@googlemail.com> wrote:
> >>> import sys
> >>> L = [x for x in range(10000)]
> >>> sys.getsizeof(L)
> 43816
> >>> L = {x for x in range(10000)}
> >>> sys.getsizeof(L)
> 262260

Firstly, these results may vary depending on operating system, processor architecture and build options, so this won't always be exactly the same. I do get the same numbers as you, however, on standard Python 2.7 on 32-bit Windows.

So why do sets use extra memory? Under the hood, a list is an array that stores a pointer in each slot, with no gaps between the pointers. A set uses a hash table, which is an array that stores pointers at random-ish positions and only fills about half of its slots. That alone means it needs roughly twice as much memory, to pay for the gaps in its array. On top of that, Python sets use a resizable hash table, and to make resizing efficient the hash of each object is stored in addition to its pointer. This essentially requires a whole extra array, so it doubles the storage requirement again. With those two factors I would expect a set to be something like 4 times bigger, so the 6 times bigger that you report seems reasonable to me (I may have missed something else).

Oscar
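[A minimal sketch for reproducing the ratio Oscar describes on your own build; the output will differ across platforms:]

    import sys

    L = [x for x in range(10000)]
    S = {x for x in range(10000)}
    print(sys.getsizeof(L))   # list: one pointer per slot, few gaps
    print(sys.getsizeof(S))   # set: hash + pointer per slot, ~half the slots empty
    print(sys.getsizeof(S) / float(sys.getsizeof(L)))  # ~6 on 32-bit CPython 2.7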
Re: [Tutor] Why do sets use 6 times as much memory as lists?
Hi, thanks for the quick response. Being curious, I actually expected something like this:

>>> L = {x: None for x in range(10000)}
>>> sys.getsizeof(L)
196660

That is why I wondered why 6 times is a lot, given that a dict can do the same at 3/4 of the memory footprint. I was just surprised by the overhead, as sets seem somehow more lightweight than dictionaries. The additional overhead must have something to do with set-specific operations, I imagine, such as:

set1 - set2
set1 | set2
set1 & set2
set1 <= set2

Thanks anyway!

On 7 May 2013 18:53, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
> [snip]
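[The operator spellings above are reconstructed from the garbled archive; for reference, a minimal sketch of what they compute:]

    a = {1, 2, 3}
    b = {2, 3, 4}
    print(sorted(a - b))   # difference:   [1]
    print(sorted(a | b))   # union:        [1, 2, 3, 4]
    print(sorted(a & b))   # intersection: [2, 3]
    print(a <= b)          # subset test:  False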
Re: [Tutor] Why do sets use 6 times as much memory as lists?
On 7 May 2013 19:09, Bjorn Madsen <bjorn.h.mad...@googlemail.com> wrote:
> Hi, thanks for the quick response. Being curious, I actually expected
> something like this:
>
> >>> L = {x: None for x in range(10000)}
> >>> sys.getsizeof(L)
> 196660
>
> That is why I wondered why 6 times is a lot, given that a dict can do the
> same at 3/4 of the memory footprint.

I think that the 3/4 is pretty much coincidental: sets and dicts resize periodically depending on how they are used.

> The additional overhead must have something to do with set-specific
> operations, I imagine [...]

I don't think so. Here are the results on a 64-bit Linux machine (Python 2.7):

>>> import sys
>>> sys.getsizeof([x for x in range(10000)])
87632
>>> sys.getsizeof({x for x in range(10000)})
524520
>>> sys.getsizeof({x: None for x in range(10000)})
786712

So in this case the set ends up being 2/3 the size of the dict. As I said before, these results will not be consistent from one computer to another.

Oscar
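[Since the figures depend on the build, a minimal sketch for reporting your own platform's details alongside the measurements:]

    import sys, platform

    print(platform.platform(), sys.version.split()[0])
    print("64-bit" if sys.maxsize > 2**32 else "32-bit")
    for obj in ([x for x in range(10000)],
                {x for x in range(10000)},
                {x: None for x in range(10000)}):
        print(type(obj).__name__, sys.getsizeof(obj))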
Re: [Tutor] Why do sets use 6 times as much memory as lists?
On 08/05/13 03:10, Bjorn Madsen wrote:
> >>> import sys
> >>> L = [x for x in range(10000)]
> >>> sys.getsizeof(L)
> 43816
> >>> L = {x for x in range(10000)}
> >>> sys.getsizeof(L)
> 262260
>
> ?

Both sets and lists may be over-allocated. I expect that lists created with a list comprehension may not be over-allocated, but if you append a single item, Python *may* resize the list and quadruple or double its size. If you don't over-allocate, then *every* append becomes slow:

[a, b, c, d]

append e: copy the list, plus one extra slot:

[a, b, c, d]
[a, b, c, d, UNUSED]

insert e:

[a, b, c, d]
[a, b, c, d, e]

delete the original:

[a, b, c, d, e]

Instead of adding just one extra slot, Python quadruples (for small lists) or doubles the size, so that *nearly always* your list has plenty of extra slots to append into instantly. This makes appending an almost-constant-time operation on average, regardless of how big your list is, and a big list will never use more than twice the memory it needs. That's a good trade-off of space versus time (memory versus speed).

Sets and dicts are also over-allocated, because of the nature of how hash tables work; they also trade space for time. The hash table is a big array of slots, some of which are flagged as empty and some of which hold an item:

[EMPTY, EMPTY, b, EMPTY, d, c, EMPTY, EMPTY, EMPTY, a, EMPTY, e]

When you look up (say) c, Python calculates the hash of c to get the index to inspect. Say it gets index 5: it looks at index 5 and sees that c is there. This means it only needs to look in one slot instead of up to 12. The more empty slots, the better the hash table performs.

If you would like more details on how hash tables work, you can google it, or just ask here, but the short answer is that in order to get extremely fast, almost constant-time insertion, deletion and lookup in dicts and sets, they trade off some memory for speed.

-- Steven
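[The over-allocation Steven describes is easy to observe with sys.getsizeof; a minimal sketch, noting that the exact jump points vary between CPython versions:]

    import sys

    L = []
    last = sys.getsizeof(L)
    print(0, last)
    for i in range(50):
        L.append(i)
        size = sys.getsizeof(L)
        if size != last:
            # Each jump marks a resize: CPython allocates extra slots
            # so that most appends need no copying at all.
            print(len(L), size)
            last = size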
Re: [Tutor] Why do sets use 6 times as much memory as lists?
On Tue, May 7, 2013 at 2:09 PM, Bjorn Madsen <bjorn.h.mad...@googlemail.com> wrote:
> Being curious, I actually expected something like this:
>
> >>> L = {x: None for x in range(10000)}
> >>> sys.getsizeof(L)
> 196660
>
> That is why I wondered why 6 times is a lot, given that a dict can do the
> same at 3/4 of the memory footprint. I was just surprised by the overhead,
> as sets seem somehow more lightweight than dictionaries.

Minus the basic object size, for the same size table a set should be about 2/3 the size of a dict.

I suppose you're using Python 3.3. To reduce memory usage, the dict type in 3.3 allows splitting tables to share keys/hashes among instances:

http://www.python.org/dev/peps/pep-0412

That's not directly related to the size difference here, but at the same time it was decided to cut the initial growth rate to save memory. In previous versions, when the table is 2/3 full it gets quadrupled in size, which slows to doubling after 50,000 active keys. In 3.3 a dict always uses doubling. So in 3.2 a fresh dict or set with 10,000 entries will have a table sized 32768, while in 3.3 the table is sized 16384 for a dict and 32768 for a set.

Keep in mind that as keys are added and removed the table sizes will diverge, even for the same number of active keys. Dummy keys left in the table count toward the table's utilization, but get purged by a resize.
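[The resizes eryksun describes can be watched directly; a minimal sketch that prints each jump as keys are added, with thresholds that differ between versions as noted above:]

    import sys

    s, d = set(), {}
    last = (sys.getsizeof(s), sys.getsizeof(d))
    for i in range(20000):
        s.add(i)
        d[i] = None
        sizes = (sys.getsizeof(s), sys.getsizeof(d))
        if sizes != last:
            # A jump in getsizeof marks a hash-table resize.
            print(i + 1, "set:", sizes[0], "dict:", sizes[1])
            last = sizes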
Re: [Tutor] Why do sets use 6 times as much memory as lists?
On Tue, May 7, 2013 at 9:30 PM, eryksun <eryk...@gmail.com> wrote:
>> That is why I wondered why 6 times is a lot given that a dict can do the
>> same at 3/4 of the mem-footprint.

I hope it's clear that the 3/4 here comes from 1/2 * 3/2. In other words, the dict table has 1/2 the number of entries, and each entry is 3/2 times the size, accounting for the pointer to the value.
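[Checking that arithmetic against Bjorn's original 32-bit numbers; a minimal sketch:]

    # dict table: half as many slots, each entry 3/2 the size of a set entry
    print(0.5 * 1.5)                        # 0.75
    set_size, dict_size = 262260, 196660    # Bjorn's measurements
    print(dict_size / float(set_size))      # ~0.75, matching the prediction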
Re: [Tutor] Why do sets use 6 times as much memory as lists?
Following Oscar's comment, it's also an O(n) vs. O(1) tradeoff.

--Wesley

On Tue, May 7, 2013 at 12:09 PM, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
> [snip: Oscar's reply, quoted in full above]

--
"A computer never does what you want... only what you tell it."
+wesley chun : wescpy at gmail : @wescpy
Python training & consulting : http://CyberwebConsulting.com
Core Python books : http://CorePython.com
Python blog: http://wescpy.blogspot.com
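[A minimal sketch of the O(n) vs. O(1) behaviour Wesley refers to: list lookup time grows with n, while set lookup time stays roughly flat. Exact timings vary by machine:]

    import timeit

    for n in (1000, 10000, 100000):
        setup = "data = list(range(%d)); target = %d" % (n, n - 1)
        t_list = timeit.timeit("target in data", setup=setup, number=100)
        setup = "data = set(range(%d)); target = %d" % (n, n - 1)
        t_set = timeit.timeit("target in data", setup=setup, number=100)
        print(n, t_list, t_set)   # list time scales with n; set time does not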