Re: item access time: sets v. lists
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > A set, on the other hand, uses a hash table so finding an element > takes constant time (it's one hash lookup, independent of the size of > the set)--and determining an item isn't there is likewise constant > time. > One hash calculation, but there could be collisions. Worst case for an n element dict/set could be that it takes n attempts to find a value, although unless you try to do it deliberately by ensuring that the keys have identical hash values this won't happen in practice. Paul McGuire wrote: > Thanks, I stand corrected. How do they know how big a hash table to > use? I think this is an interesting problem. If I read the code correctly: Minimum size is 8. Whenever more than 2/3 of the slots are in use (including slots where the element has been deleted) the dictionary is resized to the smallest power of 2 (greater than 8) which is greater than 4 times the number of currently occupied slots (or 2 times the number of occupied slots when more than 5 slots are occupied). This can reduce the size if a lot of elements have been deleted. -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
"Duncan Booth" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > "Paul McGuire" <[EMAIL PROTECTED]> wrote: > >> By contrast, a set (and also the keys in a dict) use a tree structure >> to index more quickly into the list of items > > 'dict' and I believe also 'set' use a hash table, not a tree structure. Thanks, I stand corrected. How do they know how big a hash table to use? I think this is an interesting problem. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
"Neil Cerutti" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On 2006-10-04, Paul McGuire <[EMAIL PROTECTED]> wrote: >> "Neil Cerutti" <[EMAIL PROTECTED]> wrote in message >> news:[EMAIL PROTECTED] >> > > It seems to be timing "testing for membership", not "random > access". Random access is just seq[n]; at least, that's the > assumption I made. > > Perhaps I read the test description out of context and got > confused. > Oh, poor wording on my part, sorry. You got me wondering what seq[n] would give me in the case of a set: >>> z = set(range(100)) >>> z[30] Traceback (most recent call last): File "", line 1, in ? TypeError: unindexable object >>> It really doesn't make any sense since sets are unordered. So we couldn't compare lists and sets in this manner anyway. I *did* learn that dicts and sets use a hash table, not a tree, which also helps explain why the initial construction of the 1-element set takes so much longer than just simple assembly of the range into a list. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
David Isaac wrote: > Paul M. wrote: > > Random access to item in list/set when item exists > > set -> 0.000241650824337 > > list -> 0.0245168031132 > > > > Random access to item in list/set when item does not exist > > set -> 0.000187733357172 > > list -> 0.522086186932 > > > OK, that's a much better set of answers > including to questions I did not > even know I wanted to ask until > I saw your post. But, how to > explain the above?? The code was flawed, but it illustrates a point. To find an arbitrary item in a list takes time proportional to the length of the list--on average, you have to scan n/2 items in a list of length n to find it. "x in list" really has to check item 0, item 1, item 2, etc until it finds x or reaches the end of the list. Obviously in a sorted list like this, finding 0 will be much faster than finding since you have to scan almost the whole list to find the latter. Of course, if you know the list is sorted you could use a binary search (which will find the item in log(n) comparisons on average). So really to find a random item that is in the list is going to take more like 0.250 seconds on average, give or take a little. When the item doesn't occur in the list, you wind up having to check every element to see if it's there. (binary search would work for this case as well for a sorted list). A set, on the other hand, uses a hash table so finding an element takes constant time (it's one hash lookup, independent of the size of the set)--and determining an item isn't there is likewise constant time. (if your data is degenerate and you have many things in the set that all hash to the same value then the lookup could take longer, but that's pretty unlikely to make a big difference in practice with python sets and user data). -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
On 2006-10-04, Paul McGuire <[EMAIL PROTECTED]> wrote: > "Neil Cerutti" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED] > >> Look at the code again. It's not testing what it says it's >> testing. > > It isnt? > > The only quibble I can see is that there really is no "first" > element in a set. I picked the "0 in set" and "0 in list" to > pick the fastest case for list, which does a linear search > through the list elements. > > Where did I go wrong on the test descriptions? It seems to be timing "testing for membership", not "random access". Random access is just seq[n]; at least, that's the assumption I made. Perhaps I read the test description out of context and got confused. -- Neil Cerutti 8 new choir robes are currently needed, due to the addition of several new members and to the deterioration of some of the older ones. --Church Bulletin Blooper -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
"Paul McGuire" <[EMAIL PROTECTED]> wrote: > By contrast, a set (and also the keys in a dict) use a tree structure > to index more quickly into the list of items 'dict' and I believe also 'set' use a hash table, not a tree structure. -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
"David Isaac" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Paul M. wrote: >> Random access to item in list/set when item exists >> set -> 0.000241650824337 >> list -> 0.0245168031132 >> >> Random access to item in list/set when item does not exist >> set -> 0.000187733357172 >> list -> 0.522086186932 > > > OK, that's a much better set of answers > including to questions I did not > even know I wanted to ask until > I saw your post. But, how to > explain the above?? > > Thanks, > Alan Isaac > > Searching for an item in a list is a linear search, and all 1 items must be checked before determining that a non-existent item is not in the list, and sure enough, our list takes about 20 times as long to find that 10500 is not in the range 1 as it does to find 500 in the range 1. By contrast, a set (and also the keys in a dict) use a tree structure to index more quickly into the list of items, so this access (and determination of non-existence) will be much faster, especially so for a large list. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
"Neil Cerutti" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Look at the code again. It's not testing what it says it's > testing. > It isnt? The only quibble I can see is that there really is no "first" element in a set. I picked the "0 in set" and "0 in list" to pick the fastest case for list, which does a linear search through the list elements. Where did I go wrong on the test descriptions? -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
On 2006-10-04, David Isaac <[EMAIL PROTECTED]> wrote: > Paul M. wrote: >> Random access to item in list/set when item exists >> set -> 0.000241650824337 >> list -> 0.0245168031132 >> >> Random access to item in list/set when item does not exist >> set -> 0.000187733357172 >> list -> 0.522086186932 > > > OK, that's a much better set of answers > including to questions I did not > even know I wanted to ask until > I saw your post. But, how to > explain the above?? Look at the code again. It's not testing what it says it's testing. print "\nRandom access to first item in list/set" t1=timeit.Timer("0 in z","z = set(xrange(1))") t2=timeit.Timer("0 in z", "z = list(xrange(1))") print "set ->", t1.timeit(1000) print "list ->", t2.timeit(1000) print "\nRandom access to item in list/set when item exists" t1=timeit.Timer("500 in z","z = set(xrange(1))") t2=timeit.Timer("500 in z", "z = list(xrange(1))") print "set ->", t1.timeit(1000) print "list ->", t2.timeit(1000) -- Neil Cerutti -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
Paul M. wrote: > Random access to item in list/set when item exists > set -> 0.000241650824337 > list -> 0.0245168031132 > > Random access to item in list/set when item does not exist > set -> 0.000187733357172 > list -> 0.522086186932 OK, that's a much better set of answers including to questions I did not even know I wanted to ask until I saw your post. But, how to explain the above?? Thanks, Alan Isaac -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
"David Isaac" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Is it expected for access to set elements to be much > slower than access to list elements? Explanation? > Thanks, > Alan Isaac > t1=timeit.Timer("for i in set(xrange(1)):pass","") t2=timeit.Timer("for i in list(xrange(1)):pass","") t1.timeit(1000) > 9.806250235714316 t2.timeit(1000) > 3.9823075279120701 > > A couple of points: 1. Your use of timeit is a bit flawed. You are not only timing the access into the set or list, but also the construction of the set/list, *every time*. Use the second string arg (the one you are passing as "") for one-time initialization, so that you measure only access, which is what you say your are interested in. 2. Ooops, it turns out you're not really *accessing* the set/list, you are *iterating* over it. Given that sets are implemented internally using a tree structure, I would expect that iterating over a list could be faster, although only marginally so. 3. Here are some stats for a little more correct timeits: Construct list/set set -> 1.89524618352 list -> 0.299499796762 Iterate over list/set set -> 0.646887523414 list -> 0.552001162159 Random access to first item in list/set set -> 0.000189409547861 list -> 0.000160076210804 Random access to item in list/set when item exists set -> 0.000241650824337 list -> 0.0245168031132 Random access to item in list/set when item does not exist set -> 0.000187733357172 list -> 0.522086186932 The code: import timeit print "\nConstruct list/set" t1=timeit.Timer("z = set(xrange(1))","") t2=timeit.Timer("z = list(xrange(1))","") print "set ->", t1.timeit(1000) print "list ->", t2.timeit(1000) print "\nIterate over list/set" t1=timeit.Timer("for i in z: pass","z = set(xrange(1))") t2=timeit.Timer("for i in z: pass", "z = list(xrange(1))") print "set ->", t1.timeit(1000) print "list ->", t2.timeit(1000) print "\nRandom access to first item in list/set" t1=timeit.Timer("0 in z","z = set(xrange(1))") t2=timeit.Timer("0 in z", "z = list(xrange(1))") print "set ->", t1.timeit(1000) print "list ->", t2.timeit(1000) print "\nRandom access to item in list/set when item exists" t1=timeit.Timer("500 in z","z = set(xrange(1))") t2=timeit.Timer("500 in z", "z = list(xrange(1))") print "set ->", t1.timeit(1000) print "list ->", t2.timeit(1000) print "\nRandom access to item in list/set when item does not exist" t1=timeit.Timer("10500 in z","z = set(xrange(1))") t2=timeit.Timer("10500 in z", "z = list(xrange(1))") print "set ->", t1.timeit(1000) print "list ->", t2.timeit(1000) -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: item access time: sets v. lists
On Wed, 04 Oct 2006 16:02:56 GMT "David Isaac" <[EMAIL PROTECTED]> wrote: > Is it expected for access to set elements to be much > slower than access to list elements? Explanation? > Thanks, > Alan Isaac > > >>> t1=timeit.Timer("for i in set(xrange(1)):pass","") > >>> t2=timeit.Timer("for i in list(xrange(1)):pass","") > [...] You're measuring the time for creating the xrange and the set/list too. Create them before you call Timer() and repeat your timing. Dennis Benzinger -- http://mail.python.org/mailman/listinfo/python-list
item access time: sets v. lists
Is it expected for access to set elements to be much slower than access to list elements? Explanation? Thanks, Alan Isaac >>> t1=timeit.Timer("for i in set(xrange(1)):pass","") >>> t2=timeit.Timer("for i in list(xrange(1)):pass","") >>> t1.timeit(1000) 9.806250235714316 >>> t2.timeit(1000) 3.9823075279120701 -- http://mail.python.org/mailman/listinfo/python-list