Re: item access time: sets v. lists

2006-10-04 Thread Duncan Booth
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> A set, on the other hand, uses a hash table so finding an element
> takes constant time (it's one hash lookup, independent of the size of
> the set)--and determining an item isn't there is likewise constant
> time. 
> 
One hash calculation, but there could be collisions. Worst case for an n
element dict/set could be that it takes n attempts to find a value, 
although unless you try to do it deliberately by ensuring that the keys 
have identical hash values this won't happen in practice.

Paul McGuire wrote:
> Thanks, I stand corrected.  How do they know how big a hash table to
> use?  I think this is an interesting problem.

If I read the code correctly:

Minimum size is 8. Whenever more than 2/3 of the slots are in use 
(including slots where the element has been deleted) the dictionary is 
resized to the smallest power of 2 (greater than 8) which is greater than 4 
times the number of currently occupied slots (or 2 times the number of 
occupied slots when more than 5 slots are occupied). This can reduce 
the size if a lot of elements have been deleted.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread Paul McGuire
"Duncan Booth" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> "Paul McGuire" <[EMAIL PROTECTED]> wrote:
>
>> By contrast, a set (and also the keys in a dict) use a tree structure
>> to index more quickly into the list of items
>
> 'dict' and I believe also 'set' use a hash table, not a tree structure.

Thanks, I stand corrected.  How do they know how big a hash table to use?  I 
think this is an interesting problem.

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread Paul McGuire
"Neil Cerutti" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> On 2006-10-04, Paul McGuire <[EMAIL PROTECTED]> wrote:
>> "Neil Cerutti" <[EMAIL PROTECTED]> wrote in message
>> news:[EMAIL PROTECTED]
>>
>
> It seems to be timing "testing for membership", not "random
> access". Random access is just seq[n]; at least, that's the
> assumption I made.
>
> Perhaps I read the test description out of context and got
> confused.
>

Oh, poor wording on my part, sorry.

You got me wondering what seq[n] would give me in the case of a set:

>>> z = set(range(100))
>>> z[30]
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: unindexable object
>>>

It really doesn't make any sense since sets are unordered.  So we couldn't 
compare lists and sets in this manner anyway.

I *did* learn that dicts and sets use a hash table, not a tree, which also 
helps explain why the initial construction of the 1-element set takes so 
much longer than just simple assembly of the range into a list.

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread [EMAIL PROTECTED]
David Isaac wrote:
> Paul M. wrote:
> > Random access to item in list/set when item exists
> > set  -> 0.000241650824337
> > list -> 0.0245168031132
> >
> > Random access to item in list/set when item does not exist
> > set  -> 0.000187733357172
> > list -> 0.522086186932
>
>
> OK, that's a much better set of answers
> including to questions I did not
> even know I wanted to ask until
> I saw your post.  But, how to
> explain the above??

The code was flawed, but it illustrates a point.  To find an arbitrary
item in a list takes time proportional to the length of the list--on
average, you have to scan n/2 items in a list of length n to find it.
"x in list" really has to check item 0, item 1, item 2, etc until it
finds x or reaches the end of the list.

Obviously in a sorted list like this, finding 0 will be much faster
than finding  since you have to scan almost the whole list to find
the latter.  Of course, if you know the list is sorted you could use a
binary search (which will find the item in log(n) comparisons on
average).

So really to find a random item that is in the list is going to take
more like 0.250 seconds on average, give or take a little.

When the item doesn't occur in the list, you wind up having to check
every element to see if it's there.  (binary search would work for this
case as well for a sorted list).

A set, on the other hand, uses a hash table so finding an element takes
constant time (it's one hash lookup, independent of the size of the
set)--and determining an item isn't there is likewise constant time.

(if your data is degenerate and you have many things in the set that
all hash to the same value then the lookup could take longer, but
that's pretty unlikely to make a big difference in practice with python
sets and user data).

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread Neil Cerutti
On 2006-10-04, Paul McGuire <[EMAIL PROTECTED]> wrote:
> "Neil Cerutti" <[EMAIL PROTECTED]> wrote in message 
> news:[EMAIL PROTECTED]
>
>> Look at the code again. It's not testing what it says it's
>> testing.
>
> It isnt?
>
> The only quibble I can see is that there really is no "first"
> element in a set.  I picked the "0 in set" and "0 in list" to
> pick the fastest case for list, which does a linear search
> through the list elements.
>
> Where did I go wrong on the test descriptions?

It seems to be timing "testing for membership", not "random
access". Random access is just seq[n]; at least, that's the
assumption I made.

Perhaps I read the test description out of context and got
confused.

-- 
Neil Cerutti
8 new choir robes are currently needed, due to the addition of
several new members and to the deterioration of some of the
older ones. --Church Bulletin Blooper 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread Duncan Booth
"Paul McGuire" <[EMAIL PROTECTED]> wrote:

> By contrast, a set (and also the keys in a dict) use a tree structure
> to index more quickly into the list of items

'dict' and I believe also 'set' use a hash table, not a tree structure.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread Paul McGuire
"David Isaac" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Paul M. wrote:
>> Random access to item in list/set when item exists
>> set  -> 0.000241650824337
>> list -> 0.0245168031132
>>
>> Random access to item in list/set when item does not exist
>> set  -> 0.000187733357172
>> list -> 0.522086186932
>
>
> OK, that's a much better set of answers
> including to questions I did not
> even know I wanted to ask until
> I saw your post.  But, how to
> explain the above??
>
> Thanks,
> Alan Isaac
>
>

Searching for an item in a list is a linear search, and all 1 items must 
be checked before determining that  a non-existent item is not in the list, 
and sure enough, our list takes about 20 times as long to find that 10500 is 
not in the range 1 as it does to find 500 in the range 1.  By 
contrast, a set (and also the keys in a dict) use a tree structure to index 
more quickly into the list of items, so this access (and determination of 
non-existence) will be much faster, especially so for a large list.

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread Paul McGuire

"Neil Cerutti" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]

> Look at the code again. It's not testing what it says it's
> testing.
>

It isnt?

The only quibble I can see is that there really is no "first" element in a 
set.  I picked the "0 in set" and "0 in list" to pick the fastest case for 
list, which does a linear search through the list elements.

Where did I go wrong on the test descriptions?

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread Neil Cerutti
On 2006-10-04, David Isaac <[EMAIL PROTECTED]> wrote:
> Paul M. wrote:
>> Random access to item in list/set when item exists
>> set  -> 0.000241650824337
>> list -> 0.0245168031132
>>
>> Random access to item in list/set when item does not exist
>> set  -> 0.000187733357172
>> list -> 0.522086186932
>
>
> OK, that's a much better set of answers
> including to questions I did not
> even know I wanted to ask until
> I saw your post.  But, how to
> explain the above??

Look at the code again. It's not testing what it says it's
testing.

print "\nRandom access to first item in list/set"
t1=timeit.Timer("0 in z","z = set(xrange(1))")
t2=timeit.Timer("0 in z", "z = list(xrange(1))")
print "set  ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

print "\nRandom access to item in list/set when item exists"
t1=timeit.Timer("500 in z","z = set(xrange(1))")
t2=timeit.Timer("500 in z", "z = list(xrange(1))")
print "set  ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

-- 
Neil Cerutti
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread David Isaac
Paul M. wrote:
> Random access to item in list/set when item exists
> set  -> 0.000241650824337
> list -> 0.0245168031132
>
> Random access to item in list/set when item does not exist
> set  -> 0.000187733357172
> list -> 0.522086186932


OK, that's a much better set of answers
including to questions I did not
even know I wanted to ask until
I saw your post.  But, how to
explain the above??

Thanks,
Alan Isaac


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread Paul McGuire
"David Isaac" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Is it expected for access to set elements to be much
> slower than access to list elements?  Explanation?
> Thanks,
> Alan Isaac
>
 t1=timeit.Timer("for i in set(xrange(1)):pass","")
 t2=timeit.Timer("for i in list(xrange(1)):pass","")
 t1.timeit(1000)
> 9.806250235714316
 t2.timeit(1000)
> 3.9823075279120701
>
>
A couple of points:
1. Your use of timeit is a bit flawed.  You are not only timing the access 
into the set or list, but also the construction of the set/list, *every 
time*.  Use the second string arg (the one you are passing as "") for 
one-time initialization, so that you measure only access, which is what you 
say your are interested in.
2. Ooops, it turns out you're not really *accessing* the set/list, you are 
*iterating* over it.  Given that sets are implemented internally using a 
tree structure, I would expect that iterating over a list could be faster, 
although only marginally so.
3. Here are some stats for a little more correct timeits:

Construct list/set
set  -> 1.89524618352
list -> 0.299499796762

Iterate over list/set
set  -> 0.646887523414
list -> 0.552001162159

Random access to first item in list/set
set  -> 0.000189409547861
list -> 0.000160076210804

Random access to item in list/set when item exists
set  -> 0.000241650824337
list -> 0.0245168031132

Random access to item in list/set when item does not exist
set  -> 0.000187733357172
list -> 0.522086186932


The code:
import timeit

print "\nConstruct list/set"
t1=timeit.Timer("z = set(xrange(1))","")
t2=timeit.Timer("z = list(xrange(1))","")
print "set  ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

print "\nIterate over list/set"
t1=timeit.Timer("for i in z: pass","z = set(xrange(1))")
t2=timeit.Timer("for i in z: pass", "z = list(xrange(1))")
print "set  ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

print "\nRandom access to first item in list/set"
t1=timeit.Timer("0 in z","z = set(xrange(1))")
t2=timeit.Timer("0 in z", "z = list(xrange(1))")
print "set  ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

print "\nRandom access to item in list/set when item exists"
t1=timeit.Timer("500 in z","z = set(xrange(1))")
t2=timeit.Timer("500 in z", "z = list(xrange(1))")
print "set  ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)

print "\nRandom access to item in list/set when item does not exist"
t1=timeit.Timer("10500 in z","z = set(xrange(1))")
t2=timeit.Timer("10500 in z", "z = list(xrange(1))")
print "set  ->", t1.timeit(1000)
print "list ->", t2.timeit(1000)


-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: item access time: sets v. lists

2006-10-04 Thread Dennis Benzinger
On Wed, 04 Oct 2006 16:02:56 GMT
"David Isaac" <[EMAIL PROTECTED]> wrote:

> Is it expected for access to set elements to be much
> slower than access to list elements?  Explanation?
> Thanks,
> Alan Isaac
> 
> >>> t1=timeit.Timer("for i in set(xrange(1)):pass","")
> >>> t2=timeit.Timer("for i in list(xrange(1)):pass","")
> [...]

You're measuring the time for creating the xrange and the set/list too.
Create them before you call Timer() and repeat your timing.


Dennis Benzinger
-- 
http://mail.python.org/mailman/listinfo/python-list


item access time: sets v. lists

2006-10-04 Thread David Isaac
Is it expected for access to set elements to be much
slower than access to list elements?  Explanation?
Thanks,
Alan Isaac

>>> t1=timeit.Timer("for i in set(xrange(1)):pass","")
>>> t2=timeit.Timer("for i in list(xrange(1)):pass","")
>>> t1.timeit(1000)
9.806250235714316
>>> t2.timeit(1000)
3.9823075279120701


-- 
http://mail.python.org/mailman/listinfo/python-list