subject:"best way to determine sequence ordering\?"

best way to determine sequence ordering?

2006-04-28 Thread John Salerno

If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'], 
and then figure out if a certain element precedes another element, what 
would be the best way to do that?

Looking at the built-in list functions, I thought I could do something like:

if L.index('A') < L.index('D'):
 # do some stuff

But I didn't know if maybe there was a preferred method for this type of 
operation, or perhaps a better function to use to figure out the 
ordering. Or even if I wanted to write my own utility function to make 
it look cleaner than above, I'd still need to use something like above 
in my function, so I wanted to know what might be the 'cleanest' looking 
solution.

Thanks.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-28 Thread gry

index is about the best you can do with the structure you're using.
If you made the "items" instances of a class, then you could define a
__cmp__ member, which would let you do:

a=Item('A')
b=Item('B')
if ahttp://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-28 Thread John Salerno

gry@ll.mit.edu wrote:
> index is about the best you can do with the structure you're using.
> If you made the "items" instances of a class, then you could define a
> __cmp__ member, which would let you do:
> 
> a=Item('A')
> b=Item('B')
> if a 
> The Item class could use any of various means to maintain order
> information. If there are not too many values, it could have a
> dictionary storing an integer for the order:
> 
> class Item(object):
>def __init__(self, value):
>   self.val=value
>   self.order = dict(c=0, a=1, d=2, b=3)
>def __cmp__(self, other):
>   return cmp(self.order[self.val], self.order[other.val])
> 
> If you don't care about performance, or you find it clearer, just use:
>   self.order = ['C', 'A', 'D', 'B']
> and
>def __cmp__(self, other):
>   return cmp(self.order.index(self.value),
> self.order.index(other.value))
> 
> 
> -- George Young
> 

Thanks. As I progressed with my little project, I was beginning to 
wonder about making a class, so your suggestions might be helpful if I 
convert it to that.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-28 Thread Edward Elliott

gry@ll.mit.edu wrote:
> class Item(object):
>def __init__(self, value):
>   self.val=value
>   self.order = dict(c=0, a=1, d=2, b=3)
>def __cmp__(self, other):
>   return cmp(self.order[self.val], self.order[other.val])

An object that keeps track of the order it's stored in an external container
seems bass-ackwards to me (if you'll pardon the expression).  How do you
update the order?  Why does every object need to carry around a global
ordering?  What happens if you need two separate orderings like a,b,c,d and
a,c,d,b?  And it isn't clear at all what the < operator is comparing in
your example:

a=Item('A')
b=Item('B')
if a < b: something

I'd put the ordering logic in the container instead.  Something like this:

class mylist(list):
def before (me, a, b):
return me.index(a) < me.index(b)

l = mylist (['C', 'A', 'D', 'B'])
if l.before('A', 'D'):
something

This seems clearer and more flexible.  You'll still have to handle
exceptions when a or b aren't in the list.

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-28 Thread Steven Bethard

John Salerno wrote:
> If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'], 
> and then figure out if a certain element precedes another element, what 
> would be the best way to do that?
> 
> Looking at the built-in list functions, I thought I could do something 
> like:
> 
> if L.index('A') < L.index('D'):
> # do some stuff

This is probably a pretty reasonable approach as long as you're not too 
worried about efficiency.  It scans the list twice though, so if you 
start doing this with really big lists, it might be better to do 
something like:

 >>> L = ['C', 'A', 'D', 'B']
 >>> positions = dict((item, i) for i, item in enumerate(L))
 >>> positions
{'A': 1, 'C': 0, 'B': 3, 'D': 2}
 >>> positions['A'] < positions['D']
True

If you care about memory in addition to speed, you could do something like:

 >>> L = ['C', 'A', 'D', 'B']
 >>> items_to_compare = ['A', 'D']
 >>> positions = dict(
... (item, i)
... for i, item in enumerate(L)
... if item in items_to_compare
... )
 >>> positions
{'A': 1, 'D': 2}
 >>> positions['A'] < positions['D']
True

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-28 Thread nikie

Steven Bethard wrote:

> John Salerno wrote:
> > If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'],
> > and then figure out if a certain element precedes another element, what
> > would be the best way to do that?
> >
> > Looking at the built-in list functions, I thought I could do something
> > like:
> >
> > if L.index('A') < L.index('D'):
> > # do some stuff
>
> This is probably a pretty reasonable approach as long as you're not too
> worried about efficiency.  It scans the list twice though, so if you
> start doing this with really big lists, it might be better to do
> something like:

On average, it'll only have go through half the list twice, as it can
break as soon as it finds the item it searches for. Don't think you can
get any more efficient than that.

>  >>> L = ['C', 'A', 'D', 'B']
>  >>> positions = dict((item, i) for i, item in enumerate(L))
>  >>> positions
> {'A': 1, 'C': 0, 'B': 3, 'D': 2}
>  >>> positions['A'] < positions['D']
> True

Isn't this bound to be less efficient? I mean, inserting the items into
a dict is probably O(n log n), which is definitely worse than O(n) for
searching the list twice. And, of course, it would yield different
results if 'A' or 'D' are in the list more than once.

> If you care about memory in addition to speed, you could do something like:
>
>  >>> L = ['C', 'A', 'D', 'B']
>  >>> items_to_compare = ['A', 'D']
>  >>> positions = dict(
> ... (item, i)
> ... for i, item in enumerate(L)
> ... if item in items_to_compare
> ... )
>  >>> positions
> {'A': 1, 'D': 2}
>  >>> positions['A'] < positions['D']
> True

Did you measure that? Is this really faster than using index twice, for
big lists? The number of comparisons (when dealing with strings, that's
probably what'll take the time) should be about twice as high as for
the index-version of the OP (assuming the items only exactly once in
the list).

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-28 Thread I V

On Fri, 28 Apr 2006 14:27:00 -0700, nikie wrote:
> Steven Bethard wrote:
> 
>>  >>> L = ['C', 'A', 'D', 'B']
>>  >>> positions = dict((item, i) for i, item in enumerate(L))

Do you need the generator expression here? dict(enumerate(L)) should be
equivalent, no?

> Isn't this bound to be less efficient? I mean, inserting the items into
> a dict is probably O(n log n), which is definitely worse than O(n) for
> searching the list twice. And, of course, it would yield different
> results if 'A' or 'D' are in the list more than once.

Although presumably the dict method might be quicker if you were comparing
the positions a large number of times.

Incidentally, does python have a built-in to do a binary search on a
sorted list? Obviously it's not too tricky to write one, but it would be
nice if there was one implemented in C.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-28 Thread nikie

I V wrote:

> On Fri, 28 Apr 2006 14:27:00 -0700, nikie wrote:
> > Steven Bethard wrote:
> >
> >>  >>> L = ['C', 'A', 'D', 'B']
> >>  >>> positions = dict((item, i) for i, item in enumerate(L))
>
> Do you need the generator expression here? dict(enumerate(L)) should be
> equivalent, no?

I think the generator is needed to swap the item and the index.
dict(enumerate(L)) would yield a dict like {0:'C', 1:'A'...}

> > Isn't this bound to be less efficient? I mean, inserting the items into
> > a dict is probably O(n log n), which is definitely worse than O(n) for
> > searching the list twice. And, of course, it would yield different
> > results if 'A' or 'D' are in the list more than once.
>
> Although presumably the dict method might be quicker if you were comparing
> the positions a large number of times.

Only if you build the dict once, but called index each and every time,
which is comparing apples with oranges...

> Incidentally, does python have a built-in to do a binary search on a
> sorted list? Obviously it's not too tricky to write one, but it would be
> nice if there was one implemented in C.

I once read in an algorithm book that it took 10 years from the first
binary search publication until a correct version was published, so, it
actually is a bit tricky... Better stick to the bisect module. Don't
know if it's written in C, though.

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-28 Thread Steven Bethard

nikie wrote:
> Steven Bethard wrote:
> 
>> John Salerno wrote:
>>> If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'],
>>> and then figure out if a certain element precedes another element, what
>>> would be the best way to do that?
>>>
>>> Looking at the built-in list functions, I thought I could do something
>>> like:
>>>
>>> if L.index('A') < L.index('D'):
>>> # do some stuff
>> This is probably a pretty reasonable approach as long as you're not too
>> worried about efficiency.  It scans the list twice though, so if you
>> start doing this with really big lists, it might be better to do
>> something like:
> 
> On average, it'll only have go through half the list twice, as it can
> break as soon as it finds the item it searches for. Don't think you can
> get any more efficient than that.

Sure you can:

 a_index = None
 d_index = None
 for i, item in enumerate(L):
 if item == 'A':
 a_index = i
 elif item == 'D':
 d_index = i
 if a_index is not None and d_index is not None:
 break

 if a_index < d_index:
 # do some stuff

That goes through exactly the minimum number of elements.  But it's 
probably slower than two .index() calls since .index() is coded in C. 
But timeit and see if you're interested.

> 
>>  >>> L = ['C', 'A', 'D', 'B']
>>  >>> positions = dict((item, i) for i, item in enumerate(L))
>>  >>> positions
>> {'A': 1, 'C': 0, 'B': 3, 'D': 2}
>>  >>> positions['A'] < positions['D']
>> True
> 
> Isn't this bound to be less efficient? I mean, inserting the items into
> a dict is probably O(n log n)

No, it's O(N).  Dicts are just hash tables.  Insertion is O(1).  So N 
insertions is O(N).

This route is also dramatically more efficient if you need to make M 
comparisons between items instead of just 1 comparison.  In that case, 
using the dict is O(N + M) instead of the O(N * M) of the .index() route.

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-28 Thread I V

On Fri, 28 Apr 2006 16:39:33 -0700, nikie wrote:
> I V wrote:
>> Do you need the generator expression here? dict(enumerate(L)) should be
>> equivalent, no?
> 
> I think the generator is needed to swap the item and the index.
> dict(enumerate(L)) would yield a dict like {0:'C', 1:'A'...}

Oh, of course, I wasn't paying attention.

>> Incidentally, does python have a built-in to do a binary search on a
>> sorted list? Obviously it's not too tricky to write one, but it would
>> be nice if there was one implemented in C.
> 
> I once read in an algorithm book that it took 10 years from the first
> binary search publication until a correct version was published, so, it
> actually is a bit tricky... Better stick to the bisect module. Don't
> know if it's written in C, though.

Thanks for pointing out the bisect module - that's exactly what I was
looking for.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-28 Thread Kent Johnson

I V wrote:
> Incidentally, does python have a built-in to do a binary search on a
> sorted list? Obviously it's not too tricky to write one, but it would be
> nice if there was one implemented in C.

See the bisect module.

Kent
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-29 Thread nikie

Steven Bethard wrote:

> nikie wrote:
> > Steven Bethard wrote:
> >
> >> John Salerno wrote:
> >>> If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'],
> >>> and then figure out if a certain element precedes another element, what
> >>> would be the best way to do that?
> >>>
> >>> Looking at the built-in list functions, I thought I could do something
> >>> like:
> >>>
> >>> if L.index('A') < L.index('D'):
> >>> # do some stuff
> >> This is probably a pretty reasonable approach as long as you're not too
> >> worried about efficiency.  It scans the list twice though, so if you
> >> start doing this with really big lists, it might be better to do
> >> something like:
> >
> > On average, it'll only have go through half the list twice, as it can
> > break as soon as it finds the item it searches for. Don't think you can
> > get any more efficient than that.
>
> Sure you can:
>
>  a_index = None
>  d_index = None
>  for i, item in enumerate(L):
>  if item == 'A':
>  a_index = i
>  elif item == 'D':
>  d_index = i
>  if a_index is not None and d_index is not None:
>  break
>
>  if a_index < d_index:
>  # do some stuff
>
> That goes through exactly the minimum number of elements.  But it's
> probably slower than two .index() calls since .index() is coded in C.
> But timeit and see if you're interested.

On my PC, calling index twice is 5 times faster.

But let's just pretend for a moment Python code was compiled and
optimized as well as C code. Then, this function would still not be
faster than calling index twice. You seem to be under the impression
that "looping through a list" takes a lot of time and "comparing items"
takes none. The reverse is the case: Looping through a list means the
processor has to do something like "increment an index" once for every
item in the list (for optimized C code! Looping through a generator
like the one returned by enumerate in interpreted code is a lot more
complex). Comparing two items on the other hand involves (as lists
aren't statically typed) looking up a cmp-method dynamically, calling
it dynamically, and, of course, a string comparison, which again
involves two pointer lookups for every character that has to be
compared. So, if you want to know how fast your function will be,
you'll have to count the number of comparisons. (Of course, we're
ignoring the fact that smaller functions tend to run quicker due to
cache-, branch-prediction and optimization-effects) If you call your
function with the list ['C', 'A', 'D', 'B'], it will compare the first
item 'C' to 'A' and 'D', than the second, 'A' to 'A' and the third 'D'
to 'A' and 'D', so it'll have to do 5 comparisons, correct? If you call
index to find the first occurence of the item 'A' in the same list, it
will have to do 2 comparisons (it can break as soon as it finds the
iten), and 3 comparisons are needed for finding the item 'D', so it'll
do 5 comparisons, too. Plus, you have a small overhead for comparing
"a_index" and "d_index" to none (which is faster than a
sting-comparison, but still takes time, probably more time than
incrementing an index for looping through a list). Things get even
worse if 'A' and 'D' aren't "neighbors" in the list, because than all
the items bewteen 'A' and 'D' will have to be compared to 'A' and 'D'
in the version above, but only to 'D' if you call index twice. So, the
function above isn't only less readable, it's also slower on the
average case.

You might however be able to tweak the performance of the index-version
a little bit if you call it only on small chunks of the array at a
time, using the index()-versions that take a start- and stop index, so
the whole list only has to be moved from the memory to the cache once.
But I'm not sure the performance is memory-bound at all (always hard to
tell in an interpreted language).

> >
> >>  >>> L = ['C', 'A', 'D', 'B']
> >>  >>> positions = dict((item, i) for i, item in enumerate(L))
> >>  >>> positions
> >> {'A': 1, 'C': 0, 'B': 3, 'D': 2}
> >>  >>> positions['A'] < positions['D']
> >> True
> >
> > Isn't this bound to be less efficient? I mean, inserting the items into
> > a dict is probably O(n log n)
>
> No, it's O(N).  Dicts are just hash tables.  Insertion is O(1).  So N
> insertions is O(N).

I've measured that with the timeit module. The time it takes to build a
dict with N entries doesn't seem to be proportional to N, (t-c)/n keeps
increasing. Try this:

import timeit, math

def time(N):
return timeit.Timer("dict([('%%010i'%%i,i) for i in range(%i)])" %
N).timeit(number=5)

c = time(1000)*2-time(2000)

for i in range(1000,100,1000):
t = time(i)
print "%5i -> %f (t/n = %f)" % (i,t, (t-c)/i*1000)

> This route is also dramatically more efficient if you need to make M
> comparisons between items instead of just 1 comparison.  In that case,
> using the dict is O(N + M) instead of the O(N * M) of the .index() route.

Assu

Re: best way to determine sequence ordering?

2006-04-29 Thread Steven Bethard

nikie wrote:
> Steven Bethard wrote:
> 
>> nikie wrote:
>>> Steven Bethard wrote:
>>>
 John Salerno wrote:
> If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'],
> and then figure out if a certain element precedes another element, what
> would be the best way to do that?
>
> Looking at the built-in list functions, I thought I could do something
> like:
>
> if L.index('A') < L.index('D'):
> # do some stuff
 This is probably a pretty reasonable approach as long as you're not too
 worried about efficiency.  It scans the list twice though, so if you
 start doing this with really big lists, it might be better to do
 something like:
>>> On average, it'll only have go through half the list twice, as it can
>>> break as soon as it finds the item it searches for. Don't think you can
>>> get any more efficient than that.
>> Sure you can:
>>
>>  a_index = None
>>  d_index = None
>>  for i, item in enumerate(L):
>>  if item == 'A':
>>  a_index = i
>>  elif item == 'D':
>>  d_index = i
>>  if a_index is not None and d_index is not None:
>>  break
>>
>>  if a_index < d_index:
>>  # do some stuff
>>
>> That goes through exactly the minimum number of elements.  But it's
>> probably slower than two .index() calls since .index() is coded in C.
>> But timeit and see if you're interested.
> 
> On my PC, calling index twice is 5 times faster.
> 
> But let's just pretend for a moment Python code was compiled and
> optimized as well as C code.

Ok, lets get comparable functions by writing them both in Python:

- temp.py -
def index(L, item):
 for i, x in enumerate(L):
 if x == item:
 return i
 return -1


def a_d_index(L):
 a_index = None
 d_index = None
 for i, item in enumerate(L):
 if item == 'A':
 a_index = i
 elif item == 'D':
 d_index = i
 if a_index is not None and d_index is not None:
 break
 return a_index, d_index
---

$ python -m timeit -s "import temp; L = ['C', 'A', 'D', 'B']" "a = 
temp.index(L, 'A'); d = temp.index(L, 'D'); a < d"
10 loops, best of 3: 5.73 usec per loop

$ python -m timeit -s "import temp; L = ['C', 'A', 'D', 'B']" "a, d = 
temp.a_d_index(L); a < d"
10 loops, best of 3: 3.75 usec per loop

The a_d_index() function is definitely faster.

I understand your point about comparison time, but in this case we're 
just comparing one element strings, so it's not so horrible.  Sure, if 
you used strings that are more complicated to compare (or worse yet, you 
used your own custom objects with __cmp__ methods) you could provoke the 
kind of behavior you're looking for.

But in this particular case, there really is some substantial overhead 
to Python's iterator protocol, and you can't just ignore that.

Of course the moral of the story (as is the moral of all such threads) 
is that if you're really interested in speeding things up you need to 
start timing things.

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-29 Thread Justin Azoff

John Salerno wrote:
> If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'],
> and then figure out if a certain element precedes another element, what
> would be the best way to do that?
>
> Looking at the built-in list functions, I thought I could do something like:
>
> if L.index('A') < L.index('D'):
>  # do some stuff

This actually performs pretty well since list.index is implemented in
C.

The obvious (to me) implementation of:
def before(lst, a, b):
for x in lst:
if x == a:
return True
if x == b:
return False

runs about 10-50 times faster than the double index method if I use
psyco.  Without psyco, it ends up being faster for the cases where a or
b appears early on in the list, and the other appears towards the end.

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-29 Thread nikie

Steven Bethard wrote:

> nikie wrote:
> > Steven Bethard wrote:
> >
> >> nikie wrote:
> >>> Steven Bethard wrote:
> >>>
>  John Salerno wrote:
> > If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'],
> > and then figure out if a certain element precedes another element, what
> > would be the best way to do that?
> >
> > Looking at the built-in list functions, I thought I could do something
> > like:
> >
> > if L.index('A') < L.index('D'):
> > # do some stuff
>  This is probably a pretty reasonable approach as long as you're not too
>  worried about efficiency.  It scans the list twice though, so if you
>  start doing this with really big lists, it might be better to do
>  something like:
> >>> On average, it'll only have go through half the list twice, as it can
> >>> break as soon as it finds the item it searches for. Don't think you can
> >>> get any more efficient than that.
> >> Sure you can:
> >>
> >>  a_index = None
> >>  d_index = None
> >>  for i, item in enumerate(L):
> >>  if item == 'A':
> >>  a_index = i
> >>  elif item == 'D':
> >>  d_index = i
> >>  if a_index is not None and d_index is not None:
> >>  break
> >>
> >>  if a_index < d_index:
> >>  # do some stuff
> >>
> >> That goes through exactly the minimum number of elements.  But it's
> >> probably slower than two .index() calls since .index() is coded in C.
> >> But timeit and see if you're interested.
> >
> > On my PC, calling index twice is 5 times faster.
> >
> > But let's just pretend for a moment Python code was compiled and
> > optimized as well as C code.
>
> Ok, lets get comparable functions by writing them both in Python:

That's changing the subject, and you know that. The OP asked whether
using "L.index('A') < L.index('D')" was a good (pythonic, efficient)
idea. It is. Maybe it wouldn't be efficient if "List.index" was
implemented in python (because, as I have already said in my previous
post, looping through an enumerate-object in python is more complex
than a simple C-loop), but it is actually implemented in C. Even if you
wrote a C function that tried to do all the comparisons in one sweep
through the list, I'm willing to bet it won't be faster than the OP's
suggestion on the average (at least for moderate-sized lists,
otherwise, processing the list in blocks, using the cache more
efficiently might be a good idea).

That's what this thread was all about. Now, I don't really see what you
are trying to say: Are you still trying to convince the OP that he
should write a Python function like one of those you suggested, for
performance reasons? If not, I must have misunderstood your last posts,
and apologize for repeating the obvious ;-)

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-29 Thread Steven Bethard

nikie wrote:
> That's what this thread was all about. Now, I don't really see what you
> are trying to say: Are you still trying to convince the OP that he
> should write a Python function like one of those you suggested, for
> performance reasons?

Sure, if it really matters.  Code it in C, and you can do better than 
.index().  But I don't think it really matters. ;-)

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-29 Thread Steven Bethard

Steven Bethard wrote:
> John Salerno wrote:
>> If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'], 
>> and then figure out if a certain element precedes another element, 
>> what would be the best way to do that?
>>
>> Looking at the built-in list functions, I thought I could do something 
>> like:
>>
>> if L.index('A') < L.index('D'):
>> # do some stuff
> 
> This is probably a pretty reasonable approach

Just to clarify, because there seems to be some confusion.  I would 
absolutely go with this approach unless you have profiled and found it 
to be a bottleneck.  It's clear, concise, and quite fast.

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-29 Thread Edward Elliott

Steven Bethard wrote:
> Ok, lets get comparable functions by writing them both in Python:

First of all, your functions aren't quite comparable.  The first index takes
the value to locate as a variable, while the second has both values
hard-coded as literals.  Changing the second one to index2(L, a, b) makes a
slight but not significant difference in the timings.

Secondly, the lists you tested with are artificially short.  As you increase
the size of the list, the difference lessens to unimportance:

$ python -m timeit -s "import temp; L = ['x'] * 500 + ['C', 'A', 'D', 'B']"
"a, d = temp.index2(L,'A','D'); a < d"
1000 loops, best of 3: 227 usec per loop

$ python -m timeit -s "import temp; L = ['x'] * 500 + ['C', 'A', 'D', 'B']"
"a = temp.index1(L, 'A'); d = temp.index1(L, 'D'); a < d"
1000 loops, best of 3: 236 usec per loop

Third, the target values are very close together in those lists.  If there's
a large difference in their positions, then the "two-in-one-pass" algorithm
becomes much slower:

$ python -m timeit -s "import temp; L = ['C','A'] + ['x'] * 500 + ['D',
'B']" "a = temp.index1(L, 'A'); d = temp.index1(L, 'D'); a < d"
1 loops, best of 3: 120 usec per loop

$ python -m timeit -s "import temp; L = ['C','A'] + ['x'] * 500 + ['D',
'B']" "a, d = temp.index2(L,'A','D'); a < d"
1000 loops, best of 3: 267 usec per loop

Remember kids:
1. Numbers can show anything
2. Know your data set
3. Premature optimizations are evil


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-29 Thread Steven Bethard

Edward Elliott wrote:
> Remember kids:
> 1. Numbers can show anything
> 2. Know your data set
> 3. Premature optimizations are evil

Amen. =)

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-29 Thread Steven Bethard

John Salerno wrote:
> If I want to make a list of four items, e.g. L = ['C', 'A', 'D', 'B'], 
> and then figure out if a certain element precedes another element, what 
> would be the best way to do that?
> 
> Looking at the built-in list functions, I thought I could do something 
> like:
> 
> if L.index('A') < L.index('D'):
> # do some stuff

Ok, since, as happens some times, the discussion here has gotten 
extremely sidetracked from the original question, as a public service ;) 
I thought I'd give a brief summary of the options discussed and when 
they might be useful:

* The original::

   if L.index('A') < L.index('D'):
   # do some stuff

   If you're doing exactly one comparison, this is probably your best 
bet in most cases as .index() is coded in C.  It's clear, concise and 
correct.

* Justin Azoff's before()::

   def before(lst, a, b):
   for x in lst:
   if x == a:
   return True
   if x == b:
   return False

   if before(L, 'A', 'D'):
  # do some stuff

   You might gain a bit from this version if one of the elements you're 
looking for appears early in the list.  It only iterates through the 
list once, but performs two comparisons each time, so you'll only gain 
from it if you comparisons aren't too costly (e.g. you're comparing 
single character strings).

* building a dict of indicies::

   positions = dict((item, i) for i, item in enumerate(L))

   if positions['A'] < positions['D']:
   # do some stuff

   You'll only get a gain from this version if you need to do several 
comparisons instead of just one.

And most importantly, as Edward Elliot put it:

 Remember kids:
 1. Numbers can show anything
 2. Know your data set
 3. Premature optimizations are evil

HTH,

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-30 Thread Kay Schluehr

> * building a dict of indicies::
>
>positions = dict((item, i) for i, item in enumerate(L))
>
>if positions['A'] < positions['D']:
># do some stuff
>
>You'll only get a gain from this version if you need to do several
> comparisons instead of just one.

Hi Steven,

your solution may not create the correct answer if an item occurs twice
in the list because the later occurrence overwrites the former during
dict creation:

>>> L = ['C', 'A', 'D', 'B', 'A']
>>> dict((item, i) for i, item in enumerate(L))
{'A': 4, 'C': 0, 'B': 3, 'D': 2}

This gives the impression that 'D' always precedes 'A' which is wrong.

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: best way to determine sequence ordering?

2006-04-30 Thread Steven Bethard

Kay Schluehr wrote:
>> * building a dict of indicies::
>>
>>positions = dict((item, i) for i, item in enumerate(L))
>>
>>if positions['A'] < positions['D']:
>># do some stuff
>>
>>You'll only get a gain from this version if you need to do several
>> comparisons instead of just one.
> 
> Hi Steven,
> 
> your solution may not create the correct answer if an item occurs twice
> in the list because the later occurrence overwrites the former during
> dict creation:
> 
 L = ['C', 'A', 'D', 'B', 'A']
 dict((item, i) for i, item in enumerate(L))
> {'A': 4, 'C': 0, 'B': 3, 'D': 2}
> 
> This gives the impression that 'D' always precedes 'A' which is wrong.

Yeah, thanks for the update.  I meant to include that too.

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list

best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

Re: best way to determine sequence ordering?

22 matches

Site Navigation

Mail list logo

Footer information