Re: [Tutor] List comprehensions to search a list--amazing!

2015-03-23 Thread Steven D'Aprano
On Mon, Mar 23, 2015 at 10:17:11PM -0400, Dave Angel wrote:
> On 03/23/2015 09:42 PM, boB Stepp wrote:

> >Can you give me a ballpark number for "large", where this would start
> >making a meaningful difference?
> >
> 
> Not really.  See Steve's response for some numbers.

o_O

Have you borrowed Guido's Time Machine? I hadn't even finished writing 
my post???



-- 
Steve
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] List comprehensions to search a list--amazing!

2015-03-23 Thread Dave Angel

On 03/23/2015 10:17 PM, Dave Angel wrote:

On 03/23/2015 09:42 PM, boB Stepp wrote:




Not really.  See Steve's


OOPS.  Peter's

> response for some numbers. If I had to guess,

I'd say that for lists over 100 items, you should use bisect or
equivalent.  But I'd also say you should have one algorithm in your
final code, even if it's sub-optimal for tiny lists.  If even a fraction
of the searches are going to be on a list of 10k items, you should
switch to a bisect approach.




--
DaveA
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] List comprehensions to search a list--amazing!

2015-03-23 Thread Steven D'Aprano
On Mon, Mar 23, 2015 at 08:42:23PM -0500, boB Stepp wrote:
> On Thu, Mar 19, 2015 at 12:10 AM, Dave Angel  wrote:
> > The catch to a list comprehension is it has to visit all the elements, while
> > a binary search would visit log-base-2 of them.  So instead of 1
> > elements, you'd be searching about 14 items.
> 
> I suspected as much, but had not verified this. Nonetheless, this may
> prove sufficiently fast. I will have to test this with my final data
> files. Right now I am using test cases, while I continue to design,
> check, rewrite, etc.
> 
> > For large lists, it'd probably be much quicker to use the bisect module.
> > https://docs.python.org/3.4/library/bisect.html
> 
> Can you give me a ballpark number for "large", where this would start
> making a meaningful difference?

Tell us what you consider a meaningful difference :-)

What counts as "too slow" will depend on:

- what you are doing
- how often you are doing it
- what else you're doing at the same time
- what hardware you have to run it on
- whether you are running the program only once, or repeatedly

etc. E.g. taking 30 seconds to iterate over a billion items in a list is 
insignificant if this is part of a bigger program which takes twenty 
minutes to run, but critical if you are hoping to run the program 
thirty times a second, hundreds of times a day.

But I've never heard anyone complain that a program was too fast :-)

(Actually, that's not quite true. Sometimes if the user is expecting 
a program function to take a while, say "Rebuild database", and it 
actually runs near instantaneously, it is indeed *too fast* because it 
can give the user the idea that the rebuild function isn't working. But 
that's a UI issue, not a speed issue.)

But if you twist my arm, and force me to pluck some round numbers from 
thin air, I would guess:

- for under ten items, a linear search will be insignificantly faster;

- for under a hundred items, there's no meaningful difference;

- for under a million items, binary search will be typically better but 
a linear search still acceptable ("everything is fast for small N");

- for over a million items, linear search will typically be 
unacceptably slow.

For certain definitions of what's acceptable or not :-)



-- 
Steve
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] List comprehensions to search a list--amazing!

2015-03-23 Thread Dave Angel

On 03/23/2015 09:42 PM, boB Stepp wrote:

On Thu, Mar 19, 2015 at 12:10 AM, Dave Angel  wrote:

The catch to a list comprehension is it has to visit all the elements, while
a binary search would visit log-base-2 of them.  So instead of 1
elements, you'd be searching about 14 items.


I suspected as much, but had not verified this. Nonetheless, this may
prove sufficiently fast. I will have to test this with my final data
files. Right now I am using test cases, while I continue to design,
check, rewrite, etc.


For large lists, it'd probably be much quicker to use the bisect module.
https://docs.python.org/3.4/library/bisect.html


Can you give me a ballpark number for "large", where this would start
making a meaningful difference?



Not really.  See Steve's response for some numbers. If I had to guess, 
I'd say that for lists over 100 items, you should use bisect or 
equivalent.  But I'd also say you should have one algorithm in your 
final code, even if it's sub-optimal for tiny lists.  If even a fraction 
of the searches are going to be on a list of 10k items, you should 
switch to a bisect approach.


I'd have to measure it, same as anyone.  And because of the 
reverse-ordering problem, you have to weigh the advantages of using a 
standard library (which is unlikely to be buggy), versus making an 
edited version which works directly on reversed lists.


It also can matter how many times you're searching the same list.  If 
you're going to be many lookups, it's worth keeping a reversed copy. 
You can reverse as simply as   rlist = mylist[::-1]




--
DaveA
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] List comprehensions to search a list--amazing!

2015-03-23 Thread boB Stepp
On Thu, Mar 19, 2015 at 4:52 AM, Peter Otten <__pete...@web.de> wrote:
> Dave Angel wrote:

[...]
> By the way, if you were to use a plain old loop the expected speedup over
> the listcomp would be 2. You can break out of the loop when you have found
> the gap, after iterating over one half of the list on average.
>
> So for-loops are twice as amazing ;)

Actually, this was my first approach to solving the problem.

>> The catch to a list comprehension is it has to visit all the elements,
>> while a binary search would visit log-base-2 of them.  So instead of
>> 1 elements, you'd be searching about 14 items.
>>
>> For large lists, it'd probably be much quicker to use the bisect module.
>> https://docs.python.org/3.4/library/bisect.html
>>
>>
>> Check out bisect.bisect_left() and bisect.bisect_right()
>>
>> I don't see how to directly use those functions on a list which is
>> reverse-sorted, but the source is available.  On my install, it's
>> located at:
>>
>> /usr/lib/python3.4/bisect.py
>>
>
> To back Dave's suggestion with some empirical data here are two ways to make
> bisect() work with a descending list (though if possible I would recommend
> that you change your script to use ascending lists).

I could easily do this, though the data naturally presents itself as I
stated originally.

> $ cat reverse_bisect2.py
> import bisect
> import random
>
> def break_listcomp(L, Vt):
> S = [i for i, V in enumerate(L) if L[i] >= Vt >= L[i + 1]]
> return S[0]
>
> def break_bisect_reverse(L, Vt):
> L.reverse()
> result = bisect.bisect(L, Vt)
> L.reverse()
> return len(L) - result -1
>
> class Items:
> def __init__(self, items):
> self.items = items
> def __len__(self):
> return len(self.items)
> def __getitem__(self, index):
> return self.items[len(self.items) - 1 - index]
>
> def break_bisect_virt(L, Vt):
> return len(L) - 1 - bisect.bisect(Items(L), Vt)
>
> random.seed(42)
> N = 10**6
> data = [random.randrange(N) for i in range(10**5)]
> data = data + data
> data.sort(reverse=True)
> sample = random.sample(data, 10)
> $
>
> break_bisect_reverse() reverses the list before and after applying bisect.
> This is still O(N), but the actual work is done in C.
>
> break_bisect_virt() wraps the actual list in a class that translates
>
> items[0] to items[len(items)-1]
> items[1] to items[len(items)-2]

Thank you for taking time to write this. I may have questions later.

> and so on, thus providing a reversed view of the list without moving any
> values. This severely slows down access to a single value, but as bisect
> needs much fewer lookups than the listcomp the overall result is still a
> massive speedup. The actual timings:
>
> $ python3 -m timeit -s 'from reverse_bisect2 import data, sample,
> break_listcomp as f' '[f(data, v) for v in sample]'
> 10 loops, best of 3: 781 msec per loop
>
> $ python3 -m timeit -s 'from reverse_bisect2 import data, sample,
> break_bisect_reverse as f' '[f(data, v) for v in sample]'
> 100 loops, best of 3: 15 msec per loop
>
> $ python3 -m timeit -s 'from reverse_bisect2 import data, sample,
> break_bisect_virt as f' '[f(data, v) for v in sample]'
> 1000 loops, best of 3: 221 usec per loop
>
> So reverse/bisect is 50 times faster than the listcomp, and
> bisect/virt is 3500 times faster than the listcomp.

You present a compelling case!

> I expect that a prepackaged linear interpolation function from numpy/scipy
> can still do better, and also handle the corner cases correctly. To use such
> a function you may have to reverse order of the values.

This is not an option for me as I would not be allowed to install numpy/scipy.



-- 
boB
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] List comprehensions to search a list--amazing!

2015-03-23 Thread boB Stepp
On Thu, Mar 19, 2015 at 12:10 AM, Dave Angel  wrote:
> The catch to a list comprehension is it has to visit all the elements, while
> a binary search would visit log-base-2 of them.  So instead of 1
> elements, you'd be searching about 14 items.

I suspected as much, but had not verified this. Nonetheless, this may
prove sufficiently fast. I will have to test this with my final data
files. Right now I am using test cases, while I continue to design,
check, rewrite, etc.

> For large lists, it'd probably be much quicker to use the bisect module.
> https://docs.python.org/3.4/library/bisect.html

Can you give me a ballpark number for "large", where this would start
making a meaningful difference?

> Check out bisect.bisect_left() and bisect.bisect_right()

It looks like this should work. Thanks! I will investigate.

> I don't see how to directly use those functions on a list which is
> reverse-sorted, but the source is available.  On my install, it's located
> at:
>
> /usr/lib/python3.4/bisect.py

And I see this is available on my oldest Python installlation, 2.4.4, too.

-- 
boB
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor