Re: queue versus list

2020-03-20 Thread duncan smith
On 20/03/2020 21:57, Barry wrote:
> 
> 
>> On 20 Mar 2020, at 00:42, duncan smith  wrote:
>>
>> Bingo. Performance is indistinguishable from that of the list. Thread
>> safety is unimportant (for my purposes), but the docs at
>> https://docs.python.org/2/library/collections.html#collections.deque
>> state "Deques support thread-safe, memory efficient appends and pops
>> from either side of the deque with approximately the same O(1)
> 
> I did a performance test of list vs sequel as a fifo that had 10k elements in 
> them.
> Then I timed insert elements at the tail and popping from the head.
> 
> Duque is 10x faster then list, rest run macOS python 3.8
> 
> Barry
> 
> 

Yes, I avoided the cost of popping elements from the head by just
iterating over the list. So the timings came out very similar. But the
deque allows me to remove those elements efficiently. It's also neat
because it lets me switch from what is basically a breadth first search
of a graph to a depth first search by simply changing popleft() to
popright(). Cheers.

Duncan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: queue versus list

2020-03-20 Thread Cameron Simpson

On 20Mar2020 06:50, Dan Stromberg  wrote:

Actually, I think a deque is a doubly linked list of python lists.


On 20Mar2020 09:45, Antoon Pardon  wrote:
This doesn't seem correct. A deque is used to simulate a stack or a 
queue. It doesn't use heappush or heappop.


You are both correct; brain fade on my part. I do not know why I 
conflated a deque with a heap.


Apologies for the confusion.

Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Re: queue versus list

2020-03-20 Thread Barry



> On 20 Mar 2020, at 00:42, duncan smith  wrote:
> 
> Bingo. Performance is indistinguishable from that of the list. Thread
> safety is unimportant (for my purposes), but the docs at
> https://docs.python.org/2/library/collections.html#collections.deque
> state "Deques support thread-safe, memory efficient appends and pops
> from either side of the deque with approximately the same O(1)

I did a performance test of list vs sequel as a fifo that had 10k elements in 
them.
Then I timed insert elements at the tail and popping from the head.

Duque is 10x faster then list, rest run macOS python 3.8

Barry


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: queue versus list

2020-03-20 Thread Dan Sommers
On Fri, 20 Mar 2020 06:50:29 -0700
Dan Stromberg  wrote:

> On Thu, Mar 19, 2020 at 6:11 PM Cameron Simpson  wrote:
> 
> > >> 4. If it doesn't need to be thread-safe, why not try deques instead?
> > >
> > >Bingo. Performance is indistinguishable from that of the list.
> >
> > A deque is implement using a list.
> >
> 
> Actually, I think a deque is a doubly linked list of python lists.

According to the source¹:

Data for deque objects is stored in a doubly-linked list of fixed
length blocks.

[...]

Textbook implementations of doubly-linked lists store one datum per
link, but that gives them a 200% memory overhead (a prev and next
link for each datum) and it costs one malloc() call per data
element.  By using fixed-length blocks, the link to data ratio is
significantly improved and there are proportionally fewer calls to
malloc() and free().  The data blocks of consecutive pointers also
improve cache locality.

¹ 
https://github.com/python/cpython/blob/3.8/Modules/_collectionsmodule.c#L19-L78

Dan

-- 
“Atoms are not things.” – Werner Heisenberg
Dan Sommers, http://www.tombstonezero.net/dan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: queue versus list

2020-03-20 Thread Dan Stromberg
On Thu, Mar 19, 2020 at 6:11 PM Cameron Simpson  wrote:

> >> 4. If it doesn't need to be thread-safe, why not try deques instead?
> >
> >Bingo. Performance is indistinguishable from that of the list.
>
> A deque is implement using a list.
>

Actually, I think a deque is a doubly linked list of python lists.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: queue versus list

2020-03-20 Thread Antoon Pardon



Op 20/03/20 om 02:10 schreef Cameron Simpson:

On 20Mar2020 00:37, duncan smith  wrote:


Thread
safety is unimportant (for my purposes), but the docs at
https://docs.python.org/2/library/collections.html#collections.deque
state "Deques support thread-safe, memory efficient appends and pops
from either side of the deque with approximately the same O(1)
performance in either direction". Looking at the output from the
profiler for my implementation with the queue it looks like a deque is
being used but there's various other stuff happening relating to thread
safety. Is the lesson from this that if all you're doing is appending
and popping, then use a deque (rather than a queue) even if you want
thread safety? (It makes quite a difference in performance for my use
case.) Cheers.


A deque will change the ordering of the elements you process. Your 
list loop behaves in FIFO order, as does the queue. A dequeue (you 
you're doing a heappush and a heappop) will issue you the lowest 
current element on each iteration. If you merely need to process all 
the elements you're good. If the order matters you need to consider that.


A deque, being thread safe, will be using a lock. There will be a 
(quite small) overhead for that.



This doesn't seem correct. A deque is used to simulate a stack or a 
queue. It doesn't use heappush or heappop.


Antoon.

--
https://mail.python.org/mailman/listinfo/python-list


Re: queue versus list

2020-03-19 Thread Cameron Simpson

On 20Mar2020 00:37, duncan smith  wrote:

On 19/03/2020 20:40, MRAB wrote:

On 2020-03-19 20:08, duncan smith wrote:

   I have generator code along the following lines,

Q = queue.Queue()
Q.put(x)
while not Q.empty():
 x = Q.get()
 if :
 yield x
 else:
   Q.put()

If I change it to,

Q = []
Q.append(x)
for x in Q:
 if :
 yield x
 else:
   Q.append()

then it runs approximately 3 times more quickly.


A list is a very simple data structure, and therefore fast (for things 
that are inherently fast, like append; insert is expensive).


A queue takes a lock (cheap but an overhead) and might be expensive for 
get or put (this dependings on the data structure storing the queue 
items). A circular list (essentially a list using a base offset) can be 
fast until it needs resizing, a doubly linked list of nodes is always 
O(1) but is more expensive to add/remove (allocate node, adjust 
pointers); makye 3 times as slow :-) I haven't looked to see how Queue 
stores its items.



I seem to remember
(from somewhere) that the language specification doesn't guarantee that
the latter will continue to work, but that it's unlikely that future
implementations will break it.

[...snip...]
[MRAB] 1. I'm not sure it's a good idea to mutate a list while 
iterating over it.


AFAICR that's the bit that depends on the implementation of lists. As
long as iteration involves incrementing an index and indexing into the
list it will work.


If you're the only person using the list, as you are only _appending_ to 
it then the for loop is probably safe.


If you were inserting elements into the list there would be trouble (an 
iterator basicly needs keep an index as you suggest below, and an insert 
at or before the current index invalidates that, causing the iterator to 
issue the same element again (for example).



Apart from that, is there any good reason
why I shouldn't use the list? Is there another approach that might avoid
the overhead of using a queue? Results from profiling the two
implementations below in case it makes anything clearer. TIA.


Using a list seems fine for me if your code is as simple in reality as 
listed above.


2. The example with the list retains all of the results, whereas the 
one with the queue consumes them.


3. Queues are thread-safe because they're used for communication between
threads, but lists aren't thread-safe; that might be something to do
with the difference.

4. If it doesn't need to be thread-safe, why not try deques instead?


Bingo. Performance is indistinguishable from that of the list.


A deque is implement using a list.


Thread
safety is unimportant (for my purposes), but the docs at
https://docs.python.org/2/library/collections.html#collections.deque
state "Deques support thread-safe, memory efficient appends and pops
from either side of the deque with approximately the same O(1)
performance in either direction". Looking at the output from the
profiler for my implementation with the queue it looks like a deque is
being used but there's various other stuff happening relating to thread
safety. Is the lesson from this that if all you're doing is appending
and popping, then use a deque (rather than a queue) even if you want
thread safety? (It makes quite a difference in performance for my use
case.) Cheers.


A deque will change the ordering of the elements you process. Your list 
loop behaves in FIFO order, as does the queue. A dequeue (you you're 
doing a heappush and a heappop) will issue you the lowest current 
element on each iteration. If you merely need to process all the 
elements you're good. If the order matters you need to consider that.


A deque, being thread safe, will be using a lock. There will be a (quite 
small) overhead for that.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Re: queue versus list

2020-03-19 Thread duncan smith
On 19/03/2020 20:40, MRAB wrote:
> On 2020-03-19 20:08, duncan smith wrote:
>> Hello,
>>    I have generator code along the following lines,
>>
>>
>> Q = queue.Queue()
>> Q.put(x)
>> while not Q.empty():
>>  x = Q.get()
>>  if :
>>  yield x
>>  else:
>>    Q.put()
>>
>>
>> If I change it to,
>>
>>
>> Q = []
>> Q.append(x)
>> for x in Q:
>>  if :
>>  yield x
>>  else:
>>    Q.append()
>>
>>
>> then it runs approximately 3 times more quickly. I seem to remember
>> (from somewhere) that the language specification doesn't guarantee that
>> the latter will continue to work, but that it's unlikely that future
>> implementations will break it. Apart from that, is there any good reason
>> why I shouldn't use the list? Is there another approach that might avoid
>> the overhead of using a queue? Results from profiling the two
>> implementations below in case it makes anything clearer. TIA.
>>
> [snip]

Thanks MRAB and Paul,

> A number of points:
> 
> 1. I'm not sure it's a good idea to mutate a list while iterating over it.
> 

AFAICR that's the bit that depends on the implementation of lists. As
long as iteration involves incrementing an index and indexing into the
list it will work.

> 2. The example with the list retains all of the results, whereas the one
> with the queue consumes them.
> 
> 3. Queues are thread-safe because they're used for communication between
> threads, but lists aren't thread-safe; that might be something to do
> with the difference.
> 
> 4. If it doesn't need to be thread-safe, why not try deques instead?

Bingo. Performance is indistinguishable from that of the list. Thread
safety is unimportant (for my purposes), but the docs at
https://docs.python.org/2/library/collections.html#collections.deque
state "Deques support thread-safe, memory efficient appends and pops
from either side of the deque with approximately the same O(1)
performance in either direction". Looking at the output from the
profiler for my implementation with the queue it looks like a deque is
being used but there's various other stuff happening relating to thread
safety. Is the lesson from this that if all you're doing is appending
and popping, then use a deque (rather than a queue) even if you want
thread safety? (It makes quite a difference in performance for my use
case.) Cheers.

Duncan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: queue versus list

2020-03-19 Thread MRAB

On 2020-03-19 20:08, duncan smith wrote:

Hello,
   I have generator code along the following lines,


Q = queue.Queue()
Q.put(x)
while not Q.empty():
 x = Q.get()
 if :
 yield x
 else:
   Q.put()


If I change it to,


Q = []
Q.append(x)
for x in Q:
 if :
 yield x
 else:
   Q.append()


then it runs approximately 3 times more quickly. I seem to remember
(from somewhere) that the language specification doesn't guarantee that
the latter will continue to work, but that it's unlikely that future
implementations will break it. Apart from that, is there any good reason
why I shouldn't use the list? Is there another approach that might avoid
the overhead of using a queue? Results from profiling the two
implementations below in case it makes anything clearer. TIA.


[snip]
A number of points:

1. I'm not sure it's a good idea to mutate a list while iterating over it.

2. The example with the list retains all of the results, whereas the one 
with the queue consumes them.


3. Queues are thread-safe because they're used for communication between 
threads, but lists aren't thread-safe; that might be something to do 
with the difference.


4. If it doesn't need to be thread-safe, why not try deques instead?
--
https://mail.python.org/mailman/listinfo/python-list


queue versus list

2020-03-19 Thread duncan smith
Hello,
  I have generator code along the following lines,


Q = queue.Queue()
Q.put(x)
while not Q.empty():
x = Q.get()
if :
yield x
else:
  Q.put()


If I change it to,


Q = []
Q.append(x)
for x in Q:
if :
yield x
else:
  Q.append()


then it runs approximately 3 times more quickly. I seem to remember
(from somewhere) that the language specification doesn't guarantee that
the latter will continue to work, but that it's unlikely that future
implementations will break it. Apart from that, is there any good reason
why I shouldn't use the list? Is there another approach that might avoid
the overhead of using a queue? Results from profiling the two
implementations below in case it makes anything clearer. TIA.

Duncan


 88752234 function calls in 68.603 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10.0000.000   68.603   68.603 :1()
 1009   18.3430.018   68.6020.068
bit_trees.py:699(paired_tries_search)
  84259346.1160.000   10.3730.000 bit_trees.py:751(hamdist)
10.0000.0000.0000.000 bit_trees.py:809(cum_shape)
  23508675.9700.000   18.2000.000 queue.py:115(put)
  23508676.1140.000   16.6390.000 queue.py:147(get)
10.0000.0000.0000.000 queue.py:199(_init)
  47017351.9510.0002.6860.000 queue.py:202(_qsize)
  23508671.1490.0001.5120.000 queue.py:206(_put)
  23508671.0420.0001.4460.000 queue.py:210(_get)
10.0000.0000.0000.000 queue.py:27(__init__)
  23508682.9910.0004.3970.000 queue.py:90(empty)
30.0000.0000.0000.000 threading.py:215(__init__)
  47017342.6610.0003.7420.000 threading.py:239(__enter__)
  47017342.0970.0002.7190.000 threading.py:242(__exit__)
  47017342.4830.0003.8720.000 threading.py:254(_is_owned)
  47017348.1850.000   12.0570.000 threading.py:334(notify)
10.0000.0000.0000.000 {built-in method
_thread.allocate_lock}
  84259341.8740.0001.8740.000 {built-in method builtins.bin}
10.0000.000   68.603   68.603 {built-in method
builtins.exec}
  47017360.7350.0000.7350.000 {built-in method builtins.len}
  47017341.0800.0001.0800.000 {method '__enter__' of
'_thread.lock' objects}
  47017340.6220.0000.6220.000 {method '__exit__' of
'_thread.lock' objects}
  47017341.3890.0001.3890.000 {method 'acquire' of
'_thread.lock' objects}
  23508670.3630.0000.3630.000 {method 'append' of
'collections.deque' objects}
  84259342.3840.0002.3840.000 {method 'count' of 'str'
objects}
10.0000.0000.0000.000 {method 'disable' of
'_lsprof.Profiler' objects}
  47017340.6490.0000.6490.000 {method 'items' of 'dict'
objects}
  23508670.4040.0000.4040.000 {method 'popleft' of
'collections.deque' objects}



32331417 function calls in 26.992 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10.1420.142   26.992   26.992 :1()
 1009   16.7410.017   26.8510.027
bit_trees.py:699(paired_tries_search)
  84259345.2700.0009.1750.000 bit_trees.py:755(hamdist)
10.0000.0000.0000.000 bit_trees.py:813(cum_shape)
  84259341.5760.0001.5760.000 {built-in method builtins.bin}
10.0000.000   26.992   26.992 {built-in method
builtins.exec}
10.0000.0000.0000.000 {built-in method builtins.len}
  23508670.3100.0000.3100.000 {method 'append' of 'list'
objects}
  84259342.3290.0002.3290.000 {method 'count' of 'str'
objects}
10.0000.0000.0000.000 {method 'disable' of
'_lsprof.Profiler' objects}
  47017340.6240.0000.6240.000 {method 'items' of 'dict'
objects}


-- 
https://mail.python.org/mailman/listinfo/python-list