On 20Mar2020 00:37, duncan smith <duncan@invalid.invalid> wrote:
On 19/03/2020 20:40, MRAB wrote:
On 2020-03-19 20:08, duncan smith wrote:
       I have generator code along the following lines,

Q = queue.Queue()
Q.put(x)
while not Q.empty():
     x = Q.get()
     if <some condition that depends on x>:
         yield x
     else:
       Q.put(<some value that depends on x>)

If I change it to,

Q = []
Q.append(x)
for x in Q:
     if <some condition that depends on x>:
         yield x
     else:
       Q.append(<some value that depends on x>)

then it runs approximately 3 times more quickly.

A list is a very simple data structure, and therefore fast (for things that are inherently fast, like append; insert is expensive).

A queue takes a lock (cheap but an overhead) and might be expensive for get or put (this dependings on the data structure storing the queue items). A circular list (essentially a list using a base offset) can be fast until it needs resizing, a doubly linked list of nodes is always O(1) but is more expensive to add/remove (allocate node, adjust pointers); makye 3 times as slow :-) I haven't looked to see how Queue stores its items.

I seem to remember
(from somewhere) that the language specification doesn't guarantee that
the latter will continue to work, but that it's unlikely that future
implementations will break it.
[...snip...]
[MRAB] 1. I'm not sure it's a good idea to mutate a list while iterating over it.

AFAICR that's the bit that depends on the implementation of lists. As
long as iteration involves incrementing an index and indexing into the
list it will work.

If you're the only person using the list, as you are only _appending_ to it then the for loop is probably safe.

If you were inserting elements into the list there would be trouble (an iterator basicly needs keep an index as you suggest below, and an insert at or before the current index invalidates that, causing the iterator to issue the same element again (for example).

Apart from that, is there any good reason
why I shouldn't use the list? Is there another approach that might avoid
the overhead of using a queue? Results from profiling the two
implementations below in case it makes anything clearer. TIA.

Using a list seems fine for me if your code is as simple in reality as listed above.

2. The example with the list retains all of the results, whereas the one with the queue consumes them.

3. Queues are thread-safe because they're used for communication between
threads, but lists aren't thread-safe; that might be something to do
with the difference.

4. If it doesn't need to be thread-safe, why not try deques instead?

Bingo. Performance is indistinguishable from that of the list.

A deque is implement using a list.

Thread
safety is unimportant (for my purposes), but the docs at
https://docs.python.org/2/library/collections.html#collections.deque
state "Deques support thread-safe, memory efficient appends and pops
from either side of the deque with approximately the same O(1)
performance in either direction". Looking at the output from the
profiler for my implementation with the queue it looks like a deque is
being used but there's various other stuff happening relating to thread
safety. Is the lesson from this that if all you're doing is appending
and popping, then use a deque (rather than a queue) even if you want
thread safety? (It makes quite a difference in performance for my use
case.) Cheers.

A deque will change the ordering of the elements you process. Your list loop behaves in FIFO order, as does the queue. A dequeue (you you're doing a heappush and a heappop) will issue you the lowest current element on each iteration. If you merely need to process all the elements you're good. If the order matters you need to consider that.

A deque, being thread safe, will be using a lock. There will be a (quite small) overhead for that.

Cheers,
Cameron Simpson <c...@cskk.id.au>
--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to