Kent Johnson wrote:
Peter Hansen wrote:
  
Qiangning Hong wrote:

    
I have a class Collector that spawns several threads to read from a serial
port.  Collector.get_data() returns all the data they have read since the
last call.  Can anyone tell me whether my implementation is correct?
      
[snip sample with a list]

    
I am not very sure about the get_data() method.  Will it cause data loss
if a thread is appending data to self.data at the same time?
      
That will not work, and you will get data loss, as Jeremy points out.

Normally Python lists are safe, but your key problem (in this code) is 
that you are rebinding self.data to a new list!  If another thread calls 
on_received() just after the line "x = self.data" executes, then the new 
data will never be seen.
    

Can you explain why not? self.data is still bound to the same list as x. At least if the execution sequence is 
x = self.data
                    self.data.append(a_piece_of_data)
self.data = []

ISTM it should work.
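
That one ordering can be replayed by hand in a single thread. This is only a sketch of the exact sequence above, assuming self.data starts out as a list; the stripped-down Collector here is a stand-in for illustration, and the replay says nothing about other interleavings.

```python
# Single-threaded replay of the three steps above; no real threads involved.
class Collector(object):
    def __init__(self):
        self.data = []


collector = Collector()
collector.data.append("first")

x = collector.data                # get_data(): x names the current list
collector.data.append("second")   # on_received() runs in between
collector.data = []               # get_data(): rebind to a fresh list

print(x)               # ['first', 'second']
print(collector.data)  # []
```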

I'm not arguing in favor of the original code, I'm just trying to understand your specific failure mode.

Thanks,
Kent
  
Here's the original code:

class Collector(object):
    def __init__(self):
        # A list, so that on_received() can append to it.
        self.data = []
        # spawn_work_bees() is not shown in the post.
        spawn_work_bees(callback=self.on_received)

    def on_received(self, a_piece_of_data):
        """This callback is executed in work bee threads!"""
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        # Rebind self.data to a new list; x still names the old one.
        self.data = []
        return x

The more I look at this, the less sure I am whether data loss will occur.  For me, that's good enough reason to rewrite this code.  I'd rather be clear and certain than clever any day.

So, let's say you have a thread T1 which starts in ``get_data()`` and makes it as far as ``x = self.data``.  Then another thread T2 comes along in ``on_received()`` and gets as far as ``self.data.append(a_piece_of_data)``.  ``x`` in T1's ``get_data()`` (as you pointed out) is still pointing to the list that T2 just appended to, and T1 will return that list.  But what happens if you get multiple guys in ``get_data()`` and multiple guys in ``on_received()``?  I can't prove it, but it seems like you're going to have an uncertain outcome.  If you're just dealing with 2 threads, I can't see how that would be unsafe.  Maybe someone could come up with a use case that would disprove that.  But if you've got, say, 4 threads, 2 in each method... that's gonna get messy.
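
One way to take the guesswork out of the many-threads case is to guard both methods with a single lock, so that an append can never land in the middle of the swap in ``get_data()``. This is a rough sketch, not the original poster's code: ``LockedCollector`` and ``produce`` are made-up names, and plain ``threading.Thread`` objects stand in for the work bees.

```python
import threading


class LockedCollector(object):
    def __init__(self):
        self._lock = threading.Lock()
        self.data = []

    def on_received(self, a_piece_of_data):
        # Only one thread at a time may touch self.data.
        with self._lock:
            self.data.append(a_piece_of_data)

    def get_data(self):
        # The read and the rebind happen as one step under the lock.
        with self._lock:
            x = self.data
            self.data = []
        return x


collector = LockedCollector()


def produce(n):
    # Hypothetical worker: pushes 1000 tagged items.
    for i in range(1000):
        collector.on_received((n, i))


producers = [threading.Thread(target=produce, args=(n,)) for n in range(2)]
for t in producers:
    t.start()

seen = []
while any(t.is_alive() for t in producers):
    seen.extend(collector.get_data())
for t in producers:
    t.join()
seen.extend(collector.get_data())  # pick up anything left over

print(len(seen))  # 2000
```

The lock costs a little on every append, but it means nobody has to simulate interleavings in their head.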

And, honestly, I'm trying *really* hard to come up with a scenario that would lose data and I can't.  Maybe someone like Peter or Aahz or some little 13-year-old in Topeka who's smarter than me can come up with something.  But I do know this: the more I think about whether this is unsafe or not, the more my head hurts.  If you have a piece of code that you have to spend that much time on just to figure out whether it is threadsafe, why would you leave it as is?  Maybe the rest of you are more confident in your thinking and programming skills than I am, but I would quickly slap a Queue in there, if for nothing else than to rest from simulating in my head 1, 2, 3, 5, 10 threads in the ``get_data()`` method while various threads are in the ``on_received()`` method.  Aaaagghhh.....need....motrin......
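
For what it's worth, the Queue version might look something like this. It is a sketch under assumptions, not a drop-in replacement: it uses the Python 3 module name ``queue`` (the module was called ``Queue`` in Python 2), ``work_bee`` is a made-up worker function, and plain threads stand in for ``spawn_work_bees()``, which isn't shown in the post.

```python
import queue
import threading


class Collector(object):
    def __init__(self):
        self._queue = queue.Queue()

    def on_received(self, a_piece_of_data):
        """This callback is executed in work bee threads!"""
        self._queue.put(a_piece_of_data)

    def get_data(self):
        """Return everything received since the last call."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


collector = Collector()


def work_bee(n):
    # Hypothetical worker: pushes 100 tagged items.
    for i in range(100):
        collector.on_received((n, i))


bees = [threading.Thread(target=work_bee, args=(n,)) for n in range(4)]
for t in bees:
    t.start()
for t in bees:
    t.join()

received = collector.get_data()
print(len(received))         # 400
print(collector.get_data())  # []
```

Nothing is ever rebound here, so there is no swap to reason about; the queue does its own locking.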


Jeremy Jones
-- 
http://mail.python.org/mailman/listinfo/python-list