Re: collect data using threads
Toby Dickenson wrote:
> But it might not "show up" until too late.
>
> The consumer thread that called get_data presumably does something with that
> list, such as iterating over its contents. It might only "show up" after that
> iteration has finished, when the consumer has discarded its reference to the
> shared list.

I was going to point out that the consuming thread is the one calling get_data(), and therefore by the time it returns (to iterate over the contents), self.data has already been rebound to a new list.

That was before Kent correctly analyzed this yet again and showed how the on_received call can itself be the source of the trouble, via the separate attribute lookup and append call.

(I'm going to hand in my multi-threading merit badge and report to Aahz for another Queue "reprogramming" session for missing on this twice.)

-Peter
--
http://mail.python.org/mailman/listinfo/python-list
Re: collect data using threads
Previously, on Jun 14, Peter Hansen said:

# James Tanis wrote:
# > I may be wrong here, but shouldn't you just use a stack, or in other
# > words, use the list as a stack and just pop the data off the top. I
# > believe there is a method pop() already supplied for you.
#
# Just a note on terminology here. I believe the word "stack" generally
# refers to a LIFO (last-in first-out) structure, not what the OP needs
# which is a FIFO (first-in first-out).

What can I say? Lack of sleep.

# Assuming you would refer to the .append() operation as "putting data on
# the bottom", then to pop off the "top" you would use pop(0), not just
# pop().

Right, except I'm not writing his code for him, and I don't think he expects me to. I was just referring to the existence of a pop() function; perhaps I should have said pop([int]) to be clearer. Its use would of course have to be tailored to his code depending on what he requires.

# Normally though, I think one would refer to these as the head and tail
# (not top and bottom), and probably call the whole thing a queue, rather
# than a stack.

I agree, it's been a while and I mixed the two names up, nothing more.

---
James Tanis
[EMAIL PROTECTED]
http://pycoder.org
Re: collect data using threads
Qiangning Hong wrote:
> I actually had considered Queue and pop() before I wrote the above code.
> However, because there is a lot of data to get every time I call
> get_data(), I want a more CPU friendly way to avoid the while-loop and
> empty checking, and then the above code comes out. But I am not very
> sure whether it will cause serious problem or not, so I ask here. If
> anyone can prove it is correct, I'll use it in my program, else I'll go
> back to the Queue solution.

OK, here is a real failure mode. Here is the code and the disassembly:

>>> class Collector(object):
...     def __init__(self):
...         self.data = []
...     def on_received(self, a_piece_of_data):
...         """This callback is executed in work bee threads!"""
...         self.data.append(a_piece_of_data)
...     def get_data(self):
...         x = self.data
...         self.data = []
...         return x
...
>>> import dis
>>> dis.dis(Collector.on_received)
  6           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                1 (data)
              6 LOAD_ATTR                2 (append)
              9 LOAD_FAST                1 (a_piece_of_data)
             12 CALL_FUNCTION            1
             15 POP_TOP
             16 LOAD_CONST               1 (None)
             19 RETURN_VALUE
>>> dis.dis(Collector.get_data)
  8           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                1 (data)
              6 STORE_FAST               1 (x)

  9           9 BUILD_LIST               0
             12 LOAD_FAST                0 (self)
             15 STORE_ATTR               1 (data)

 10          18 LOAD_FAST                1 (x)
             21 RETURN_VALUE

Imagine the thread calling on_received() gets as far as LOAD_ATTR (data), LOAD_ATTR (append) or LOAD_FAST (a_piece_of_data), so it has a reference to self.data; then it blocks and the get_data() thread runs. The get_data() thread could call get_data() and *finish processing the returned list* before the on_received() thread runs again and actually appends to the list. The appended value will never be processed.

If you want to avoid the overhead of a Queue.get() for each data element you could just put your own mutex into on_received() and get_data().

Kent
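For what it's worth, Kent's closing suggestion (one mutex shared by on_received() and get_data()) might look roughly like the sketch below. It is only an illustration, written for Python 3: threading.Lock is the standard-library lock, but the _lock attribute name and the run_demo() helper are made up here, and spawn_work_bees() from the original post is left out so the class runs on its own.

```python
import threading


class Collector(object):
    def __init__(self):
        self.data = []
        # Hypothetical attribute name: one lock guards every access to self.data.
        self._lock = threading.Lock()

    def on_received(self, a_piece_of_data):
        """Callback executed in worker threads."""
        # The lookup of self.data and the append both happen under the lock,
        # so get_data() cannot swap the list in between them.
        with self._lock:
            self.data.append(a_piece_of_data)

    def get_data(self):
        """Return everything received since the last call."""
        with self._lock:
            x = self.data
            self.data = []
        return x


def run_demo(workers=4, per_worker=500):
    """Hypothetical driver: several producers, one polling consumer."""
    collector = Collector()

    def feed(worker_id):
        for i in range(per_worker):
            collector.on_received((worker_id, i))

    threads = [threading.Thread(target=feed, args=(n,)) for n in range(workers)]
    for t in threads:
        t.start()

    collected = []
    while any(t.is_alive() for t in threads):
        collected.extend(collector.get_data())
    for t in threads:
        t.join()
    # Pick up whatever arrived after the last poll.
    collected.extend(collector.get_data())
    return collected
```

The consumer still gets the whole batch in one call, so the per-element Queue.get() overhead is avoided; the cost is one lock acquisition per append and one per get_data().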
Re: collect data using threads
On Tuesday 14 June 2005 17:47, Peter Hansen wrote:
> Kent Johnson wrote:
> > Peter Hansen wrote:
> >> That will not work, and you will get data loss, as Jeremy points out.
> >>
> > Can you explain why not? self.data is still bound to the same list as x.
> > At least if the execution sequence is
> >     x = self.data
> >     self.data.append(a_piece_of_data)
> >     self.data = []
>
> Ah, since the entire list is being returned, you appear to be correct.
> Interesting... this means the OP's code is actually appending things to
> a list, over and over (presumably), then returning a reference to that
> list and rebinding the internal variable to a new list. If another
> thread calls on_received() and causes new data to be appended to "the
> list" between those two statements, then it will show up in the returned
> list (rather magically, at least to my way of looking at it) and will
> not in fact be lost.

But it might not "show up" until too late.

The consumer thread that called get_data presumably does something with that list, such as iterating over its contents. It might only "show up" after that iteration has finished, when the consumer has discarded its reference to the shared list.

--
Toby Dickenson
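The "too late" case can be replayed without any threads at all, by doing the worker's steps by hand in the unlucky order. This is just a single-threaded sketch of the interleaving, not a real race: the name pending_append is made up here and stands for a worker that has already looked up self.data.append but has not called it yet.

```python
class Collector(object):
    def __init__(self):
        self.data = []

    def on_received(self, a_piece_of_data):
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        self.data = []
        return x


c = Collector()
c.on_received("a")

# Worker: has fetched the bound method of the *current* list, then is suspended.
pending_append = c.data.append

# Consumer: swaps the list and finishes processing what it got.
batch = c.get_data()
processed = list(batch)

# Worker resumes and appends to the list the consumer is already done with.
pending_append("b")

# The next call sees only the new, empty list.
next_batch = c.get_data()
```

At the end "b" sits in batch, which the consumer has already finished with, and it is in neither processed nor next_batch.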
Re: collect data using threads
Peter Hansen wrote: > James Tanis wrote: > >> I may be wrong here, but shouldn't you just use a stack, or in other >> words, use the list as a stack and just pop the data off the top. I >> believe there is a method pop() already supplied for you. > > Just a note on terminology here. I believe the word "stack" generally > refers to a LIFO (last-in first-out) structure, not what the OP needs > which is a FIFO (first-in first-out). Or, perhaps he doesn't need either... as Kent points out (I should have read his post before replying above) this isn't what I think James and I both thought it was but something a little less usual... -Peter -- http://mail.python.org/mailman/listinfo/python-list
Re: collect data using threads
James Tanis wrote:
> # > > > A class Collector, it spawns several threads to read from serial port.
> # > > > Collector.get_data() will get all the data they have read since last
> # > > > call. Who can tell me whether my implementation correct?
> # > > >
> # Here's the original code:
> #
> # class Collector(object):
> #     def __init__(self):
> #         self.data = []
> #         spawn_work_bees(callback=self.on_received)
> #
> #     def on_received(self, a_piece_of_data):
> #         """This callback is executed in work bee threads!"""
> #         self.data.append(a_piece_of_data)
> #
> #     def get_data(self):
> #         x = self.data
> #         self.data = []
> #         return x
> #
> I may be wrong here, but shouldn't you just use a stack, or in other
> words, use the list as a stack and just pop the data off the top. I
> believe there is a method pop() already supplied for you. Since
> you wouldn't require an self.data = [] this should allow you to safely
> remove the data you've already seen without accidentally removing data
> that may have been added in the mean time.
>

I am the original poster.

I actually had considered Queue and pop() before I wrote the above code. However, because there is a lot of data to get every time I call get_data(), I wanted a more CPU-friendly way that avoids the while-loop and the empty checking, and that is how the above code came about. But I am not sure whether it will cause a serious problem or not, so I am asking here. If anyone can prove it is correct, I'll use it in my program; otherwise I'll go back to the Queue solution.

To Jeremy Jones: I am very sorry to have cost you so much effort on this weird code. I should make it clear that there is only *one* thread (the main thread in my application) that calls the get_data() method, periodically, driven by a timer. As for on_received(), there may be up to 16 threads accessing it simultaneously.

--
Qiangning Hong
Re: collect data using threads
Kent Johnson wrote: > Peter Hansen wrote: >> That will not work, and you will get data loss, as Jeremy points out. >> > Can you explain why not? self.data is still bound to the same list as x. > At least if the execution sequence is x = self.data >self.data.append(a_piece_of_data) > self.data = [] Ah, since the entire list is being returned, you appear to be correct. Interesting... this means the OP's code is actually appending things to a list, over and over (presumably), then returning a reference to that list and rebinding the internal variable to a new list. If another thread calls on_received() and causes new data to be appended to "the list" between those two statements, then it will show up in the returned list (rather magically, at least to my way of looking at it) and will not in fact be lost. Good catch Kent. :-) -Peter -- http://mail.python.org/mailman/listinfo/python-list
Re: collect data using threads
James Tanis wrote: > I may be wrong here, but shouldn't you just use a stack, or in other > words, use the list as a stack and just pop the data off the top. I > believe there is a method pop() already supplied for you. Just a note on terminology here. I believe the word "stack" generally refers to a LIFO (last-in first-out) structure, not what the OP needs which is a FIFO (first-in first-out). Assuming you would refer to the .append() operation as "putting data on the bottom", then to pop off the "top" you would use pop(0), not just pop(). Normally though, I think one would refer to these as the head and tail (not top and bottom), and probably call the whole thing a queue, rather than a stack. -Peter -- http://mail.python.org/mailman/listinfo/python-list
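To make the pop() versus pop(0) distinction concrete, here is a tiny sketch; the variable names are just for illustration.

```python
items = []
for n in (1, 2, 3):
    items.append(n)          # new data always goes on the tail

as_stack = list(items)
newest = as_stack.pop()      # LIFO: takes the most recently appended item

as_queue = list(items)
oldest = as_queue.pop(0)     # FIFO: takes the item that was appended first
```

One caveat: on a plain list, pop(0) has to shift every remaining element, so for long lists collections.deque with popleft() is the usual choice for FIFO use.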
Re: collect data using threads
Previously, on Jun 14, Jeremy Jones said: # Kent Johnson wrote: # # > Peter Hansen wrote: # > # > > Qiangning Hong wrote: # > > # > > # > > > A class Collector, it spawns several threads to read from serial port. # > > > Collector.get_data() will get all the data they have read since last # > > > call. Who can tell me whether my implementation correct? # > > > # > > [snip sample with a list] # > > # > > # > > > I am not very sure about the get_data() method. Will it cause data lose # > > > if there is a thread is appending data to self.data at the same time? # > > > # > > That will not work, and you will get data loss, as Jeremy points out. # > > # > > Normally Python lists are safe, but your key problem (in this code) is # > > that you are rebinding self.data to a new list! If another thread calls # > > on_received() just after the line "x = self.data" executes, then the new # > > data will never be seen. # > > # > # > Can you explain why not? self.data is still bound to the same list as x. At # > least if the execution sequence is x = self.data # >self.data.append(a_piece_of_data) # > self.data = [] # > # > ISTM it should work. # > # > I'm not arguing in favor of the original code, I'm just trying to understand # > your specific failure mode. # > # > Thanks, # > Kent # > # Here's the original code: # # class Collector(object): #def __init__(self): #self.data = [] #spawn_work_bees(callback=self.on_received) # #def on_received(self, a_piece_of_data): #"""This callback is executed in work bee threads!""" #self.data.append(a_piece_of_data) # #def get_data(self): #x = self.data #self.data = [] #return x # # The more I look at this, the more I'm not sure whether data loss will occur. # For me, that's good enough reason to rewrite this code. I'd rather be clear # and certain than clever anyday. # So, let's say you a thread T1 which starts in ``get_data()`` and makes it as # far as ``x = self.data``. 
Then another thread T2 comes along in # ``on_received()`` and gets as far as ``self.data.append(a_piece_of_data)``. # ``x`` in T1's get_data()`` (as you pointed out) is still pointing to the list # that T2 just appended to and T1 will return that list. But what happens if # you get multiple guys in ``get_data()`` and multiple guys in # ``on_received()``? I can't prove it, but it seems like you're going to have # an uncertain outcome. If you're just dealing with 2 threads, I can't see how # that would be unsafe. Maybe someone could come up with a use case that would # disprove that. But if you've got, say, 4 threads, 2 in each methodthat's # gonna get messy. # And, honestly, I'm trying *really* hard to come up with a scenario that would # lose data and I can't. Maybe someone like Peter or Aahz or some little 13 # year old in Topeka who's smarter than me can come up with something. But I do # know this - the more I think about this as to whether this is unsafe or not is # making my head hurt. If you have a piece of code that you have to spend that # much time on trying to figure out if it is threadsafe or not, why would you # leave it as is? Maybe the rest of you are more confident in your thinking and # programming skills than I am, but I would quickly slap a Queue in there. If # for nothing else than to rest from simulating in my head 1, 2, 3, 5, 10 # threads in the ``get_data()`` method while various threads are in the # ``on_received()`` method. Aaaagghhh.needmotrin.. # # # Jeremy Jones # I may be wrong here, but shouldn't you just use a stack, or in other words, use the list as a stack and just pop the data off the top. I believe there is a method pop() already supplied for you. Since you wouldn't require an self.data = [] this should allow you to safely remove the data you've already seen without accidentally removing data that may have been added in the mean time. 
--- James Tanis [EMAIL PROTECTED] http://pycoder.org -- http://mail.python.org/mailman/listinfo/python-list
Re: collect data using threads
Kent Johnson wrote:

> Peter Hansen wrote:
>
>> Qiangning Hong wrote:
>>
>>> A class Collector, it spawns several threads to read from serial port.
>>> Collector.get_data() will get all the data they have read since last
>>> call. Who can tell me whether my implementation correct?
>>
>> [snip sample with a list]
>>
>>> I am not very sure about the get_data() method. Will it cause data lose
>>> if there is a thread is appending data to self.data at the same time?
>>
>> That will not work, and you will get data loss, as Jeremy points out.
>>
>> Normally Python lists are safe, but your key problem (in this code) is
>> that you are rebinding self.data to a new list! If another thread calls
>> on_received() just after the line "x = self.data" executes, then the new
>> data will never be seen.
>
> Can you explain why not? self.data is still bound to the same list as x.
> At least if the execution sequence is
>     x = self.data
>     self.data.append(a_piece_of_data)
>     self.data = []
>
> ISTM it should work.
>
> I'm not arguing in favor of the original code, I'm just trying to
> understand your specific failure mode.
>
> Thanks,
> Kent

Here's the original code:

    class Collector(object):
        def __init__(self):
            self.data = []
            spawn_work_bees(callback=self.on_received)

        def on_received(self, a_piece_of_data):
            """This callback is executed in work bee threads!"""
            self.data.append(a_piece_of_data)

        def get_data(self):
            x = self.data
            self.data = []
            return x

The more I look at this, the more I'm not sure whether data loss will occur. For me, that's good enough reason to rewrite this code. I'd rather be clear and certain than clever any day.

So, let's say you have a thread T1 which starts in ``get_data()`` and makes it as far as ``x = self.data``. Then another thread T2 comes along in ``on_received()`` and gets as far as ``self.data.append(a_piece_of_data)``. ``x`` in T1's ``get_data()`` (as you pointed out) is still pointing to the list that T2 just appended to and T1 will return that list. But what happens if you get multiple guys in ``get_data()`` and multiple guys in ``on_received()``? I can't prove it, but it seems like you're going to have an uncertain outcome. If you're just dealing with 2 threads, I can't see how that would be unsafe. Maybe someone could come up with a use case that would disprove that. But if you've got, say, 4 threads, 2 in each method... that's gonna get messy.

And, honestly, I'm trying *really* hard to come up with a scenario that would lose data and I can't. Maybe someone like Peter or Aahz or some little 13 year old in Topeka who's smarter than me can come up with something. But I do know this: the more I think about whether this is unsafe or not, the more my head hurts. If you have a piece of code that you have to spend that much time on trying to figure out if it is threadsafe or not, why would you leave it as is? Maybe the rest of you are more confident in your thinking and programming skills than I am, but I would quickly slap a Queue in there. If for nothing else than to rest from simulating in my head 1, 2, 3, 5, 10 threads in the ``get_data()`` method while various threads are in the ``on_received()`` method. Aaaagghhh... need motrin...

Jeremy Jones
Re: collect data using threads
Peter Hansen wrote: > Qiangning Hong wrote: > >> A class Collector, it spawns several threads to read from serial port. >> Collector.get_data() will get all the data they have read since last >> call. Who can tell me whether my implementation correct? > > [snip sample with a list] > >> I am not very sure about the get_data() method. Will it cause data lose >> if there is a thread is appending data to self.data at the same time? > > > That will not work, and you will get data loss, as Jeremy points out. > > Normally Python lists are safe, but your key problem (in this code) is > that you are rebinding self.data to a new list! If another thread calls > on_received() just after the line "x = self.data" executes, then the new > data will never be seen. Can you explain why not? self.data is still bound to the same list as x. At least if the execution sequence is x = self.data self.data.append(a_piece_of_data) self.data = [] ISTM it should work. I'm not arguing in favor of the original code, I'm just trying to understand your specific failure mode. Thanks, Kent -- http://mail.python.org/mailman/listinfo/python-list
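Kent's three-step sequence can be replayed by hand in a single thread to see why nothing is lost in that particular ordering. This is only a sketch of that one interleaving (the worker's call runs to completion between the consumer's two statements); it says nothing about other orderings.

```python
class Collector(object):
    def __init__(self):
        self.data = []

    def on_received(self, a_piece_of_data):
        self.data.append(a_piece_of_data)


c = Collector()
c.on_received("first")

x = c.data                  # get_data(), first statement
c.on_received("second")     # a worker's whole call happens here
c.data = []                 # get_data(), second statement

same_list_when_appended = "second" in x
```

Because x and self.data name the same list object until the rebinding, the item appended in between ends up in x, which is the list get_data() would return.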
Re: collect data using threads
Qiangning Hong wrote:
> A class Collector, it spawns several threads to read from serial port.
> Collector.get_data() will get all the data they have read since last
> call. Who can tell me whether my implementation correct?

[snip sample with a list]

> I am not very sure about the get_data() method. Will it cause data lose
> if there is a thread is appending data to self.data at the same time?

That will not work, and you will get data loss, as Jeremy points out.

Normally Python lists are safe, but your key problem (in this code) is that you are rebinding self.data to a new list! If another thread calls on_received() just after the line "x = self.data" executes, then the new data will never be seen.

One option that would work safely** is to change get_data() to look like this:

    def get_data(self):
        count = len(self.data)
        result = self.data[:count]
        del self.data[:count]
        return result

This does what yours was trying to do, but safely. Note that it doesn't reassign self.data, but rather uses a single operation (del) to remove all the "preserved" elements at once. It's possible that after the first or second line a call to on_received() will add data, but it simply won't be seen until the next call to get_data(), rather than being lost.

** I'm showing you this to help you understand why your own approach was wrong, not to give you code that you should use. The key problem with even my approach is that it *assumes things about the implementation*. Specifically, there are no guarantees in Python the Language (as opposed to CPython, the implementation) about the thread-safety of working with lists like this. In fact, in Jython (and possibly other Python implementations) this would definitely have problems. Unless you are certain your code will run only under CPython, and you're willing to put comments in the code about potential thread safety issues, you should probably just follow Jeremy's advice and use Queue.

As a side benefit, Queues are much easier to work with!

-Peter
Re: collect data using threads
Qiangning Hong wrote:

>A class Collector, it spawns several threads to read from serial port.
>Collector.get_data() will get all the data they have read since last
>call. Who can tell me whether my implementation correct?
>
>class Collector(object):
>    def __init__(self):
>        self.data = []
>        spawn_work_bees(callback=self.on_received)
>
>    def on_received(self, a_piece_of_data):
>        """This callback is executed in work bee threads!"""
>        self.data.append(a_piece_of_data)
>
>    def get_data(self):
>        x = self.data
>        self.data = []
>        return x
>
>I am not very sure about the get_data() method. Will it cause data lose
>if there is a thread is appending data to self.data at the same time?
>
>Is there a more pythonic/standard recipe to collect thread data?
>

This looks a little scary. If a thread is putting something in self.data (in the on_received() method) when someone else is getting something out (in the get_data() method), the data that is put into self.data could conceivably be lost, because you are pointing self.data to an empty list each time get_data() is called and the list that self.data was pointing to when on_received() was called may just be dangling.

Why not use the Queue from the Queue module? You can push stuff in from one side (and have as many threads pushing stuff onto it as you like) and pull stuff off from the other side (again, you can have as many consumers as you'd like as well) in a thread safe manner.

HTH,

Jeremy Jones
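A Queue-based Collector along the lines Jeremy suggests could be sketched as below. Some assumptions: this is written for Python 3, where the module is spelled queue (it was Queue in the Python 2 of this thread); the _queue attribute name is made up; and spawn_work_bees() is omitted so the class stands alone.

```python
import queue  # named "Queue" in Python 2


class Collector(object):
    def __init__(self):
        # Hypothetical attribute name; queue.Queue does its own locking.
        self._queue = queue.Queue()

    def on_received(self, a_piece_of_data):
        """Callback executed in worker threads."""
        self._queue.put(a_piece_of_data)

    def get_data(self):
        """Drain and return everything queued so far, without blocking."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items
```

get_data() here is a drain loop that stops on the first queue.Empty, so it costs one get_nowait() per item. Anything put on the queue after the loop sees Empty simply waits for the next call.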