Re: Parallel(?) programming with python
I would like to thank everybody who answered my question. The insight was very informative. This seems to be one of the few newsgroups still alive and kicking, with a lot of knowledgeable people taking the time to help others. I like how quick and easy it is to post questions and receive answers here compared to web-based forums (although there are some disadvantages too).

I'm implementing some of the ideas received here and I will surely have other questions as I go. But the project will take a long time, because I'm doing this as a hobby during my vacation, which is unfortunately about to end.

Thanks again, Community.

On 08.08.22 12:47, Andreas Croci wrote:
> I would like to write a program that reads a fixed amount of bytes from the network and appends them to a list. This should happen once a second.
>
> Another part of the program should take the list, as it has been filled so far, every 6 hours or so, and do some computations on the data (an FFT).
>
> Every so often (say once a week) the list should be saved to a file, shortened in the front by so many items, and filled further with the data coming from the network. After the first saving of the whole list, only the new part (the data that have come in since the last saving) should be appended to the file. A timestamp is in the data, so it's easy to say what is new and what was already there.
>
> I'm not sure how to do this properly: can I write a part of a program that keeps doing its job (appending data to the list once every second) while another part computes something on the data of the same list, ignoring the new data being written?
>
> Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.
Re: Parallel(?) programming with python
Dennis Lee Bieber wrote at 2022-8-10 14:19 -0400:
> On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" ...
>> You could also use the `sched` module from Python's library.
>
> Time to really read the library reference manual again...
>
> Though if I read this correctly, a long running action /will/ delay others -- which could mean the (FFT) process could block collecting new 1-second readings while it is active. It also is "one-shot" on the scheduled actions, meaning those actions still have to reschedule themselves for the next time period.

Both true. With `multiprocessing`, you can delegate long running activity to a separate process.
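A minimal sketch of that multiprocessing idea (the helper names and the intervals are illustrative assumptions, not code from the thread): the main process keeps appending 1-second readings, and hands a snapshot to a worker process for the FFT so the collection loop is never blocked.

import multiprocessing
import time

def read_one_sample():
    return 0  # stand-in for the real network read (assumption)

def compute_fft(samples):
    # Stand-in for the real FFT, e.g. numpy.fft.fft(samples).
    print(f"FFT over {len(samples)} samples")

if __name__ == "__main__":
    data = []
    last_fft = time.monotonic()
    while True:
        data.append(read_one_sample())
        if time.monotonic() - last_fft >= 6 * 3600:
            snapshot = list(data)  # the worker process gets its own copy
            multiprocessing.Process(target=compute_fft, args=(snapshot,)).start()
            last_fft = time.monotonic()
        time.sleep(1)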
Re: Parallel(?) programming with python
Please let me know if that is okay.

On Wed, Aug 10, 2022 at 7:46 PM <2qdxy4rzwzuui...@potatochowder.com> wrote:
> On 2022-08-09 at 17:04:51 +0000, "Schachner, Joseph (US)" wrote:
>
>> Why would this application *require* parallel programming? This could be done in one, single-thread program. Call time to get time and save it as start_time. Keep a count of the number of 6-hour intervals, initialize it to 0.
>
> In theory, you are correct.
>
> In practice, [stuff] happens. What if your program crashes? Or the computer crashes? Or there's a Python update? Or an OS update? Where does all that pending data go, and how will you recover it after you've addressed whatever happened? ¹
>
> OTOH, once you start writing the pending data to a file, then it's an extremely simple leap to multiple programs (rather than multiple threads) for all kinds of good reasons.
>
> ¹ FWIW, I used to develop highly available systems, such as telephone switches, which allow [stuff] to happen and yet continue to function. It's pretty cool to yank a board (yes, physically remove it, without warning) from the system without [apparently] disrupting anything. Such systems also allow for hardware, OS, and application upgrades, too (IIRC, we were allowed a handful of seconds of downtime per year to meet our availability requirements). That said, designing and building such a system for the sake of simplicity and convenience of the application we're talking about here would make a pretty good definition of "overkill."
Re: Parallel(?) programming with python
Thanks again for the info.

On Wed, Aug 10, 2022 at 9:31 PM Peter J. Holzer wrote:
> On 2022-08-10 14:19:37 -0400, Dennis Lee Bieber wrote:
>> On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" declaimed the following:
>>> Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
>>>> Why would this application *require* parallel programming? This could be done in one, single-thread program. Call time to get time and save it as start_time. Keep a count of the number of 6-hour intervals, initialize it to 0.
> [...]
>> Though if I read this correctly, a long running action /will/ delay others -- which could mean the (FFT) process could block collecting new 1-second readings while it is active.
>
> Certainly, but does it matter? Data is received from some network connection and network connections often involve quite a bit of buffering. If the consumer is blocked for 3 or 4 or maybe even 20 seconds, the producer might not even notice. (This of course depends very much on the details which we know nothing about.)
>
> hp
>
> --
>    _  | Peter J. Holzer    | Story must make more sense than reality.
> |_|_) |                    |
> | |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |       challenge!"
Re: Parallel(?) programming with python
On 2022-08-10 14:19:37 -0400, Dennis Lee Bieber wrote:
> On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" declaimed the following:
>> Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
>>> Why would this application *require* parallel programming? This could be done in one, single-thread program. Call time to get time and save it as start_time. Keep a count of the number of 6-hour intervals, initialize it to 0.
[...]
> Though if I read this correctly, a long running action /will/ delay others -- which could mean the (FFT) process could block collecting new 1-second readings while it is active.

Certainly, but does it matter? Data is received from some network connection and network connections often involve quite a bit of buffering. If the consumer is blocked for 3 or 4 or maybe even 20 seconds, the producer might not even notice. (This of course depends very much on the details which we know nothing about.)

hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
RE: Parallel(?) programming with python
There are many possible discussions we can have here, and some are not really about whether and how to use Python. The user asked how to do what is a fairly standard task for some people, and it is arguably not necessarily best done using a single application running things in parallel.

So, yes, if you have full access to your machine and can schedule tasks, then some obvious answers come to mind where one process listens, receives data, and stores it; another process periodically wakes up, grabs recent data, and processes it; and perhaps still another process comes up even less often and does some rearrangement of old data. And, yes, for such large volumes of data it may be a poor design to hold all the data in memory for many hours or even days, and various ways of using a database or files/folders with a naming structure are a good idea.

But the original question remains, in my opinion, not a horrible one. All kinds of applications can be written with sets of tasks run largely in parallel, with some form of communication between tasks using shared data structures like queues and perhaps locks, and with a requirement that any task taking nontrivial time needs a way to buffer its communications so as not to block the others. Also, for people who want to start ONE process and let it run, and who perhaps cannot easily schedule other processes at the system level, it can be advantageous to know how to set up something along those lines within a single Python session. Of course, for efficiency reasons, any I/O to files slows things down, but what is described here as the situation seems to be somewhat easier and safer to do in so many other ways.

I think a main point is that there are good ways to prevent the data from being acted on by two parties that share memory. One is NOT to share memory for this purpose. Another might be to have the 6-hour process use a lock to move the data aside, or send a message to the receiving process to pause a moment, set the data aside, and begin collecting anew while the old data is processed, and so on. There are many such choices, and the parts need not be in the same process or all written in Python. But some solutions can be generalized more easily than others. For example, could there become a need to collect data from multiple sources, perhaps using multiple listeners?

-----Original Message-----
From: Python-list On Behalf Of Dieter Maurer
Sent: Wednesday, August 10, 2022 1:33 PM
To: Schachner, Joseph (US)
Cc: Andreas Croci ; python-list@python.org
Subject: RE: Parallel(?) programming with python

Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
> Why would this application *require* parallel programming? This could be done in one, single-thread program. Call time to get time and save it as start_time. Keep a count of the number of 6-hour intervals, initialize it to 0.

You could also use the `sched` module from Python's library.
Re: Parallel(?) programming with python
On 2022-08-09 at 17:04:51 +0000, "Schachner, Joseph (US)" wrote:

> Why would this application *require* parallel programming? This could be done in one, single-thread program. Call time to get time and save it as start_time. Keep a count of the number of 6-hour intervals, initialize it to 0.

In theory, you are correct.

In practice, [stuff] happens. What if your program crashes? Or the computer crashes? Or there's a Python update? Or an OS update? Where does all that pending data go, and how will you recover it after you've addressed whatever happened? ¹

OTOH, once you start writing the pending data to a file, then it's an extremely simple leap to multiple programs (rather than multiple threads) for all kinds of good reasons.

¹ FWIW, I used to develop highly available systems, such as telephone switches, which allow [stuff] to happen and yet continue to function. It's pretty cool to yank a board (yes, physically remove it, without warning) from the system without [apparently] disrupting anything. Such systems also allow for hardware, OS, and application upgrades, too (IIRC, we were allowed a handful of seconds of downtime per year to meet our availability requirements). That said, designing and building such a system for the sake of simplicity and convenience of the application we're talking about here would make a pretty good definition of "overkill."
Re: Parallel(?) programming with python
On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" declaimed the following:

> Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
>> Why would this application *require* parallel programming? This could be done in one, single-thread program. Call time to get time and save it as start_time. Keep a count of the number of 6-hour intervals, initialize it to 0.
>
> You could also use the `sched` module from Python's library.

Time to really read the library reference manual again...

Though if I read this correctly, a long running action /will/ delay others -- which could mean the (FFT) process could block collecting new 1-second readings while it is active. It also is "one-shot" on the scheduled actions, meaning those actions still have to reschedule themselves for the next time period.

--
Wulfraed                 Dennis Lee Bieber         AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
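For reference, a minimal sketch of that self-rescheduling pattern with `sched` (the one-second interval and function names are illustrative): each action re-enters itself before doing its work, since `sched` events are indeed one-shot.

import sched
import time

scheduler = sched.scheduler(time.monotonic, time.sleep)

def poll(interval):
    # Re-enter ourselves first, so the next run is already scheduled
    # even while this run is still working.
    scheduler.enter(interval, 1, poll, (interval,))
    print("reading one sample...")  # stand-in for the network read (assumption)

scheduler.enter(1, 1, poll, (1,))
scheduler.run()  # blocks; a long-running action here delays later events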
RE: Parallel(?) programming with python
Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
> Why would this application *require* parallel programming? This could be done in one, single-thread program. Call time to get time and save it as start_time. Keep a count of the number of 6-hour intervals, initialize it to 0.

You could also use the `sched` module from Python's library.
RE: Parallel(?) programming with python
Why would this application *require* parallel programming? This could be done in one, single-thread program.

Call time to get the time and save it as start_time. Keep a count of the number of 6-hour intervals, initialize it to 0. Once a second, read data and append it to the list. At 6 hours after the start time, call a function that does an FFT (see the comment about scipy below) and increment the count of 6-hour intervals. Call time and save the new start time. Continue execution. After 28 six-hour intervals, save the list and then slice the list to shorten it as you want. Reset the count of 6-hour intervals to zero.

The FFT might take a second, even if you use scipy, depending on how long the list is. (If you don't know about numpy and scipy, look them up! You need them. Your list can be an array in numpy.) Saving and slicing the list should take less than a second. This single-thread approach avoids thinking about multiprocessing, locking and unlocking data structures, and all that stuff that does not contribute to the goal of the program.

--- Joseph S.

-----Original Message-----
From: Andreas Croci
Sent: Monday, August 8, 2022 6:47 AM
To: python-list@python.org
Subject: Parallel(?) programming with python

I would like to write a program, that reads from the network a fixed amount of bytes and appends them to a list. This should happen once a second.

Another part of the program should take the list, as it has been filled so far, every 6 hours or so, and do some computations on the data (an FFT).

Every so often (say once a week) the list should be saved to a file, shortened in the front by so many items, and filled further with the data coming from the network. After the first saving of the whole list, only the new part (the data that have come since the last saving) should be appended to the file. A timestamp is in the data, so it's easy to say what is new and what was already there.

I'm not sure how to do this properly: can I write a part of a program that keeps doing its job (appending data to the list once every second) while another part computes something on the data of the same list, ignoring the new data being written?

Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.
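A sketch of that single-thread loop, assuming numpy for the FFT (the `read_fixed_bytes` and `save_to_file` helpers are stand-ins, not from the thread, and the shortening policy is illustrative):

import time
import numpy as np

def read_fixed_bytes():
    return 0  # stand-in for the real network read (assumption)

def save_to_file(samples):
    pass  # stand-in for the real persistence step (assumption)

data = []
start_time = time.monotonic()
six_hour_count = 0

while True:
    data.append(read_fixed_bytes())
    if time.monotonic() - start_time >= 6 * 3600:
        spectrum = np.fft.fft(np.asarray(data))  # scipy.fft.fft also works
        six_hour_count += 1
        start_time = time.monotonic()
        if six_hour_count == 28:       # 28 six-hour intervals = one week
            save_to_file(data)
            del data[:len(data) // 2]  # slice off the old front half
            six_hour_count = 0
    time.sleep(1)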
Re: Parallel(?) programming with python
On Mon, 8 Aug 2022 19:39:27 +0200, Andreas Croci declaimed the following:

> Do you mean queues in the sense of deque (the data structure)? I ask because I can see the advantage there when I try to pop data from the front of it, but I don't see the sense of the following statement ("than

Most likely this was a reference to the Queue module -- which is used to pass data from one thread to another. Your "fetch" thread would package up the "new" data to be processed by the FFT thread. The FFT thread is blocked waiting for data to appear on the queue -- when it appears, the FFT thread reads the entire packet of data and proceeds to process it. Note that in this scheme, the FFT thread is NOT on a timer -- the fetch thread controls the timing by when it puts data into the queue.

cf:
https://docs.python.org/3/library/threading.html
https://docs.python.org/3/library/queue.html

> That would obviously save some coding (but would introduce the need to code the interaction with the database), but I'm not sure it would speed up the thing. Would the RDBMS allow a read of a table while something else is writing to it? I doubt it and I'm not sure it doesn't flush the cache before letting you read, which would include a normally slow disk access.

Depends upon the RDBMS. Some are "multi-version concurrency" -- they snapshot the data at the time of the read, while letting new writes proceed. But if one is doing read/modify/write, this can cause a problem, as the RDBMS will detect that a record was modified by someone else and prevent you from changing it -- you have to reselect the data to get the current version.

You will want to treat each of your network fetches as a transaction -- and close the transaction fast. Your FFT process would need to select all data in the range to be processed, and load it into memory so you can free that transaction.

https://www.sqlite.org/lockingv3.html
See section 3.0 and section 5.0

--
Wulfraed                 Dennis Lee Bieber         AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
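A minimal sketch of that scheme (batch size and intervals are illustrative assumptions): the fetch thread accumulates readings and puts a whole packet on the queue; the FFT thread simply blocks on get(), so it needs no timer of its own.

import queue
import threading
import time

batches = queue.Queue()

def fetch(batch_size=10):
    # Accumulate 1-second readings; hand over a whole packet at a time.
    batch = []
    while True:
        batch.append(0)  # stand-in for the real network read (assumption)
        if len(batch) >= batch_size:
            batches.put(batch)
            batch = []
        time.sleep(1)

def fft_worker():
    while True:
        packet = batches.get()  # blocks until the fetch thread delivers
        print(f"processing {len(packet)} samples")  # stand-in for the FFT

threading.Thread(target=fetch, daemon=True).start()
fft_worker()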
Re: Parallel(?) programming with python
Queues are better than lists for concurrency. If you get the right kind, they have implicit locking, making your code simpler and more robust at the same time.

CPython threading is mediocre for software systems that have one or more CPU-bound threads, and your FFT might be CPU-bound. Rather than using threading directly, you probably should use https://docs.python.org/3/library/concurrent.futures.html , which gives you easy switching between threads and processes.

Or if you, like me, get inordinately joyous over programs that run on more than one kind of Python, you could give up concurrent.futures and use _thread. Sadly, that gives up easy flipping between threads and processes, but gives you easy flipping between CPython and micropython. Better still, micropython appears to have more scalable threading than CPython, so if you decide you need 20 CPU-hungry threads someday, you are less likely to be in a bind.

For reading from a socket, if you're not going the REST route, may I suggest https://stromberg.dnsalias.org/~strombrg/bufsock.html ? It deals with framing and lengths relatively smoothly. Otherwise, robust socket code tends to need while loops and tedious arithmetic.

HTH

On Mon, Aug 8, 2022 at 10:59 AM Andreas Croci wrote:
> I would like to write a program, that reads from the network a fixed amount of bytes and appends them to a list. This should happen once a second.
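A sketch of the concurrent.futures route (details are illustrative, not from the thread): a ProcessPoolExecutor is used here since the FFT may be CPU-bound; swapping in ThreadPoolExecutor is a one-line change.

from concurrent.futures import ProcessPoolExecutor
import time

def compute_fft(samples):
    # Stand-in for the real CPU-bound FFT (assumption).
    return len(samples)

if __name__ == "__main__":
    data = []
    with ProcessPoolExecutor(max_workers=1) as pool:
        for i in range(20):
            data.append(i)  # stand-in for a 1-second reading
            time.sleep(0.1)
        future = pool.submit(compute_fft, list(data))  # copy goes to the worker
        print("FFT result:", future.result())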
Re: Parallel(?) programming with python
On 09Aug2022 00:22, Oscar Benjamin wrote:
> On Mon, 8 Aug 2022 at 19:01, Andreas Croci wrote:
>> Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.

Which is of course feasible, as others have outlined.

> Why do these "parts of a program" need to be part of the *same* program. I would write this as just two separate programs. One collects the data and writes it to a file. The other periodically reads the file and computes the DFT.

I would also write these as separate programmes, or at least as distinct modes of the same programme (eg "myprog poll" and "myprog archive" etc). Largely because you might run the "poll" regularly and briefly, and the processing phase separately and less frequently. You don't need to keep a single programme lurking around forever - fire it up as required.

However, I want to point out that this _in no way_ removes the need for access control and mutexes. It will change the mechanism (because your two programmes are now operating separately) and makes it more concrete in your mind what _actually and precisely_ needs protection.

For example, you probably want to avoid _processing_ a data file at the same time as _updating_ that file. Depending on what you're doing this can be as simple as keeping "to be updated" files with distinct names from "available to be processed/archived" files. This is a standard difficulty with "hot folder" upload areas. A common approach might be to write a file with a "temp" style name (eg ".tmp*") until completed, then rename it to its official name (eg "datafile*"). And then your processing/archiving side can simply ignore the "in progress" files because they do not match the names it cares about.

Anyway, those are specifics, which will be driven by what you're actually doing. The point is that you still need to coordinate use of the files suitably for your needs. Doing this in one long running programme with Threads/mutexes or separate programmes sharing a data directory just changes the mechanisms.

Cheers,
Cameron Simpson
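A sketch of that write-then-rename pattern (file names are illustrative assumptions): the collector writes to a ".tmp" name and renames on completion; the processing side only ever looks at completed "datafile*" names.

import glob
import os

def write_datafile(payload: bytes, final_name: str) -> None:
    tmp_name = ".tmp-" + final_name
    with open(tmp_name, "wb") as f:
        f.write(payload)
    os.rename(tmp_name, final_name)  # atomic on POSIX, same filesystem

def completed_files():
    return sorted(glob.glob("datafile*"))  # in-progress ".tmp-*" are ignored

write_datafile(b"\x00\x01", "datafile-0001")
print(completed_files())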
RE: Parallel(?) programming with python
Stefan,

You are correct that the goal of a lock is to do something rather quickly and atomically, so your design should not do something complex or long before releasing the lock.

In your example, you have a producer adding data as regularly as every second and another that wakes up rarely and processes all the data since the last time. So you may want to augment the code you had to do something fast, like point another variable at the data gathered so far and move the original variable to an empty list or whatever. Then you release the lock within fractions of a second and let the regular job keep adding to the initially empty list while the other part of the code processes without a lock.

A design like the above has the busy worker constantly checking the lock. An alternative, if you are sure the other process will only show up almost exactly at 6 hours on the clock, is to have the busy one check the time instead, but that may be more expensive.

Still other architectures are possible, such as writing not to a single list for six hours, but to some data structure with multiple sub-lists, such as one where you switch every minute or so. The second process can note how many entries there are at the moment, process all but the last, and note the location so that next time it starts there. This would work if you did not need every last bit of data, as the two do not interfere with each other. And no real locks would be needed, as the only thing the two parts share is the position or identity of the current last fragment, which only one process actually touches.

Just some ideas. Lots of other variations are very possible.

-----Original Message-----
From: Python-list On Behalf Of Stefan Ram
Sent: Monday, August 8, 2022 7:21 AM
To: python-list@python.org
Subject: Re: Parallel(?) programming with python

Andreas Croci writes:
> Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.

Yes, but this is difficult. If you ask this question here, you might not be ready for this. I haven't learned it yet myself, but nevertheless tried to write a small example program quickly, which might still contain errors because of my lack of education.

import threading
import time

def write_to_list( list, lock, event ):
    for i in range( 10 ):
        lock.acquire()
        try:
            list.append( i )
        finally:
            lock.release()
        event.set()
        time.sleep( 3 )

def read_from_list( list, lock, event ):
    while True:
        event.wait()
        print( "Waking up." )
        event.clear()
        if len( list ):
            print( "List contains " + str( list[ 0 ]) + "." )
            lock.acquire()
            try:
                del list[ 0 ]
            finally:
                lock.release()
        else:
            print( "List is empty." )

list = []
lock = threading.Lock()
event = threading.Event()
threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()

In basketball, first you must learn to dribble and pass, before you can begin to shoot.

With certain reservations, texts that can be considered to learn Python are: "Object-Oriented Programming in Python Documentation" - a PDF file, Introduction to Programming Using Python - Y Daniel Liang (2013), How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12), The Coder's Apprentice - Pieter Spronck (2016-09-21), and Python Programming - John Zelle (2009).
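A sketch of the "move the data aside" idea described at the top of this message (a hypothetical shape, not code from the thread): the consumer swaps in a fresh list while holding the lock only for the swap, then processes the old list at leisure.

import threading

lock = threading.Lock()
data = []

def producer_append(sample):
    with lock:
        data.append(sample)

def take_batch():
    global data
    with lock:                 # held only for the swap, a fraction of a second
        batch, data = data, []
    return batch               # process (e.g. FFT) outside the lock

for i in range(5):
    producer_append(i)
print(take_batch())            # -> [0, 1, 2, 3, 4]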
Re: Parallel(?) programming with python
On Mon, 8 Aug 2022 at 19:01, Andreas Croci wrote:
>
> I would like to write a program, that reads from the network a fixed amount of bytes and appends them to a list. This should happen once a second.
>
> Another part of the program should take the list, as it has been filled so far, every 6 hours or so, and do some computations on the data (an FFT).
>
> Every so often (say once a week) the list should be saved to a file, shortened in the front by so many items, and filled further with the data coming from the network. After the first saving of the whole list, only the new part (the data that have come since the last saving) should be appended to the file. A timestamp is in the data, so it's easy to say what is new and what was already there.
>
> I'm not sure how to do this properly: can I write a part of a program that keeps doing its job (appending data to the list once every second) while another part computes something on the data of the same list, ignoring the new data being written?
>
> Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.

Why do these "parts of a program" need to be part of the *same* program? I would write this as just two separate programs. One collects the data and writes it to a file. The other periodically reads the file and computes the DFT.

Note that a lot of the complexity discussed in other posts to do with threads and locks etc. comes from the supposed constraint that this needs to be done with threads or something else that can work in parallel *within the same program*. If you relax that constraint, the problem becomes a lot simpler.

--
Oscar
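A sketch of that two-program split (the file name and record format are illustrative assumptions; in practice these would be two separate scripts, the second run every 6 hours, e.g. from cron). The collector only ever appends; the analyzer only ever reads.

import time

def collector(path="readings.log", samples=5):
    # collector.py: append one timestamped reading per line.
    for _ in range(samples):
        reading = 0  # stand-in for the real network read (assumption)
        with open(path, "a") as f:
            f.write(f"{time.time()}\t{reading}\n")
        time.sleep(1)

def analyzer(path="readings.log"):
    # analyzer.py: read everything collected so far and process it.
    values = []
    with open(path) as f:
        for line in f:
            ts, value = line.rstrip("\n").split("\t")
            values.append(float(value))
    print(f"{len(values)} samples ready for the DFT")

collector()
analyzer()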
Re: Parallel(?) programming with python
On 08Aug2022 11:20, Stefan Ram wrote:
> Andreas Croci writes:
>> Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.
>
> Yes, but this is difficult. If you ask this question here, you might not be ready for this.

This is a very standard requirement for any concurrent activity and the typical approach is a mutex (mutual exclusion). You've already hit on the "standard" approach: a `threading.Lock` object.

> lock.acquire()
> try:
>     list.append( i )
> finally:
>     lock.release()

Small note, which makes writing this much clearer. Lock objects are context managers. So:

    with lock:
        list.append(i)

is all you need.

Cheers,
Cameron Simpson
Re: Parallel(?) programming with python
On 2022-08-08 13:53:20 +0200, Andreas Croci wrote:
> I'm in principle ok with locks, if it must be. What I fear is that the lock could last long and prevent the function that writes into the list from doing so every second. With an FFT on a list that contains a few bytes taken every second over one week's time (604.800 samples), I believe it's very likely that the FFT function takes longer than a second to return.

You wouldn't lock the part performing the FFT, of course, only the part manipulating the shared list.

That said, CPython (the reference implementation of Python) has what is called the Global Interpreter Lock (GIL), which locks every single Python instruction. So you can't have two threads actually computing anything at the same time - at least not if the computation is written in Python. Math packages like Numpy may or may not release the lock while they are busy.

hp

PS: I also agree with what others have said about the perils of multi-threaded programming.

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
Re: Parallel(?) programming with python
> On 8 Aug 2022, at 20:24, MRAB wrote:
>
> On 2022-08-08 12:20, Stefan Ram wrote:
>> Andreas Croci writes:
>>> Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.
>>
>> Yes, but this is difficult. If you ask this question here, you might not be ready for this. I haven't learned it yet myself, but nevertheless tried to write a small example program quickly, which might still contain errors because of my lack of education.
>>
>> import threading
>> import time
>>
>> def write_to_list( list, lock, event ):
>>     for i in range( 10 ):
>>         lock.acquire()
>>         try:
>>             list.append( i )
>>         finally:
>>             lock.release()
>>         event.set()
>>         time.sleep( 3 )
>>
>> def read_from_list( list, lock, event ):
>>     while True:
>>         event.wait()
>>         print( "Waking up." )
>>         event.clear()
>>         if len( list ):
>>             print( "List contains " + str( list[ 0 ]) + "." )
>>             lock.acquire()
>>             try:
>>                 del list[ 0 ]
>>             finally:
>>                 lock.release()
>>         else:
>>             print( "List is empty." )
>>
>> list = []
>> lock = threading.Lock()
>> event = threading.Event()
>> threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
>> threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()
>>
>> In basketball, first you must learn to dribble and pass, before you can begin to shoot. With certain reservations, texts that can be considered to learn Python are: "Object-Oriented Programming in Python Documentation" - a PDF file, Introduction to Programming Using Python - Y Daniel Liang (2013), How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12), The Coder's Apprentice - Pieter Spronck (2016-09-21), and Python Programming - John Zelle (2009).
>
> When working with threads, you should use queues, not lists, because queues do their own locking and can wait for items to arrive, with a timeout, if desired:

Lists do not need to be locked in Python because of the GIL. However, you need locks to synchronise between threads. And, as you say, a queue has all that locking built in.

Barry

> import queue
> import threading
> import time
>
> def write_to_item_queue(item_queue):
>     for i in range(10):
>         print("Put", i, "in queue.", flush=True)
>         item_queue.put(i)
>         time.sleep(3)
>
>     # Using None to indicate that there's no more to come.
>     item_queue.put(None)
>
> def read_from_item_queue(item_queue):
>     while True:
>         try:
>             item = item_queue.get()
>         except queue.Empty:
>             print("Queue is empty; shouldn't have got here!", flush=True)
>         else:
>             print("Queue contains " + str(item) + ".", flush=True)
>
>             if item is None:
>                 # Using None to indicate that there's no more to come.
>                 break
>
> item_queue = queue.Queue()
>
> write_thread = threading.Thread(target=write_to_item_queue, args=[item_queue])
> write_thread.start()
>
> read_thread = threading.Thread(target=read_from_item_queue, args=[item_queue])
> read_thread.start()
>
> # Wait for the threads to finish.
> write_thread.join()
> read_thread.join()
>
> print("Finished.")
Re: Parallel(?) programming with python
On 2022-08-08 12:20, Stefan Ram wrote:
> Andreas Croci writes:
>> Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.
>
> Yes, but this is difficult. If you ask this question here, you might not be ready for this. I haven't learned it yet myself, but nevertheless tried to write a small example program quickly, which might still contain errors because of my lack of education.
>
> import threading
> import time
>
> def write_to_list( list, lock, event ):
>     for i in range( 10 ):
>         lock.acquire()
>         try:
>             list.append( i )
>         finally:
>             lock.release()
>         event.set()
>         time.sleep( 3 )
>
> def read_from_list( list, lock, event ):
>     while True:
>         event.wait()
>         print( "Waking up." )
>         event.clear()
>         if len( list ):
>             print( "List contains " + str( list[ 0 ]) + "." )
>             lock.acquire()
>             try:
>                 del list[ 0 ]
>             finally:
>                 lock.release()
>         else:
>             print( "List is empty." )
>
> list = []
> lock = threading.Lock()
> event = threading.Event()
> threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
> threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()
>
> In basketball, first you must learn to dribble and pass, before you can begin to shoot. With certain reservations, texts that can be considered to learn Python are: "Object-Oriented Programming in Python Documentation" - a PDF file, Introduction to Programming Using Python - Y Daniel Liang (2013), How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12), The Coder's Apprentice - Pieter Spronck (2016-09-21), and Python Programming - John Zelle (2009).

When working with threads, you should use queues, not lists, because queues do their own locking and can wait for items to arrive, with a timeout, if desired:

import queue
import threading
import time

def write_to_item_queue(item_queue):
    for i in range(10):
        print("Put", i, "in queue.", flush=True)
        item_queue.put(i)
        time.sleep(3)

    # Using None to indicate that there's no more to come.
    item_queue.put(None)

def read_from_item_queue(item_queue):
    while True:
        try:
            item = item_queue.get()
        except queue.Empty:
            print("Queue is empty; shouldn't have got here!", flush=True)
        else:
            print("Queue contains " + str(item) + ".", flush=True)

            if item is None:
                # Using None to indicate that there's no more to come.
                break

item_queue = queue.Queue()

write_thread = threading.Thread(target=write_to_item_queue, args=[item_queue])
write_thread.start()

read_thread = threading.Thread(target=read_from_item_queue, args=[item_queue])
read_thread.start()

# Wait for the threads to finish.
write_thread.join()
read_thread.join()

print("Finished.")
Re: Parallel(?) programming with python
On 8/8/2022 4:47 AM, Andreas Croci wrote:
> I would like to write a program, that reads from the network a fixed amount of bytes and appends them to a list. This should happen once a second.
>
> Another part of the program should take the list, as it has been filled so far, every 6 hours or so, and do some computations on the data (an FFT).
>
> Every so often (say once a week) the list should be saved to a file, shortened in the front by so many items, and filled further with the data coming from the network. After the first saving of the whole list, only the new part (the data that have come since the last saving) should be appended to the file. A timestamp is in the data, so it's easy to say what is new and what was already there.
>
> I'm not sure how to do this properly: can I write a part of a program that keeps doing its job (appending data to the list once every second) while another part computes something on the data of the same list, ignoring the new data being written?
>
> Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.

You might be able to do what you need by making the file system work for you: use numbered files, something like DATA/0001, DATA/0002, etc.

Start by initializing a file number variable to 1 and creating an empty file, DATA/0001. The current time will be your start time.

In an infinite loop, just as in Stefan's example: read from the network and append to the current data file. This shouldn't take long unless the file is on a remote system. If six hours have gone by (compare the current time to the start time), close the current data file, create a thread (see Stefan's example) to call your FFT with the name of the current file, increment the file number, and open a new empty data file.

If you want to, you can consolidate files every week or so. The Python library has functions that will let you get a list of files in a directory. If you're on a Linux or UNIX system, you can use shell commands to append, copy or rename files.

Have fun.

Louis
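A sketch of that numbered-file scheme (the write payload and `process_file` body are stand-ins; intervals as described above):

import os
import threading
import time

def process_file(path):
    print("FFT over", path)  # stand-in for the real FFT (assumption)

os.makedirs("DATA", exist_ok=True)
file_number = 1
start = time.monotonic()
f = open(f"DATA/{file_number:04d}", "ab")

while True:
    f.write(b"\x00")  # stand-in for one second's network read (assumption)
    f.flush()
    if time.monotonic() - start >= 6 * 3600:
        f.close()
        threading.Thread(target=process_file,
                         args=(f"DATA/{file_number:04d}",)).start()
        file_number += 1
        f = open(f"DATA/{file_number:04d}", "ab")
        start = time.monotonic()
    time.sleep(1)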
Re: Parallel(?) programming with python
On Mon, 8 Aug 2022 12:47:26 +0200, Andreas Croci declaimed the following:

> I would like to write a program, that reads from the network a fixed amount of bytes and appends them to a list. This should happen once a second.

Ignoring leap seconds, there are 86400 seconds in a day -- how many bytes are you planning to read each second? Maybe more important: is this a constant network connection feeding you bytes (in which case the bytes available to read will be controlled by the sender -- which may be sending continuously and building up a backlog if you don't empty the stream)? Or are you planning to make a socket connection, read n bytes, close the socket?

> Another part of the program should take the list, as it has been filled so far, every 6 hours or so, and do some computations on the data (an FFT).

"6 hours or so"? That leaves one open to all sorts of variable timing. In either event, a 6-hour interval is more suited to a process started by a cron job (Linux/Unix) or Task Scheduler (Windows). Having a thread sleep for 6 hours means no safeguard if the parent process should die at some point (and if you are keeping the data in an internal list, you lose all that data too).

> Every so often (say once a week) the list should be saved to a file,

This REQUIRES the process to not fail at any point, nor any system restarts, etc. And (see prior paragraphs) how much data are you accumulating? In one week you have 604800 "reads". If you are reading 10 bytes each time, that makes 6MB of data you could potentially lose (on most modern hardware, 6MB is not a memory concern... even a 32-bit OS should be able to find space for 600MB of data...).

Much better would be to write the file as you read each chunk. If the file is configured right, a separate process should be able to do read-only processing of the file even while the write process is on-going. OR, you attempt an open/write/close cycle which could be blocked while your FFT is processing -- you'd have to detect that situation and buffer the read data until you get a subsequent successful open, at which time you'd write all the backlog data. Or you could even have your FFT process copy the data to the long-term file, while the write process just starts a new file when it finds itself blocked (and the FFT deletes the file it was reading).

> shortened in the front by so many items, and filled further with the data coming from the network. After the first saving of the whole list, only the new part (the data that have come since the last saving) should be appended to the file. A timestamp is in the data, so it's easy to say what is new and what was already there.

Personally, this sounds more suited for something like SQLite3... Insert new records as the data is read, with timestamps. The FFT process selects records based upon the last data ID (that it processed previously) to the end of new data. The SQLite3 database IS the long-term storage. You might need a second table to hold the FFT process's "last data ID" so on start-up it can determine where to begin.

> I'm not sure how to do this properly: can I write a part of a program that keeps doing its job (appending data to the list once every second) while another part computes something on the data of the same list, ignoring the new data being written?

Well, if you really want ONE program -- you'll likely be looking at the Threading module (I don't do "async", and your task doesn't seem suited for async-type callbacks -- one thread that does the fetching of data, and a second that does the FFT processing, which will be sleeping most of the time). But either way, I'd suggest not keeping the data in an internal list; use some RDBMS to keep the long-term data, accumulating it as you fetch it, and letting the FFT read from the database for its processing.

--
Wulfraed                 Dennis Lee Bieber         AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
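A sketch of that SQLite3 design (table and column names are illustrative assumptions): one short transaction per 1-second fetch on the collector side, and a "last processed ID" table for the FFT side.

import sqlite3
import time

con = sqlite3.connect("readings.db")
con.execute("CREATE TABLE IF NOT EXISTS reading"
            " (id INTEGER PRIMARY KEY, ts REAL, payload BLOB)")
con.execute("CREATE TABLE IF NOT EXISTS fft_state (last_id INTEGER)")

# Collector side: one short transaction per fetch.
with con:
    con.execute("INSERT INTO reading (ts, payload) VALUES (?, ?)",
                (time.time(), b"\x00"))  # stand-in for the network read

# FFT side: select everything after the last processed id, then record it.
last_id = con.execute(
    "SELECT COALESCE(MAX(last_id), 0) FROM fft_state").fetchone()[0]
rows = con.execute(
    "SELECT id, payload FROM reading WHERE id > ? ORDER BY id",
    (last_id,)).fetchall()
if rows:
    # ... FFT over the payloads would go here ...
    with con:
        con.execute("DELETE FROM fft_state")
        con.execute("INSERT INTO fft_state (last_id) VALUES (?)",
                    (rows[-1][0],))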
RE: Parallel(?) programming with python
>> But, an easier and often better option for concurrent data access is to use a (relational) database, then the appropriate transaction isolation levels when reading and/or writing.
>
> That would obviously save some coding (but would introduce the need to code the interaction with the database), but I'm not sure it would speed up the thing. Would the RDBMS allow a read of a table while something else is writing to it? I doubt it and I'm not sure it doesn't flush the cache before letting you read, which would include a normally slow disk access.

SQLite, for example, allows only 1 write transaction at a time, but in WAL mode you can have as many read transactions as you want, all going along at the same time as that 1 writer. It also allows you to specify how thorough it is in flushing data to disk, including not forcing a sync to disk at all and just leaving that to the OS to do on its own time.
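Enabling WAL mode is a one-time pragma per database; a minimal sketch (the synchronous setting shown is one common pairing with WAL, not the only option):

import sqlite3

con = sqlite3.connect("readings.db")
con.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
con.execute("PRAGMA synchronous=NORMAL")  # fewer syncs; OS flushes on its own time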
Re: Parallel(?) programming with python
Thank you for your reply.

On 08.08.22 14:55, Julio Di Egidio wrote:
> Concurrent programming is quite difficult, plus you better think in terms of queues than shared data...

Do you mean queues in the sense of deque (the data structure)? I ask because I can see the advantage there when I try to pop data from the front of it, but I don't see the sense of the following statement ("than shared data"). I mean, I called my structure a list, but it may well be a queue instead. That wouldn't prevent it from being shared in the idea I described: one function would still append data to it while the other is reading what is there up to a certain point and calculating the FFT of it.

> But, an easier and often better option for concurrent data access is to use a (relational) database, then the appropriate transaction isolation levels when reading and/or writing.
>
> Julio

That would obviously save some coding (but would introduce the need to code the interaction with the database), but I'm not sure it would speed up the thing. Would the RDBMS allow a read of a table while something else is writing to it? I doubt it, and I'm not sure it doesn't flush the cache before letting you read, which would include a normally slow disk access.

Andreas
Re: Parallel(?) programming with python
Thanks for your reply.

On 08.08.22 13:20, Stefan Ram wrote:
> Yes, but this is difficult. If you ask this question here, you might not be ready for this.

Indeed.

> I haven't learned it yet myself, but nevertheless tried to write a small example program quickly, which might still contain errors because of my lack of education.
>
> import threading
> import time
>
> def write_to_list( list, lock, event ):
>     for i in range( 10 ):
>         lock.acquire()
>         try:
>             list.append( i )
>         finally:
>             lock.release()
>         event.set()
>         time.sleep( 3 )
>
> def read_from_list( list, lock, event ):
>     while True:
>         event.wait()
>         print( "Waking up." )
>         event.clear()
>         if len( list ):
>             print( "List contains " + str( list[ 0 ]) + "." )
>             lock.acquire()
>             try:
>                 del list[ 0 ]
>             finally:
>                 lock.release()
>         else:
>             print( "List is empty." )
>
> list = []
> lock = threading.Lock()
> event = threading.Event()
> threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
> threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()

If I understand some things correctly, a "lock" would be something that, as the name says, locks, meaning it prevents parts of the program from executing on the locked resource until other parts have finished doing their things and have released the lock. If this is correct, it's not exactly what I wanted, because this way "parts of the program" would not "keep doing their things, while other parts do other things on the same data".

I'm in principle ok with locks, if it must be. What I fear is that the lock could last long and prevent the function that writes into the list from doing so every second. With an FFT on a list that contains a few bytes taken every second over one week's time (604.800 samples), I believe it's very likely that the FFT function takes longer than a second to return. Then I would have to import all the data I have missed since the lock was acquired, which is doable, but I would like to avoid it if possible.

> In basketball, first you must learn to dribble and pass, before you can begin to shoot.

Sure.

> With certain reservations, texts that can be considered to learn Python are: "Object-Oriented Programming in Python Documentation" - a PDF file, Introduction to Programming Using Python - Y Daniel Liang (2013), How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12), The Coder's Apprentice - Pieter Spronck (2016-09-21), and Python Programming - John Zelle (2009).

Thank you for the list. I am currently taking a Udemy course and at the same time reading the tutorials on python.org. I hope I will some day come to any of the books you suggest (I'm doing this only in my spare time and it will take forever).
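For what it's worth, the lock need not be held while the FFT runs. A sketch of one way to keep the writer unblocked (a hypothetical shape, not code from the thread): copy the list under the lock, then compute on the copy, so the 1-second appends only ever wait for the duration of the copy.

import threading

lock = threading.Lock()
samples = []

def compute_fft(data):
    pass  # stand-in for the real FFT (assumption)

def fft_every_six_hours():
    with lock:                 # held only long enough to copy the list
        snapshot = list(samples)
    compute_fft(snapshot)      # the slow part runs without the lock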