Re: Long running process - how to speed up?
Shaozhong SHI wrote:

> Can it be divided into several processes?

I'd do it like this:

    from time import sleep
    from threading import Thread

    t = Thread(target=lambda: sleep(1))
    t.start()   # start(), not run() -- run() executes in the current thread
    # do your work here
    t.join()    # Thread objects have join(), not wait()

-- 
https://mail.python.org/mailman/listinfo/python-list
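If the per-row work really can be divided, a thread pool spares you the manual start/join bookkeeping. A minimal sketch, where process_row is a hypothetical stand-in for the real per-row work:

```python
from concurrent.futures import ThreadPoolExecutor

def process_row(x):
    # hypothetical stand-in for the real per-row work
    return x ** 2

rows = [4, 2, 3]
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order, so no reassembly is needed
    results = list(pool.map(process_row, rows))
print(results)  # [16, 4, 9]
```

The pool's context manager joins all worker threads on exit, which replaces the explicit join() above.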
Re: Long running process - how to speed up?
On Sun, 20 Feb 2022 18:05:33 +, Shaozhong SHI declaimed the following:

>I am trying this approach,
>
>import multiprocessing as mp
>
>def my_func(x):
>    print(x**x)
>

Not much of a processing load there, especially for your small set of
integers. I suspect you are consuming a significant amount of time just
having the OS create and tear down each process.

In truth, for that example, I suspect plain threads will run faster
because you don't have the overhead of setting up a process with new
stdin/stdout/stderr. The exponentiation probably completes within one
Python threading quantum, and the print() will trigger a thread switch
to allow the next one to compute.

>def main():
>    pool = mp.Pool(mp.cpu_count())
>    result = pool.map(my_func, [4,2,3])

-=-=-
>>> import multiprocessing as mp
>>> mp.cpu_count()
8
>>>
-=-=-

Really under-loaded on my Win10 system (hyperthreaded processors count
as 2 CPUs, so a quad-core HT reports as 8 CPUs). Even an older
Raspberry Pi 3B (quad core) reports:

-=-=-
md_admin@microdiversity:~$ python3
Python 3.7.3 (default, Jan 22 2021, 20:04:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing as mp
>>> mp.cpu_count()
4
>>> exit()
md_admin@microdiversity:~$
-=-=-

-- 
Wulfraed                 Dennis Lee Bieber         AF6VN
wlfr...@ix.netcom.com
http://wlfraed.microdiversity.freeddns.org/

-- 
https://mail.python.org/mailman/listinfo/python-list
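Dennis's thread-vs-process point can be checked directly: multiprocessing.dummy exposes the same Pool API backed by threads, so timing both isolates the process start-up cost. A sketch (my_func returns instead of printing, to keep the output from interleaving):

```python
import time
import multiprocessing as mp
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool, same API

def my_func(x):
    return x ** x

if __name__ == "__main__":
    data = [4, 2, 3]
    for label, pool_cls in (("processes", mp.Pool), ("threads", ThreadPool)):
        start = time.perf_counter()
        with pool_cls(4) as pool:
            result = pool.map(my_func, data)
        elapsed = time.perf_counter() - start
        print(f"{label}: {result} in {elapsed:.3f}s")
```

On a trivial workload like this, the thread pool usually wins for exactly the reason Dennis gives: no new processes to create and tear down.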
Re: Long running process - how to speed up?
David,

I would like to assume that the processing needed is quite a bit more
than calculating the square of each X. But as we are getting very
little useful information, it is hard to keep the interaction
productive.

I am trying to imagine a scenario with a million rows of sorts in a CSV
(typically not used for a single column of data, as that tends not to
use commas) where you read in the data and want to generate just the
square of each number, in what may be a disorganized order, as threads
are not guaranteed to do anything but interleave!

Parallelism can be used in cases where the data is segmented properly
and the algorithm adjusted to fit the needs. But the overhead can be
substantial. For the trivial task mentioned, which I have to hope is
not the actual task, you can get quite decent speed by reading the data
into a numpy data structure, using a vectorized operation to produce
the squares, and simply printing that.

The original question here is turning out to be mysterious, as it began
by asking how to speed up some slow process not yet explained, with
some mumbling about calling a sleep function. I note some parallel
algorithms require a variant of that, in that some parts must wait for
other parts to complete and arrange to be dormant till signaled, or
schedule themselves to be woken regularly and check whether things are
ready for them to resume. Sleeping is a very common occurrence in
time-shared systems.

-----Original Message-----
From: Shaozhong SHI
To: Mats Wichmann
Cc: python-list@python.org
Sent: Sun, Feb 20, 2022 1:05 pm
Subject: Re: Long running process - how to speed up?

On Sat, 19 Feb 2022 at 19:44, Mats Wichmann wrote:
> On 2/19/22 05:09, Shaozhong SHI wrote:
> > Can it be divided into several processes?
> > Regards,
> > David
>
> The answer is: "maybe". Multiprocessing doesn't happen for free, you
> have to figure out how to divide the task up, requiring thought and
> effort. We can't guess to what extent the problem you have is amenable
> to multiprocessing.
>
> Google for "dataframe" and "multiprocessing" and you should get some
> hits (in my somewhat limited experience in this area, people usually
> load the csv data into Pandas before they get started working with it).
>
> --
> https://mail.python.org/mailman/listinfo/python-list

I am trying this approach,

import multiprocessing as mp

def my_func(x):
    print(x**x)

def main():
    pool = mp.Pool(mp.cpu_count())
    result = pool.map(my_func, [4,2,3])

if __name__ == "__main__":
    main()

I modified the script and set off a test run. However, I have no idea
whether this approach will be faster than the conventional approach.

Anyone have any ideas?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Long running process - how to speed up?
On Mon, 21 Feb 2022 at 05:07, Shaozhong SHI wrote:
> However, I have no idea whether this approach will be faster than
> conventional approach.
>
> Any one has idea?

Try it. Find out. The only way to know is to measure.

I can't see the sleep call in there, though. You may need to post your
actual code instead of trivial examples.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
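Chris's "measure" advice can be as simple as wrapping each variant in time.perf_counter(); process() below is a hypothetical stand-in for whichever version is being timed:

```python
import time

def process(rows):
    # hypothetical stand-in for the real per-row work
    return [x ** x for x in rows]

rows = list(range(1, 500))
start = time.perf_counter()
result = process(rows)
elapsed = time.perf_counter() - start
print(f"processed {len(result)} rows in {elapsed:.4f}s")
```

Run the same input through the sequential and the pooled version and compare the two elapsed times; that settles the question without guessing.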
Re: Re: Long running process - how to speed up?
On Sat, 19 Feb 2022 at 18:51, Alan Gauld wrote:
> On 19/02/2022 11:28, Shaozhong SHI wrote:
>
> > I have a cvs file of 932956 row
>
> That's not a lot in modern computing terms.
>
> > and have to have time.sleep in a Python
> > script.
>
> Why? Is it a requirement by your customer? Your manager?
> time.sleep() is not usually helpful if you want to do
> things quickly.
>
> > It takes a long time to process.
>
> What is a "long time"? minutes? hours? days? weeks?
>
> It should take a million times as long as it takes to
> process one row. But you have given no clue what you
> are doing in each row.
> - reading a database?
> - reading from the network? or the internet?
> - writing to a database? or the internet?
> - performing highly complex math operations?
>
> Or perhaps the processing load is in analyzing the totality
> of the data after reading it all? A very different type
> of problem. But we just don't know.
>
> All of these factors will affect performance.
>
> > How can I speed up the processing?
>
> It all depends on the processing.
> You could try profiling your code to see where the time is spent.
>
> > Can I do multi-processing?
>
> Of course. But there is no guarantee that will speed things
> up if there is a bottleneck on a single resource somewhere.
> But it might be possible to divide and conquer and get better
> speed. It all depends on what you are doing. We can't tell.
>
> We cannot answer such a vague question with any specific
> solution.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
>
> --
> https://mail.python.org/mailman/listinfo/python-list

I do not know the answers to these yet. Right now it appears to
hang/stop at one point and does not move on.

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Long running process - how to speed up?
On Sat, 19 Feb 2022 at 19:44, Mats Wichmann wrote:
> On 2/19/22 05:09, Shaozhong SHI wrote:
> > Can it be divided into several processes?
> > Regards,
> > David
>
> The answer is: "maybe". Multiprocessing doesn't happen for free, you
> have to figure out how to divide the task up, requiring thought and
> effort. We can't guess to what extent the problem you have is amenable
> to multiprocessing.
>
> Google for "dataframe" and "multiprocessing" and you should get some
> hits (in my somewhat limited experience in this area, people usually
> load the csv data into Pandas before they get started working with it).
>
> --
> https://mail.python.org/mailman/listinfo/python-list

I am trying this approach,

import multiprocessing as mp

def my_func(x):
    print(x**x)

def main():
    pool = mp.Pool(mp.cpu_count())
    result = pool.map(my_func, [4,2,3])

if __name__ == "__main__":
    main()

I modified the script and set off a test run. However, I have no idea
whether this approach will be faster than the conventional approach.

Anyone have any ideas?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Long running process - how to speed up?
Indeed not a clear request. Timing is everything, but there are times ...

For many purposes, people may read the entire CSV in one gulp into some
data structure like a pandas DataFrame. The file is then closed and any
later processing does whatever you want. Of course you can easily read
one line at a time in Python, parse it by comma, do any other
processing, and act on one row at a time or in small batches, so you
never need huge amounts of memory. But other methods that read in the
entire set of data are often better optimized and faster, and many
operations on the data are faster when done in vectorized fashion using
add-ons like numpy and pandas.

We have no idea what is being used, and none of this explains a need
for some form of sleep.

Multi-processing helps only if you can make steps in the processing run
in parallel without interfering with each other or making things happen
out of order. Yes, you could read in the data and assign, say, 10,000
rows to a thread to process, then fetch and assign more, if done quite
carefully. The results might need to be carefully combined, and any
shared variables might need locks and so on. Not necessarily worth it
if the data is not too large and the calculations are small. And it
remains unclear where you want to sleep or why. Parallelism can be
important if the sleep is to wait for the user to respond to something
while processing continues in the background.

Is it possible that whatever you are calling to do the processing has
some kind of sleep within it, and you may be calling it as often as
once per row? In that case, ask why it does that and whether you can
avoid it. Yes, running in parallel may let you move forward, but again,
it has to be done carefully, and having thousands of processes sleeping
at the same time may be worse!

I note badly defined questions get horrible answers. Mine included.
-----Original Message-----
From: Alan Gauld
To: python-list@python.org
Sent: Sat, Feb 19, 2022 7:33 am
Subject: Fwd: Re: Long running process - how to speed up?

On 19/02/2022 11:28, Shaozhong SHI wrote:

> I have a cvs file of 932956 row

That's not a lot in modern computing terms.

> and have to have time.sleep in a Python
> script.

Why? Is it a requirement by your customer? Your manager?
time.sleep() is not usually helpful if you want to do
things quickly.

> It takes a long time to process.

What is a "long time"? minutes? hours? days? weeks?

It should take a million times as long as it takes to
process one row. But you have given no clue what you
are doing in each row.
- reading a database?
- reading from the network? or the internet?
- writing to a database? or the internet?
- performing highly complex math operations?

Or perhaps the processing load is in analyzing the totality
of the data after reading it all? A very different type
of problem. But we just don't know.

All of these factors will affect performance.

> How can I speed up the processing?

It all depends on the processing.
You could try profiling your code to see where the time is spent.

> Can I do multi-processing?

Of course. But there is no guarantee that will speed things
up if there is a bottleneck on a single resource somewhere.
But it might be possible to divide and conquer and get better
speed. It all depends on what you are doing. We can't tell.

We cannot answer such a vague question with any specific
solution.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

-- 
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list
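The batching idea Avi describes, handing a block of rows to each worker and recombining in order, might be sketched like this; the batch size and the per-batch squaring are placeholders:

```python
from itertools import islice
from multiprocessing import Pool

def square_batch(batch):
    # placeholder for the real per-batch work
    return [x * x for x in batch]

def batches(iterable, size):
    # yield successive lists of `size` items (the last may be shorter)
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

if __name__ == "__main__":
    rows = range(1, 101)
    with Pool() as pool:
        # Pool.map returns batches in submission order, so the results
        # recombine correctly without any locks or shared state.
        results = [y for batch in pool.map(square_batch, batches(rows, 25))
                   for y in batch]
    print(results[:5])  # [1, 4, 9, 16, 25]
```

Because each worker touches only its own batch, none of the locking Avi warns about is needed here; it only becomes necessary once workers share mutable state.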
Re: Long running process - how to speed up?
On 2/19/22 05:09, Shaozhong SHI wrote:
> Can it be divided into several processes?
> Regards,
> David

The answer is: "maybe". Multiprocessing doesn't happen for free, you
have to figure out how to divide the task up, requiring thought and
effort. We can't guess to what extent the problem you have is amenable
to multiprocessing.

Google for "dataframe" and "multiprocessing" and you should get some
hits (in my somewhat limited experience in this area, people usually
load the csv data into Pandas before they get started working with it).
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Long running process - how to speed up?
On Sat, 19 Feb 2022 11:28:31 +, Shaozhong SHI declaimed the following:

>I have a cvs file of 932956 row and have to have time.sleep in a Python
>script. It takes a long time to process.
>

I'd echo the others... Unless you better explain WHY you have .sleep()
(along with how often it is called, and what duration you sleep), the
first recommendation would be to remove it.

The most common justification for .sleep() is that one has CPU-bound
processing and needs to force context switches to let other operations
proceed more often than the system quantum time. That is not normally a
concern given Python's GIL and the presence of multi-core chips.

How are you processing the (near) million rows of that CSV? If you are
loading all of them into a large list, you could be triggering Python
list reallocations, or OS page swapping (though I wouldn't expect that
on most modern systems -- maybe on a Raspberry Pi/BeagleBone Black).

Note:

>>> import sys
>>> sys.getsizeof("a")
50
>>>

Even a one-character string expands to 50 bytes; an EMPTY string takes
up 49 bytes. That comes to about 49 + <#chars> IF all characters fit an
8-bit encoding -- if any non-8-bit characters are in the string, the
<#chars> needs to be multiplied by either 2 or 4, depending upon the
widest representation needed.

If you are doing read-one-record, process-one-record, repeat -- and
have the .sleep() inside that loop... definitely remove the .sleep().
That loop is already I/O-bound; the fastest you can obtain is
determined by how rapidly the OS can transfer records from the file
system to your program.

-- 
Wulfraed                 Dennis Lee Bieber         AF6VN
wlfr...@ix.netcom.com
http://wlfraed.microdiversity.freeddns.org/

-- 
https://mail.python.org/mailman/listinfo/python-list
Fwd: Re: Long running process - how to speed up?
On 19/02/2022 11:28, Shaozhong SHI wrote:

> I have a cvs file of 932956 row

That's not a lot in modern computing terms.

> and have to have time.sleep in a Python
> script.

Why? Is it a requirement by your customer? Your manager?
time.sleep() is not usually helpful if you want to do
things quickly.

> It takes a long time to process.

What is a "long time"? minutes? hours? days? weeks?

It should take a million times as long as it takes to
process one row. But you have given no clue what you
are doing in each row.
- reading a database?
- reading from the network? or the internet?
- writing to a database? or the internet?
- performing highly complex math operations?

Or perhaps the processing load is in analyzing the totality
of the data after reading it all? A very different type
of problem. But we just don't know.

All of these factors will affect performance.

> How can I speed up the processing?

It all depends on the processing.
You could try profiling your code to see where the time is spent.

> Can I do multi-processing?

Of course. But there is no guarantee that will speed things
up if there is a bottleneck on a single resource somewhere.
But it might be possible to divide and conquer and get better
speed. It all depends on what you are doing. We can't tell.

We cannot answer such a vague question with any specific
solution.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Long running process - how to speed up?
On Sat, Feb 19, 2022 at 3:29 AM Shaozhong SHI wrote:
> I have a cvs file of 932956 row and have to have time.sleep in a Python
> script. It takes a long time to process.
>
> How can I speed up the processing? Can I do multi-processing?

How are you doing it right now? Are you using the csv module?

You might be able to use the GNU "split" command as a prelude to using
the csv module in combination with multiprocessing. GNU split comes
with Linuxes, but I'm sure you can get it for Windows. MacOS comes with
a rather less powerful "split" command, but it still might work for you.

You also could try Pypy3.

HTH.
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Long running process - how to speed up?
Hi,

As I understand it, your script starts another script and should wait
until the second one completes its job. Right? If so, you have several
options, depending on how your first script is written. If your script
is async, then there is a good asyncio option:

    proc = await asyncio.create_subprocess_shell(
        f"{execcmd} {execargs}",
        stdin=None,
        stdout=None,
    )
    await proc.wait()

This way you can start many workers, and you don't need to wait for
them in a sync manner.

Anyway, please give more info about the problem you face.

19.02.2022 14:28, Shaozhong SHI wrote:

I have a cvs file of 932956 row and have to have time.sleep in a Python
script. It takes a long time to process.

How can I speed up the processing? Can I do multi-processing?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list
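The single-subprocess pattern in that reply generalizes to many workers with asyncio.gather; "worker.py" and the argument list here are hypothetical:

```python
import asyncio

async def run_worker(arg):
    # "python worker.py" is a hypothetical command line
    proc = await asyncio.create_subprocess_shell(
        f"python worker.py {arg}",
        stdin=None,
        stdout=None,
    )
    return await proc.wait()

async def main():
    # start all workers at once, then wait for them concurrently
    codes = await asyncio.gather(*(run_worker(i) for i in range(4)))
    print(codes)  # exit codes, in launch order

if __name__ == "__main__":
    asyncio.run(main())
```

gather() preserves launch order in its result list, so each exit code can be matched back to the worker that produced it.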
Re: Long running process - how to speed up?
Can it be divided into several processes?

Regards,

David

On Saturday, 19 February 2022, Chris Angelico wrote:
> On Sat, 19 Feb 2022 at 22:59, Karsten Hilbert wrote:
> >
> > > > I have a cvs file of 932956 row and have to have time.sleep in a Python
> > > > script. It takes a long time to process.
> > > >
> > > > How can I speed up the processing? Can I do multi-processing?
> > > >
> > > Remove the time.sleep()?
> >
> > He's attesting to only having "time.sleep" in there...
> >
> > I doubt removing that will help much ;-)
>
> I honestly don't understand the question, hence offering the
> stupidly-obvious suggestion in the hope that it would result in a
> better question. A million rows of CSV, on its own, isn't all that
> much to process, so it must be the processing itself (of which we have
> no information other than this reference to time.sleep) that takes all
> the time.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Long running process - how to speed up?
On Feb 19, 2022 12:28, Shaozhong SHI wrote:

I have a cvs file of 932956 row and have to have time.sleep in a Python
script. It takes a long time to process.

How can I speed up the processing? Can I do multi-processing?

Perhaps a dask dataframe:
https://docs.dask.org/en/latest/generated/dask.dataframe.read_csv.html
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Re: Long running process - how to speed up?
On Sat, 19 Feb 2022 at 22:59, Karsten Hilbert wrote:
> > > I have a cvs file of 932956 row and have to have time.sleep in a Python
> > > script. It takes a long time to process.
> > >
> > > How can I speed up the processing? Can I do multi-processing?
> > >
> > Remove the time.sleep()?
>
> He's attesting to only having "time.sleep" in there...
>
> I doubt removing that will help much ;-)

I honestly don't understand the question, hence offering the
stupidly-obvious suggestion in the hope that it would result in a
better question. A million rows of CSV, on its own, isn't all that much
to process, so it must be the processing itself (of which we have no
information other than this reference to time.sleep) that takes all the
time.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
Aw: Re: Long running process - how to speed up?
> > I have a cvs file of 932956 row and have to have time.sleep in a Python
> > script. It takes a long time to process.
> >
> > How can I speed up the processing? Can I do multi-processing?
> >
> Remove the time.sleep()?

He's attesting to only having "time.sleep" in there...

I doubt removing that will help much ;-)

Karsten
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Long running process - how to speed up?
On Sat, 19 Feb 2022 at 22:30, Shaozhong SHI wrote:
>
> I have a cvs file of 932956 row and have to have time.sleep in a Python
> script. It takes a long time to process.
>
> How can I speed up the processing? Can I do multi-processing?
>

Remove the time.sleep()?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
Long running process - how to speed up?
I have a csv file of 932,956 rows and have to have time.sleep in a
Python script. It takes a long time to process.

How can I speed up the processing? Can I do multi-processing?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list