EuroPython 2019: Monday and Tuesday activities for main conference attendees
Although the main conference starts on Wednesday, July 10th, there's already plenty to do for attendees with the main conference ticket on Monday 8th and Tuesday 9th.

Beginners' Day and Sponsored Trainings
--
You can come to the workshops and trainings venue at FHNW Campus Muttenz and:
- pick up your conference badge
- attend the Beginners' Day workshop
- attend the sponsored trainings

If you want to attend other workshops and trainings, you'll need a separate training ticket or combined ticket. Details on the Beginners' Day workshop and the sponsored trainings will be announced separately.

Catering on training days not included
--
Since we have to budget carefully, lunch and coffee breaks are not included if you don't have a training or combined ticket. So that you don't go hungry, we have arranged for lunch coupons you can buy (price to be announced later). You can also go to the grocery store on the ground floor. For coffee breaks you can go to the ground floor, to the 12th floor of the FHNW building, or outside to the beach bar (nice weather only) and buy drinks.

* https://ep2019.europython.eu/registration/buy-tickets/

Dates and Venues
--
EuroPython will be held from July 8-14 2019 in Basel, Switzerland, at the Congress Center Basel (BCC) for the main conference days (Wed-Fri) and the FHNW Muttenz for the workshops/trainings/sprints days (Mon-Tue, Sat-Sun).

Tickets can be purchased on our registration page: https://ep2019.europython.eu/registration/buy-tickets/

For more details, please have a look at our website and the FAQ: https://ep2019.europython.eu/faq

Help spread the word
--
Please help us spread this message by sharing it on your social networks as widely as possible. Thank you!

Link to the blog post: https://blog.europython.eu/post/185080400427/europython-2019-monday-and-tuesday-activities-for
Tweet: https://twitter.com/europython/status/1131470223205445632

Enjoy,
--
EuroPython 2019 Team
https://ep2019.europython.eu/
https://www.europython-society.org/
--
https://mail.python.org/mailman/listinfo/python-list
Re: PEP 594 cgi & cgitb removal
On 2019-05-23, Paul Rubin wrote:
> dieter writes:
>> Should "cgi" disappear from the standard library
>
> It's also a concern that cgi may be disappearing from web servers. Last
> I heard, nginx didn't support it. That's part of why I still use
> apache, or (local only) even CGIHTTPServer.py. I don't know what the
> current hotness in httpd's is though.

nginx is the current hotness. CGI has not been hotness since the mid 90s.
--
https://mail.python.org/mailman/listinfo/python-list
Re: PEP 594 cgi & cgitb removal
On 22/05/2019 19:29, Terry Reedy wrote:
> One of the factors being considered in removal decisions is the absence
> of anyone willing to list themselves in the expert's list
> https://devguide.python.org/experts/ as a maintainer for a module. At
> the moment, 3 other people have objected to the removal of these
> modules. I suspect that at least 2 of you 4 are at least as technically
> qualified to be a core developer as I am. A request to become the
> maintainer of cgi and cgitb *might* affect the decision.

A quick read-through of the modules (I am supposed to be working right now) suggests that maintaining them wouldn't be a massive effort. Definitely something to think about.

--
Rhodri James *-* Kynesim Ltd
--
https://mail.python.org/mailman/listinfo/python-list
Handling a connection error with Twython
I am using Twython to post updates on Twitter. Lately there is now and then a problem with my internet connection. I am using:

    posted = twitter.update_status(status = message,
                                   in_reply_to_status_id = message_id,
                                   trim_user = True)

What would be the best way to catch a connection error and try it (for example) again a maximum of three times with a delay of one minute?

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
--
https://mail.python.org/mailman/listinfo/python-list
How do you organize your virtual environments?
Perhaps the subject isn't quite correct, but here's what I'm after. Suppose you have five applications, each going through a series of dev, test and prod phases. I will assume, without further explanation or justification, that the dev phase is wholly within the purview of the developers, who get to organize their environments as they see fit. The test and prod phases involve actual deployment though, and so require a defined virtual environment. The five applications need not be terribly closely related.

How do you organize those environments? The two extremes would seem to be:

* per-application virtual environments, so ten in all
* a common virtual environment for all applications, so just test and prod, two in all

My way of thinking about virtual environments has always leaned in the direction of a per-application setup, as that requires less coordination (particularly when deploying to production), but I'm willing to be convinced to move in the fewer-environments-is-better direction. Here's a concrete question for people who favor fewer environments: Suppose application #4 requires an update to some package used by all five applications. What's your protocol for deployment to test and prod?

Thanks,

Skip
--
https://mail.python.org/mailman/listinfo/python-list
Re: How do you organize your virtual environments?
On Fri, May 24, 2019 at 12:42 AM Skip Montanaro wrote:
> My way of thinking about virtual environments has always leaned in the
> direction of a per-application setup, as that requires less
> coordination (particularly when deploying to production), but I'm
> willing to be convinced to move in fewer-environments-is-better
> direction. Here's a concrete question for people who favor fewer
> environments: Suppose application #4 requires an update to some
> package used by all five applications. What's your protocol for
> deployment to test and prod?

If the applications are separate (such that you could logically and sanely install app #2 on one computer and app #5 on another), separate venvs for each. That does mean that a security or other crucial update will need to be applied to each, but it also keeps everything self-contained; app #1 has a requirements.txt that names only the packages that app #1 needs, and it's running in a venv that has only the packages that its requirements.txt lists.

OTOH, if the applications are more closely related, such that you really can't take them separately, then they're really one application with multiple components. In that case, I'd have them all in a single venv (and probably a single git repository for the source code), with multiple entry points within that. That keeps common dependencies together, makes it easier to import modules from one into another, etc.

Generally speaking, I would have a single git repo correspond to a single venv, linked via the requirements.txt file(s). Versioning of the source code is thus tied to the versioning of your dependencies and vice versa.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
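A minimal sketch of the per-application approach described above: one venv per app, built from that app's own requirements.txt. The application names, the ".venv" location and the directory layout here are hypothetical; it assumes only the standard-library venv module and the venv's own pip.

    # Build one isolated environment per application, driven by each app's
    # requirements.txt. App names and the ".venv" location are hypothetical.
    import subprocess
    import sys
    import venv
    from pathlib import Path

    APPS = ["app1", "app2", "app3", "app4", "app5"]

    def build_env(app_dir: Path) -> None:
        env_dir = app_dir / ".venv"
        # Create (or recreate) the venv with pip available.
        venv.EnvBuilder(with_pip=True, clear=True).create(env_dir)
        # The venv's scripts directory differs between POSIX and Windows.
        bindir = "Scripts" if sys.platform == "win32" else "bin"
        pip = env_dir / bindir / "pip"
        # Install exactly what this application declares, nothing more.
        subprocess.run([str(pip), "install", "-r", str(app_dir / "requirements.txt")],
                       check=True)

    if __name__ == "__main__":
        for app in APPS:
            build_env(Path(app))

With this shape, updating a shared package for application #4 only touches app4/requirements.txt and app4/.venv; the other four environments stay untouched.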
Re: Installation Problems with Python 3.7.3
I got it working. Thanks

On Mon, May 20, 2019 at 3:17 PM Igor Korot wrote:
> Hi,
>
> On Mon, May 20, 2019 at 1:53 PM Carolyn Evans wrote:
> >
> > I am having trouble with re-installing python 3.7.3.
>
> Why do you need to reinstall?
> What seems to be the problem?
>
> Thank you.
>
> >
> > I keep getting the following message:
> >
> > Modify
> > Repair
> > Remove
> >
> > I have tried all three numerous times and can not complete the setup
> >
> > I am working on a Windows 8 system, 64 bit OS, 4GB ram.
> >
> > How can this be fixed?
> >
> > Thanks
> >
> > C. Evans
> > --
> > https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list
More CPUs doesn't equal more speed
I've got a short script that loops through a number of files and processes them one at a time. I had a bit of time today and figured I'd rewrite the script to process the files 4 at a time by using 4 different instances of python. My basic loop is:

    for i in range(0, len(filelist), CPU_COUNT):
        for z in range(i, i+CPU_COUNT):
            doit( filelist[z])

With the function doit() calling up the program to do the lifting. Setting CPU_COUNT to 1 or 5 (I have 6 cores) makes no difference in total speed. I'm processing about 1200 files and my total duration is around 2 minutes. No matter how many cores I use the total is within a 5 second range.

This is not a big deal ... but I really thought that throwing more processors at a problem was a wonderful thing :) I figure that the cost of loading the python libraries and my source file and writing it out are pretty much i/o bound, but that is just a guess.

Maybe I need to set my sights on bigger, slower programs to see a difference :)

--
Listen to my FREE CD at http://www.mellowood.ca/music/cedars
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: b...@mellowood.ca
WWW: http://www.mellowood.ca
--
https://mail.python.org/mailman/listinfo/python-list
Re: More CPUs doesn't equal more speed
On Fri, May 24, 2019 at 5:37 AM Bob van der Poel wrote:
>
> I've got a short script that loops through a number of files and processes
> them one at a time. I had a bit of time today and figured I'd rewrite the
> script to process the files 4 at a time by using 4 different instances of
> python. My basic loop is:
>
> for i in range(0, len(filelist), CPU_COUNT):
>     for z in range(i, i+CPU_COUNT):
>         doit( filelist[z])
>
> With the function doit() calling up the program to do the lifting. Setting
> CPU_COUNT to 1 or 5 (I have 6 cores) makes no difference in total speed.
> I'm processing about 1200 files and my total duration is around 2 minutes.
> No matter how many cores I use the total is within a 5 second range.

Where's the part of the code that actually runs them across multiple CPUs?

Also, are you spending your time waiting on the disk, the CPU, IPC, or something else?

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
RE: More CPUs doesn't equal more speed
You really need to give more info on what you're doing in doit() to know what's going on. Are you using subprocess, threading, multiprocessing, etc?

Going off of what you've put there, those nested for loops are being run in the 1 main thread. If doit() kicks off a program and doesn't wait for it to finish, then you're just instantly starting 1,200 versions of the external program. If doit() _does_ wait for it to finish then you're not doing anything different than 1,200 one-at-a-time calls with no parallelization.

How are you making sure you have CPU_COUNT versions running, only that many running, and kicking off the next one once any of those completes?

-----Original Message-----
From: Python-list [mailto:python-list-bounces+david.raymond=tomtom@python.org] On Behalf Of Bob van der Poel
Sent: Thursday, May 23, 2019 2:40 PM
To: Python
Subject: More CPUs doesn't equal more speed

I've got a short script that loops through a number of files and processes them one at a time. I had a bit of time today and figured I'd rewrite the script to process the files 4 at a time by using 4 different instances of python. My basic loop is:

    for i in range(0, len(filelist), CPU_COUNT):
        for z in range(i, i+CPU_COUNT):
            doit( filelist[z])

With the function doit() calling up the program to do the lifting. Setting CPU_COUNT to 1 or 5 (I have 6 cores) makes no difference in total speed. I'm processing about 1200 files and my total duration is around 2 minutes. No matter how many cores I use the total is within a 5 second range.

This is not a big deal ... but I really thought that throwing more processors at a problem was a wonderful thing :) I figure that the cost of loading the python libraries and my source file and writing it out are pretty much i/o bound, but that is just a guess.

Maybe I need to set my sights on bigger, slower programs to see a difference :)

--
Listen to my FREE CD at http://www.mellowood.ca/music/cedars
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: b...@mellowood.ca
WWW: http://www.mellowood.ca
--
https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list
Re: PEP 594 cgi & cgitb removal
> nginx is the current hotness. CGI has not been hotness since the mid 90s.

Serverless is the new hotness, and serverless is CGI. Technology is cyclical.
--
https://mail.python.org/mailman/listinfo/python-list
RE: More CPUs doesn't equal more speed
Bob,

As others have noted, you have not made it clear how what you are doing is running "in parallel."

I have a similar need where I have thousands of folders and need to do an analysis based on the contents of one at a time and have 8 cores available, but the process may run for months if run linearly. The results are placed within the same folder so each part can run independently as long as shared resources like memory are not abused.

Your need is conceptually simple. Break up the list of filenames into N batches of about equal length. A simple approach might be to open N terminal or command windows and in each one start a python interpreter by hand running the same program which gets one of the file lists and works on it. Some may finish way ahead of others, of course. If anything they do writes to shared resources such as log files, you may want to be careful. And there is no guarantee that several will not run on the same CPU. There is also plenty of overhead associated with running full processes. I am not suggesting this, but it is fairly easy to do and may get you enough speedup. But since you only seem to need a few minutes, this won't be much.

Quite a few other solutions involve using some form of threads running within a process, perhaps using a queue manager. Python has multiple ways to do this. You would simply feed all the info needed (file names in your case) to a thread that manages a queue. It would allow up to N threads to be started and whenever one finishes, would be woken to start a replacement till done. Unless one such thread takes very long, they should all finish reasonably close to each other. Again, lots of details to make sure the threads do not conflict with each other. But, no guarantee which core they get unless you use an underlying package that manages that. So you might want to research available packages that do much of the work for you and provide some guarantees.

An interesting question is how to set the chosen value of N. Just because you have N cores, you do not necessarily choose N. There are other things happening on the same machine, with sometimes thousands of processes or threads in the queue even when the machine is sitting there effectively doing nothing. If you will also keep multiple things open (mailer, WORD, browsers, ...) you need some bandwidth so everything else gets enough attention. So is N-1 or N-2 better? Then again, if your task has a mix of CPU and I/O activities then it may make sense to run more than N in parallel, even if several of them end up on the same CORE, as they may interleave with each other and one can make use of the CPU while the others are waiting on I/O or anything slower.

I am curious to hear what you end up with. I will be reading to see if others can point to modules that already support something like this with you supplying just a function to use for each thread.

I suggest you consider your architecture carefully. Sometimes it is better to run program A (in Python or anything else) that sets up what is needed, including saving various data structures on disk needed for each individual run. Then you start the program that reads from the above and does the parallel computations and again writes out what is needed, such as log entries, or data in a CSV. Finally, when it is all done, another program can gather in the various outputs and produce a consolidated set of info. That may be extra work but minimizes the chance of the processes interfering with each other.

It also may allow you to run or re-run smaller batches or even to farm out the work to other machines. If you create a few thousand directories (or just files) with names like do0001 then you can copy them to another machine where you ask it to work on do0* and yet another on do1* and so on, using the same script. This makes more sense for my project, which literally may take months or years if run exhaustively on something like a grid search trying huge numbers of combinations.

Good luck.

Avi

-----Original Message-----
From: Python-list On Behalf Of Bob van der Poel
Sent: Thursday, May 23, 2019 2:40 PM
To: Python
Subject: More CPUs doesn't equal more speed

I've got a short script that loops through a number of files and processes them one at a time. I had a bit of time today and figured I'd rewrite the script to process the files 4 at a time by using 4 different instances of python. My basic loop is:

    for i in range(0, len(filelist), CPU_COUNT):
        for z in range(i, i+CPU_COUNT):
            doit( filelist[z])

With the function doit() calling up the program to do the lifting. Setting CPU_COUNT to 1 or 5 (I have 6 cores) makes no difference in total speed. I'm processing about 1200 files and my total duration is around 2 minutes. No matter how many cores I use the total is within a 5 second range.

This is not a big deal ... but I really thought that throwing more processors at a problem was a wonderful thing :) I figure that the cost of loading the python libraries and my source file and writing it out are pretty much i/o bound, but that is just a guess.
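As a concrete illustration of the "break the list into N batches" idea above, here is one possible sketch. Nothing in it comes from the original posts except the idea itself; the file names are hypothetical.

    # Split a list of work items into N roughly equal batches, one per worker.
    def batches(items, n):
        # items[0::n], items[1::n], ... gives n interleaved, near-equal slices.
        return [items[i::n] for i in range(n)]

    if __name__ == "__main__":
        filelist = ["file%04d.txt" % i for i in range(1200)]   # hypothetical names
        for i, batch in enumerate(batches(filelist, 6)):
            print("worker", i, "gets", len(batch), "files")

Each batch could then be handed to its own interpreter or worker thread, as described above.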
Re: Handling a connection error with Twython
Cecil Westerhof writes:
> I am using Twython to post updates on Twitter. Lately there is now and
> then a problem with my internet connection. I am using:
> posted = twitter.update_status(status = message,
>                                in_reply_to_status_id = message_id,
>                                trim_user = True)
>
> What would be the best way to catch a connection error and try it (for
> example) again maximum three times with a delay of one minute?

At the moment I solved it with the following:

    max_tries   = 3
    current_try = 1
    while True:
        try:
            posted = twitter.update_status(status = message,
                                           in_reply_to_status_id = message_id,
                                           trim_user = True)
            return posted['id']
        except TwythonError as e:
            print('Failed on try: {0}'.format(current_try))
            if not 'Temporary failure in name resolution' in e.msg:
                raise
            if current_try == max_tries:
                raise
            current_try += 1
            time.sleep(60)

Is this a good way to do it, or can it be improved on?

When it goes OK I just return from the function.
If it goes wrong for something other than a failure in the name resolution I re-raise the exception.
When the maximum tries are done I re-raise the exception.
Otherwise I wait a minute to try it again.

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
--
https://mail.python.org/mailman/listinfo/python-list
Re: More CPUs doesn't equal more speed
On 2019-05-23 22:41, Avi Gross via Python-list wrote:
> Bob,
>
> As others have noted, you have not made it clear how what you are doing
> is running "in parallel."
>
> [snip]

Because of the GIL, only 1 Python thread will actually be running at any time, so if it's processor-intensive, it's better to use multiprocessing.

Of course, if it's already maxing out the disk, then using more cores won't make it faster.
--
https://mail.python.org/mailman/listinfo/python-list
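If the per-file work really were CPU-bound pure Python rather than an external program, the multiprocessing route mentioned above could look roughly like the sketch below. doit() here is a stand-in, not the original poster's function, and the file names are hypothetical.

    # Rough multiprocessing sketch; doit() is a placeholder for real per-file work.
    from multiprocessing import Pool

    def doit(name):
        return sum(ord(c) for c in name)      # dummy CPU-bound work

    if __name__ == "__main__":
        filelist = ["file%04d.txt" % i for i in range(1200)]   # hypothetical names
        with Pool(processes=6) as pool:                        # roughly one per core
            results = pool.map(doit, filelist)

Because each worker is a separate process, the GIL does not serialize the CPU work.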
Re: Handling a connection error with Twython
On 2019-05-23 22:55, Cecil Westerhof wrote:
> Cecil Westerhof writes:
>
>> I am using Twython to post updates on Twitter. Lately there is now and
>> then a problem with my internet connection. I am using:
>> posted = twitter.update_status(status = message,
>>                                in_reply_to_status_id = message_id,
>>                                trim_user = True)
>>
>> What would be the best way to catch a connection error and try it (for
>> example) again maximum three times with a delay of one minute?
>
> At the moment I solved it with the following:
>     max_tries   = 3
>     current_try = 1
>     while True:
>         try:
>             posted = twitter.update_status(status = message,
>                                            in_reply_to_status_id = message_id,
>                                            trim_user = True)
>             return posted['id']
>         except TwythonError as e:
>             print('Failed on try: {0}'.format(current_try))
>             if not 'Temporary failure in name resolution' in e.msg:
>                 raise
>             if current_try == max_tries:
>                 raise
>             current_try += 1
>             time.sleep(60)
>
> Is this a good way to do it, or can it be improved on?
>
> When it goes OK I just return from the function.
> If it goes wrong for something other than a failure in the name resolution
> I re-raise the exception.
> When the maximum tries are done I re-raise the exception.
> Otherwise I wait a minute to try it again.

You have a 'while' loop with a counter; you can replace that with a 'for' loop.
--
https://mail.python.org/mailman/listinfo/python-list
Re: More CPUs doesn't equal more speed
Thanks all! The sound you are hearing is my head smacking against my hand! Or is it my hand against my head?

Anyway, yes the problem is that I was naively using command.getoutput() which blocks until the command is finished. So, of course, only one process was being run at one time! Bad me!

I guess I should be looking at subprocess.Popen(). Now, a more relevant question ... if I do it this way I then need to poll through a list of saved process IDs to see which have finished? Right? My initial thought is to batch them up in small groups (say CPU_COUNT-1) and wait for that batch to finish, etc. Would it be foolish to send a large number (1200 in this case since this is the number of files) and let the OS worry about scheduling and have my program poll 1200 IDs?

Someone mentioned the GIL. If I launch separate processes then I don't encounter this issue? Right?

On Thu, May 23, 2019 at 4:24 PM MRAB wrote:
> [snip]
>
> Because of the GIL, only 1 Python thread will actually be running at any
> time, so if it's processor-intensive, it's better to use multiprocessing.
>
> Of course, if it's already maxing out the disk, then using more cores
> won't make it faster.
> --
> https://mail.python.org/mailman/listinfo/python-list

--
Listen to my FREE CD at http://www.mellowood.ca/music/cedars
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: b...@mellowood.ca
WWW: http://www.mellowood.ca
--
https://mail.python.org/mailman/listinfo/python-list
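One way to get the capped-concurrency behaviour asked about above, without hand-polling 1200 pids, is a thread pool over blocking subprocess calls. A rough sketch follows; the external command "someprog" and the file names are made-up placeholders, not taken from the thread.

    # Run an external program over many files, at most CPU_COUNT at a time.
    # "someprog" and the file names are hypothetical placeholders.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    CPU_COUNT = 6
    filelist = ["file%04d.txt" % i for i in range(1200)]

    def doit(name):
        # Each worker thread blocks here while the OS runs the real work,
        # so the GIL is not a bottleneck.
        return subprocess.run(["someprog", name]).returncode

    with ThreadPoolExecutor(max_workers=CPU_COUNT) as pool:
        results = list(pool.map(doit, filelist))

pool.map keeps at most CPU_COUNT subprocesses alive at once and hands each worker a new file as soon as its previous one finishes, so there is no batch boundary to wait on and no manual polling.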
Re: More CPUs doesn't equal more speed
On Fri, May 24, 2019 at 10:07 AM Bob van der Poel wrote:
>
> Thanks all! The sound you are hearing is my head smacking against my hand!
> Or is it my hand against my head?
>
> Anyway, yes the problem is that I was naively using command.getoutput()
> which blocks until the command is finished. So, of course, only one
> process was being run at one time! Bad me!
>
> I guess I should be looking at subprocess.Popen(). Now, a more relevant
> question ... if I do it this way I then need to poll through a list of
> saved process IDs to see which have finished? Right? My initial thought
> is to batch them up in small groups (say CPU_COUNT-1) and wait for that
> batch to finish, etc. Would it be foolish to send a large number (1200
> in this case since this is the number of files) and let the OS worry
> about scheduling and have my program poll 1200 IDs?

That might create a lot of contention, resulting in poor performance. But depending on what your tasks saturate on, that might not matter all that much, and it _would_ be a simple and straight-forward technique. In fact, you could basically just write your code like this:

    for job in jobs:
        start_process()
    for process in processes:
        wait_for_process()

Once they're all started, you just wait for the first one to finish. Then when that's finished, wait for the next, and the next, and the next. If the first process started is actually the slowest to run, all the others will be in the "done" state for a while, but that's not a big deal.

> Someone mentioned the GIL. If I launch separate processes then I don't
> encounter this issue? Right?

The GIL is basically irrelevant here. Most of the work is being done in subprocesses, so your code is spending all its time waiting.

What I'd recommend is a thread pool. Broadly speaking, it would look something like this:

    jobs = [...]

    def run_jobs():
        while jobs:
            try: job = jobs.pop()
            except IndexError: break  # deal with race
            start_subprocess()
            wait_for_subprocess()

    threads = [threading.Thread(target=run_jobs) for _ in range(THREAD_COUNT)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

Note that this has the same "start them all, then wait on them in order" model. In this case, though, there won't be 1200 threads - there'll be THREAD_COUNT of them (which may not be the same as your CPU count, but you could use that same figure as an initial estimate). Within each thread, the logic is also quite simple: take a job, do the job, repeat till you run out of jobs. The GIL ensures that "job = jobs.pop()" is a safe atomic operation that can't possibly corrupt internal state, and will always retrieve a unique job every time. The run_jobs function simply runs one job at a time, waiting for its completion.

This kind of pattern keeps everything clean and simple, and is easy to tweak for performance.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: More CPUs doesn't equal more speed
On 2019-05-24 01:22, Chris Angelico wrote:
> [snip]
>
> What I'd recommend is a thread pool. Broadly speaking, it would look
> something like this:
>
>     jobs = [...]
>
>     def run_jobs():
>         while jobs:
>             try: job = jobs.pop()
>             except IndexError: break  # deal with race
>             start_subprocess()
>             wait_for_subprocess()
>
>     threads = [threading.Thread(target=run_jobs) for _ in range(THREAD_COUNT)]
>     for thread in threads:
>         thread.start()
>     for thread in threads:
>         thread.join()
>
> [snip]

Personally, I'd use a queue (from the 'queue' module), instead of a list, for the job pool.
--
https://mail.python.org/mailman/listinfo/python-list
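A minimal sketch of that queue-based variant, for comparison. The external command "someprog" and the file names are hypothetical placeholders; everything else is standard library.

    # Same thread-pool shape as above, but with queue.Queue as the job pool.
    import queue
    import subprocess
    import threading

    THREAD_COUNT = 6
    jobs = queue.Queue()
    for name in ["file%04d.txt" % i for i in range(1200)]:   # hypothetical files
        jobs.put(name)

    def run_jobs():
        while True:
            try:
                name = jobs.get_nowait()
            except queue.Empty:
                return
            subprocess.run(["someprog", name])   # hypothetical external command

    threads = [threading.Thread(target=run_jobs) for _ in range(THREAD_COUNT)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()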
Re: More CPUs doesn't equal more speed
On Fri, May 24, 2019 at 10:48 AM MRAB wrote:
>
> On 2019-05-24 01:22, Chris Angelico wrote:
> > What I'd recommend is a thread pool. Broadly speaking, it would look
> > something like this:
> >
> >     jobs = [...]
> >
> >     def run_jobs():
> >         while jobs:
> >             try: job = jobs.pop()
> >             except IndexError: break  # deal with race
> >
> Personally, I'd use a queue (from the 'queue' module), instead of a
> list, for the job pool.

It's not going to be materially different, since there's nothing adding more jobs part way. Either way works, and for the sake of a simple demo, I stuck to a core data type that any Python programmer will know, rather than potentially introducing another module :) But yes, the queue is a common choice here.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: More CPUs doesn't equal more speed
On 23May2019 17:04, bvdp wrote:
> Anyway, yes the problem is that I was naively using command.getoutput()
> which blocks until the command is finished. So, of course, only one
> process was being run at one time! Bad me!
>
> I guess I should be looking at subprocess.Popen(). Now, a more relevant
> question ... if I do it this way I then need to poll through a list of
> saved process IDs to see which have finished? Right? My initial thought
> is to batch them up in small groups (say CPU_COUNT-1) and wait for that
> batch to finish, etc. Would it be foolish to send a large number (1200
> in this case since this is the number of files) and let the OS worry
> about scheduling and have my program poll 1200 IDs?
>
> Someone mentioned the GIL. If I launch separate processes then I don't
> encounter this issue? Right?

Yes, but it becomes more painful to manage. If you're issuing distinct separate commands anyway, dispatch many or all and then wait for them as a distinct step. If the commands start thrashing the rest of the OS resources (such as the disc) then you may want to do some capacity limitation, such as a counter or semaphore to limit how many go at once.

Now, waiting for a subcommand can be done in a few ways.

If you're the parent of all the processes you can keep a set() of the issued process ids and then call os.wait() repeatedly, which returns the pid of a completed child process. Check it against your set. If you need to act on the specific process, use a dict to map pids to some record of the subprocess.

Alternatively, you can spawn a Python Thread for each subcommand, have the Thread dispatch the subcommand _and_ wait for it (i.e. keep your command.getoutput() method, but in a Thread). Main programme waits for the Threads by join()ing them. Because a thread waiting for something external (the subprocess) doesn't hold the GIL, other stuff can proceed. Basically, if something is handed off to the OS and then Python waits for that (via an os.* call or a Popen.wait() call etc etc) then it will release the GIL while it is blocked, so other Threads _will_ get to work.

This is all efficient, and there's any number of variations on the wait step depending what your needs are.

The GIL isn't the disaster most people seem to think. It can be a bottleneck for pure Python compute intensive work. But Python's interpreted - if you _really_ want performance the core compute will be compiled to something more efficient (eg a C extension) or handed to another process (transcode video in pure Python - argh! - but call the ffmpeg command as a subprocess - yes!); handed off, the GIL should be released, allowing other Python side work to continue.

Cheers, Cameron Simpson
--
https://mail.python.org/mailman/listinfo/python-list
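For the first option described above (the parent keeps track of its children and reaps them with os.wait()), a rough POSIX-only sketch. The external command "someprog" and the file names are hypothetical placeholders; the concurrency cap plays the role of the counter/semaphore mentioned earlier.

    # Dispatch up to MAX_RUNNING children, then reap them with os.wait().
    # POSIX-only; "someprog" and the file names are hypothetical.
    import os
    import subprocess

    MAX_RUNNING = 6
    pending = ["file%04d.txt" % i for i in range(1200)]
    running = {}                       # pid -> Popen object

    while pending or running:
        # Top up to the concurrency limit.
        while pending and len(running) < MAX_RUNNING:
            name = pending.pop()
            proc = subprocess.Popen(["someprog", name])
            running[proc.pid] = proc
        # Block until any child exits, then forget it.
        pid, status = os.wait()
        finished = running.pop(pid, None)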
Re: More CPUs doesn't equal more speed
On 5/23/2019 2:39 PM, Bob van der Poel wrote:
> I've got a short script that loops through a number of files and
> processes them one at a time. I had a bit of time today and figured I'd
> rewrite the script to process the files 4 at a time by using 4 different
> instances of python.

As others have said, you give no evidence that you are doing that.

The python test suite runner has an argument to use multiple cores. For me, 'python -m test -j0' runs about 6 times faster than 'python -m test'. The speedup would be greater except that there is one test file that takes over two minutes, and may run at least a minute after all other processes have quit. (I have suggested that long running files be split to even out the load, but that has not gained favor yet.)

For this use, more CPUs *does* equal more speed.

--
Terry Jan Reedy
--
https://mail.python.org/mailman/listinfo/python-list
Re: More CPUs doesn't equal more speed
On 5/23/2019 2:39 PM, Bob van der Poel wrote:
> I'm processing about 1200 files and my total duration is around 2
> minutes.

A followup to my previous response, which has not shown up yet. The python test suite is over 400 files. You might look at how test.regrtest runs them in parallel when -j? is passed.

--
Terry Jan Reedy
--
https://mail.python.org/mailman/listinfo/python-list
Re: Handling a connection error with Twython
MRAB writes:
> On 2019-05-23 22:55, Cecil Westerhof wrote:
>> Cecil Westerhof writes:
>>
>>> I am using Twython to post updates on Twitter. Lately there is now and
>>> then a problem with my internet connection. I am using:
>>> posted = twitter.update_status(status = message,
>>>                                in_reply_to_status_id = message_id,
>>>                                trim_user = True)
>>>
>>> What would be the best way to catch a connection error and try it (for
>>> example) again maximum three times with a delay of one minute?
>>
>> At the moment I solved it with the following:
>>     max_tries   = 3
>>     current_try = 1
>>     while True:
>>         try:
>>             posted = twitter.update_status(status = message,
>>                                            in_reply_to_status_id = message_id,
>>                                            trim_user = True)
>>             return posted['id']
>>         except TwythonError as e:
>>             print('Failed on try: {0}'.format(current_try))
>>             if not 'Temporary failure in name resolution' in e.msg:
>>                 raise
>>             if current_try == max_tries:
>>                 raise
>>             current_try += 1
>>             time.sleep(60)
>>
>> Is this a good way to do it, or can it be improved on?
>>
>> When it goes OK I just return from the function.
>> If it goes wrong for something other than a failure in the name resolution
>> I re-raise the exception.
>> When the maximum tries are done I re-raise the exception.
>> Otherwise I wait a minute to try it again.
>>
> You have a 'while' loop with a counter; you can replace that with a
> 'for' loop.

I did not do that consciously, because I have to try until it is successful and I return, or I reached the max tries and re-raise the exception. With a for loop I could exit the loop and could not re-raise the exception.

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
--
https://mail.python.org/mailman/listinfo/python-list
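For completeness, one possible shape of the for-loop version that still re-raises. It reuses the names from the thread (twitter, message, message_id, TwythonError and the 'Temporary failure in name resolution' test); wrapping it in a function called post_update is an assumption for illustration, not something from the posts.

    # Sketch only: retry up to MAX_TRIES, re-raising on any non-DNS error
    # or once the tries are exhausted. post_update() is a hypothetical wrapper.
    import time
    from twython import TwythonError

    MAX_TRIES = 3

    def post_update(twitter, message, message_id):
        for current_try in range(1, MAX_TRIES + 1):
            try:
                posted = twitter.update_status(status = message,
                                               in_reply_to_status_id = message_id,
                                               trim_user = True)
                return posted['id']
            except TwythonError as e:
                print('Failed on try: {0}'.format(current_try))
                if ('Temporary failure in name resolution' not in e.msg
                        or current_try == MAX_TRIES):
                    raise
                time.sleep(60)

Every iteration either returns, re-raises, or sleeps and retries, so the loop never falls off the end and no separate counter bookkeeping is needed.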