EuroPython 2019: Monday and Tuesday activities for main conference attendees

2019-05-23 Thread M.-A. Lemburg
Although the main conference starts on Wednesday, July 10th, there’s
already so much to do for attendees with the main conference ticket on
Monday 8th and Tuesday 9th.

Beginners’ Day and Sponsored Trainings
--

You can come to the workshops and trainings venue at FHNW Campus Muttenz
and:

- pick up your conference badge
- attend the Beginners’ Day workshop
- attend the sponsored trainings

If you want to attend other workshops and trainings, you’ll need a
separate training ticket or combined ticket.

Details on the Beginners’ Day workshop and the sponsored trainings
will be announced separately.

Catering on training days not included
--

Since we have to budget carefully, lunch and coffee breaks are not
included if you don’t have a training or combined ticket.

So that you don’t go hungry, we have arranged for lunch coupons you
can buy (price to be announced later). You can also go to the grocery
store on the ground floor. For coffee breaks you can go to the ground
floor, to the 12th floor of the FHNW building, or outside to the beach
bar (nice weather only) and buy drinks.

  * https://ep2019.europython.eu/registration/buy-tickets/ *


Dates and Venues


EuroPython will be held from July 8-14 2019 in Basel, Switzerland, at
the Congress Center Basel (BCC) for the main conference days (Wed-Fri)
and the FHNW Muttenz for the workshops/trainings/sprints days
(Mon-Tue, Sat-Sun).

Tickets can be purchased on our registration page:

https://ep2019.europython.eu/registration/buy-tickets/

For more details, please have a look at our website and the FAQ:

https://ep2019.europython.eu/faq


Help spread the word


Please help us spread this message by sharing it on your social
networks as widely as possible. Thank you !

Link to the blog post:

https://blog.europython.eu/post/185080400427/europython-2019-monday-and-tuesday-activities-for

Tweet:

https://twitter.com/europython/status/1131470223205445632


Enjoy,
--
EuroPython 2019 Team
https://ep2019.europython.eu/
https://www.europython-society.org/

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: PEP 594 cgi & cgitb removal

2019-05-23 Thread Jon Ribbens via Python-list
On 2019-05-23, Paul Rubin  wrote:
> dieter  writes:
>> Should "cgi" disappear from the standard library
>
> It's also a concern that cgi may be disappearing from web servers.  Last
> I heard, nginx didn't support it.  That's part of why I still use
> apache, or (local only) even CGIHTTPServer.py.  I don't know what the
> current hotness in httpd's is though.

nginx is the current hotness. CGI has not been hotness since the mid 90s.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: PEP 594 cgi & cgitb removal

2019-05-23 Thread Rhodri James

On 22/05/2019 19:29, Terry Reedy wrote:
One of the factors being considered in removal decisions is the absence 
of anyone willing to list themselves in the expert's list

https://devguide.python.org/experts/
as a maintainer for a module.

At the moment, 3 other people have objected to the removal of these 
modules.  I suspect that at least 2 of you 4 are at least as technically 
qualified to be a core developer as I am.  A request to become the 
maintainer of cgi and cgitb *might* affect the decision.


A quick read-through of the modules (I am supposed to be working right 
now) suggests that maintaining them wouldn't be a massive effort. 
Definitely something to think about.


--
Rhodri James *-* Kynesim Ltd
--
https://mail.python.org/mailman/listinfo/python-list


Handling a connection error with Twython

2019-05-23 Thread Cecil Westerhof
I am using Twython to post updates on Twitter. Lately there is now and
then a problem with my internet connection. I am using:
posted = twitter.update_status(status = message,
                               in_reply_to_status_id = message_id,
                               trim_user = True)

What would be the best way to catch a connection error and try it (for
example) again maximum three times with a delay of one minute?

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
-- 
https://mail.python.org/mailman/listinfo/python-list


How do you organize your virtual environments?

2019-05-23 Thread Skip Montanaro
Perhaps the subject isn't quite correct, but here's what I'm after.
Suppose you have five applications, each going through a series of
dev, test and prod phases. I will assume without further explanation
or justification, that the dev phase is wholly within the purview of
the developers, who get to organize their environments as they see
fit.

The test and prod phases involve actual deployment though, and so
require a defined virtual environment. The five applications need not
be terribly closely related. How do you organize those environments?
The two extremes would seem to be:

* per-application virtual environments, so ten in all

* a common virtual environment for all applications, so just test and
prod, two in all

My way of thinking about virtual environments has always leaned in the
direction of a per-application setup, as that requires less
coordination (particularly when deploying to production), but I'm
willing to be convinced to move in fewer-environments-is-better
direction. Here's a concrete question for people who favor fewer
environments: Suppose application #4 requires an update to some
package used by all five applications. What's your protocol for
deployment to test and prod?

Thanks,

Skip
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How do you organize your virtual environments?

2019-05-23 Thread Chris Angelico
On Fri, May 24, 2019 at 12:42 AM Skip Montanaro
 wrote:
> My way of thinking about virtual environments has always leaned in the
> direction of a per-application setup, as that requires less
> coordination (particularly when deploying to production), but I'm
> willing to be convinced to move in fewer-environments-is-better
> direction. Here's a concrete question for people who favor fewer
> environments: Suppose application #4 requires an update to some
> package used by all five applications. What's your protocol for
> deployment to test and prod?
>

If the applications are separate (such that you could logically and
sanely install app #2 on one computer and app #5 on another), separate
venvs for each. That does mean that a security or other crucial update
will need to be applied to each, but it also keeps everything
self-contained; app #1 has a requirements.txt that names only the
packages that app #1 needs, and it's running in a venv that has only
the packages that its requirements.txt lists.

OTOH, if the applications are more closely related, such that you
really can't take them separately, then they're really one application
with multiple components. In that case, I'd have them all in a single
venv (and probably a single git repository for the source code), with
multiple entry points within that. That keeps common dependencies
together, makes it easier to import modules from one into another,
etc.

Generally speaking, I would have a single git repo correspond to a
single venv, linked via the requirements.txt file(s). Versioning of
the source code is thus tied to the versioning of your dependencies
and vice versa.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Installation Problems with Python 3.7.3

2019-05-23 Thread Carolyn Evans
I got it working.

Thanks

On Mon, May 20, 2019 at 3:17 PM Igor Korot  wrote:

> Hi,
>
> On Mon, May 20, 2019 at 1:53 PM Carolyn Evans  wrote:
> >
> > I am having trouble with re-installing python 3.7.3.
>
> Why do you need to reinstall?
> What seems to be the problem?
>
> Thank you.
>
> >
> > I keep getting the following message:
> >
> > Modify
> > Repair
> > Remove
> >
> > I have tried all three numerous times and can not complete the setup
> >
> > I am working on a Windows 8 system,  64 bit OS,  4GB ram.
> >
> > How can this be fixed?
> >
> > Thanks
> >
> > C. Evans
> > --
> > https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


More CPUs doesn't equal more speed

2019-05-23 Thread Bob van der Poel
I've got a short script that loops through a number of files and processes
them one at a time. I had a bit of time today and figured I'd rewrite the
script to process the files 4 at a time by using 4 different instances of
python. My basic loop is:

for i in range(0, len(filelist), CPU_COUNT):
    for z in range(i, i+CPU_COUNT):
        doit( filelist[z])

With the function doit() calling up the program to do the lifting. Setting
CPU_COUNT to 1 or 5 (I have 6 cores) makes no difference in total speed.
I'm processing about 1200 files and my total duration is around 2 minutes.
No matter how many cores I use the total is within a 5 second range.

This is not a big deal ... but I really thought that throwing more
processors at a problem was a wonderful thing :) I figure that the cost of
loading the python libraries and my source file and writing it out are
pretty much i/o bound, but that is just a guess.

Maybe I need to set my sights on bigger, slower programs to see a
difference :)

-- 

 Listen to my FREE CD at http://www.mellowood.ca/music/cedars 
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: b...@mellowood.ca
WWW:   http://www.mellowood.ca
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: More CPUs doesn't equal more speed

2019-05-23 Thread Chris Angelico
On Fri, May 24, 2019 at 5:37 AM Bob van der Poel  wrote:
>
> I've got a short script that loops through a number of files and processes
> them one at a time. I had a bit of time today and figured I'd rewrite the
> script to process the files 4 at a time by using 4 different instances of
> python. My basic loop is:
>
> for i in range(0, len(filelist), CPU_COUNT):
>     for z in range(i, i+CPU_COUNT):
>         doit( filelist[z])
>
> With the function doit() calling up the program to do the lifting. Setting
> CPU_COUNT to 1 or 5 (I have 6 cores) makes no difference in total speed.
> I'm processing about 1200 files and my total duration is around 2 minutes.
> No matter how many cores I use the total is within a 5 second range.

Where's the part of the code that actually runs them across multiple
CPUs? Also, are you spending your time waiting on the disk, the CPU,
IPC, or something else?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: More CPUs doesn't equal more speed

2019-05-23 Thread David Raymond
You really need to give more info on what you're doing in doit() to know what's 
going on. Are you using subprocess, threading, multiprocessing, etc?

Going off of what you've put there, those nested for loops are being run in the 
one main thread. If doit() kicks off a program and doesn't wait for it to finish, 
then you're just instantly starting 1,200 versions of the external program. If 
doit() _does_ wait for it to finish then you're not doing anything different 
than 1,200 one-at-a-time calls with no parallelization.

How are you making sure you have CPU_COUNT versions running, only that many 
running, and kicking off the next one once any of those completes?



-Original Message-
From: Python-list 
[mailto:python-list-bounces+david.raymond=tomtom@python.org] On Behalf Of 
Bob van der Poel
Sent: Thursday, May 23, 2019 2:40 PM
To: Python
Subject: More CPUs doesn't equal more speed

I've got a short script that loops through a number of files and processes
them one at a time. I had a bit of time today and figured I'd rewrite the
script to process the files 4 at a time by using 4 different instances of
python. My basic loop is:

for i in range(0, len(filelist), CPU_COUNT):
    for z in range(i, i+CPU_COUNT):
        doit( filelist[z])

With the function doit() calling up the program to do the lifting. Setting
CPU_COUNT to 1 or 5 (I have 6 cores) makes no difference in total speed.
I'm processing about 1200 files and my total duration is around 2 minutes.
No matter how many cores I use the total is within a 5 second range.

This is not a big deal ... but I really thought that throwing more
processors at a problem was a wonderful thing :) I figure that the cost of
loading the python libraries and my source file and writing it out are
pretty much i/o bound, but that is just a guess.

Maybe I need to set my sights on bigger, slower programs to see a
difference :)

-- 

 Listen to my FREE CD at http://www.mellowood.ca/music/cedars 
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: b...@mellowood.ca
WWW:   http://www.mellowood.ca
-- 
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: PEP 594 cgi & cgitb removal

2019-05-23 Thread Gunnar Þór Magnússon
> nginx is the current hotness. CGI has not been hotness since the mid 90s.

Serverless is the new hotness, and serverless is CGI. Technology is cyclical.
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: More CPUs doesn't equal more speed

2019-05-23 Thread Avi Gross via Python-list
Bob,

As others have noted, you have not made it clear how what you are doing is
running "in parallel."

I have a similar need where I have thousands of folders and need to do an
analysis based on the contents of one at a time and have 8 cores available
but the process may run for months if run linearly. The results are placed
within the same folder so each part can run independently as long as shared
resources like memory are not abused.

Your need is conceptually simple. Break up the list of filenames into N
batches of about equal length. A simple approach might be to open N terminal
or command windows and in each one start a python interpreter by hand
running the same program which gets one of the file lists and works on it.
Some may finish way ahead of others, of course. If anything they do writes
to shared resources such as log files, you may want to be careful. And there
is no guarantee that several will not run on the same CPU. There is also
plenty of overhead associated with running full processes. I am not
suggesting this but it is fairly easy to do and may get you enough speedup.
But since you only seem to need a few minutes, this won't be much.

Quite a few other solutions involve using some form of threads running
within a process perhaps using a queue manager. Python has multiple ways to
do this. You would simply feed all the info needed (file names in your case)
to a thread that manages a queue. It would allow up to N threads to be
started and whenever one finishes, would be woken to start a replacement
till done. Unless one such thread takes very long, they should all finish
reasonably close to each other. Again, lots of details to make sure the
threads do not conflict with each other. But, no guarantee which core they
get unless you use an underlying package that manages that. 

So you might want to research available packages that do much of the work
for you and provide some guarantees.
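
As one illustration (a minimal sketch only; some_command and run_one are
placeholder names, and filelist is the list from the original post), the
standard library's concurrent.futures can manage such a worker pool:

from concurrent.futures import ThreadPoolExecutor
import subprocess

N = 4  # number of workers to run at once; not necessarily the core count

def run_one(path):
    # Each worker hands one file to an external command and waits for it;
    # while it blocks, the other workers keep going.
    return subprocess.run(["some_command", path]).returncode

with ThreadPoolExecutor(max_workers=N) as pool:
    for path, rc in zip(filelist, pool.map(run_one, filelist)):
        print(path, "exited with", rc)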

An interesting question is how to set the chosen value of N. Just because
you have N cores, you do not necessarily choose N. There are other things
happening on the same machine with sometimes thousands of processes or
threads in the queue even when the machine is sitting there effectively
doing nothing. If you will also keep multiple things open (mailer, WORD,
browsers, ...) you need some bandwidth so everything else gets enough
attention. So is N-1 or N-2 better? Then again, if your task has a mix of
CPU and I/O activities, it may make sense to run more than N in parallel,
even if several of them end up on the same core: they can interleave with
each other, one making use of the CPU while the others are waiting on I/O
or anything slower.

I am curious to hear what you end up with. I will be reading to see if
others can point to modules that already support something like this with
you supplying just a function to use for each thread.

I suggest you consider your architecture carefully. Sometimes it is better
to run program A (in Python or anything else) that sets up what is needed
including saving various data structures on disk needed for each individual
run. Then you start the program that reads from the above and does the
parallel computations and again writes out what is needed such as log
entries, or data in a CSV. Finally, when it is all done, another program can
gather in the various outputs and produce a consolidated set of info. That
may be extra work but minimizes the chance of the processes interfering with
each other. It also may allow you to run or re-run smaller batches or even
to farm out the work to other machines. If you create a few thousand
directories (or just files)  with names like do0001 then you can copy them
to another machine where you ask it to work on do0* and yet another on do1*
and so on, using the same script. This makes more sense for my project which
literally may take months or years if run exhaustively on something like a
grid search trying huge numbers of combinations.

Good luck.

Avi

-Original Message-
From: Python-list  On
Behalf Of Bob van der Poel
Sent: Thursday, May 23, 2019 2:40 PM
To: Python 
Subject: More CPUs doesn't equal more speed

I've got a short script that loops through a number of files and processes
them one at a time. I had a bit of time today and figured I'd rewrite the
script to process the files 4 at a time by using 4 different instances of
python. My basic loop is:

for i in range(0, len(filelist), CPU_COUNT):
    for z in range(i, i+CPU_COUNT):
        doit( filelist[z])

With the function doit() calling up the program to do the lifting. Setting
CPU_COUNT to 1 or 5 (I have 6 cores) makes no difference in total speed.
I'm processing about 1200 files and my total duration is around 2 minutes.
No matter how many cores I use the total is within a 5 second range.

This is not a big deal ... but I really thought that throwing more
processors at a problem was a wonderful thing :) I figure that the cost of
loading the python libraries and 

Re: Handling a connection error with Twython

2019-05-23 Thread Cecil Westerhof
Cecil Westerhof  writes:

> I am using Twython to post updates on Twitter. Lately there is now and
> then a problem with my internet connection. I am using:
> posted = twitter.update_status(status = message,
>                                in_reply_to_status_id = message_id,
>                                trim_user = True)
>
> What would be the best way to catch a connection error and try it (for
> example) again maximum three times with a delay of one minute?

At the moment I solved it with the following:
max_tries   = 3
current_try = 1
while True:
    try:
        posted = twitter.update_status(status = message,
                                       in_reply_to_status_id = message_id,
                                       trim_user = True)
        return posted['id']
    except TwythonError as e:
        print('Failed on try: {0}'.format(current_try))
        if not 'Temporary failure in name resolution' in e.msg:
            raise
        if current_try == max_tries:
            raise
        current_try += 1
        time.sleep(60)

Is this a good way to do it, or can it be improved on?

When it goes OK I just return from the function.
If it goes wrong for something else as failure in the name resolution
I re-raise the exception.
When the maximum tries are done I re-raise the exception.
Otherwise I wait a minute to try it again.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: More CPUs doesn't equal more speed

2019-05-23 Thread MRAB

On 2019-05-23 22:41, Avi Gross via Python-list wrote:

Bob,

As others have noted, you have not made it clear how what you are doing is
running "in parallel."

I have a similar need where I have thousands of folders and need to do an
analysis based on the contents of one at a time and have 8 cores available
but the process may run for months if run linearly. The results are placed
within the same folder so each part can run independently as long as shared
resources like memory are not abused.

Your need is conceptually simple. Break up the list of filenames into N
batches of about equal length. A simple approach might be to open N terminal
or command windows and in each one start a python interpreter by hand
running the same program which gets one of the file lists and works on it.
Some may finish way ahead of others, of course. If anything they do writes
to shared resources such as log files, you may want to be careful. And there
is no guarantee that several will not run on the same CPU. There is also
plenty of overhead associated with running full processes. I am not
suggesting this but it is fairly easy to do and may get you enough speedup.
But since you only seem to need a few minutes, this won't be much.

Quite a few other solutions involve using some form of threads running
within a process perhaps using a queue manager. Python has multiple ways to
do this. You would simply feed all the info needed (file names in your case)
to a thread that manages a queue. It would allow up to N threads to be
started and whenever one finishes, would be woken to start a replacement
till done. Unless one such thread takes very long, they should all finish
reasonably close to each other. Again, lots of details to make sure the
threads do not conflict with each other. But, no guarantee which core they
get unless you use an underlying package that manages that.


[snip]

Because of the GIL, only 1 Python thread will actually be running at any 
time, so if it's processor-intensive, it's better to use multiprocessing.


Of course, if it's already maxing out the disk, then using more cores 
won't make it faster.
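
For genuinely CPU-bound, pure-Python work, a minimal multiprocessing sketch 
(placeholder names; the worker has to be a picklable, top-level function) 
would look something like this:

from multiprocessing import Pool, cpu_count

filelist = ["file1", "file2", "file3"]  # placeholder inputs

def analyse(path):
    ...  # CPU-heavy, pure-Python work on one file

if __name__ == '__main__':
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(analyse, filelist)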

--
https://mail.python.org/mailman/listinfo/python-list


Re: Handling a connection error with Twython

2019-05-23 Thread MRAB

On 2019-05-23 22:55, Cecil Westerhof wrote:

Cecil Westerhof  writes:


I am using Twython to post updates on Twitter. Lately there is now and
then a problem with my internet connection. I am using:
posted = twitter.update_status(status = message,
                               in_reply_to_status_id = message_id,
                               trim_user = True)

What would be the best way to catch a connection error and try it (for
example) again maximum three times with a delay of one minute?


At the moment I solved it with the following:
 max_tries   = 3
 current_try = 1
 while True:
     try:
         posted = twitter.update_status(status = message,
                                        in_reply_to_status_id = message_id,
                                        trim_user = True)
         return posted['id']
     except TwythonError as e:
         print('Failed on try: {0}'.format(current_try))
         if not 'Temporary failure in name resolution' in e.msg:
             raise
         if current_try == max_tries:
             raise
         current_try += 1
         time.sleep(60)

Is this a good way to do it, or can it be improved on?

When it goes OK I just return from the function.
If it goes wrong for something else as failure in the name resolution
I re-raise the exception.
When the maximum tries are done I re-raise the exception.
Otherwise I wait a minute to try it again.

You have a 'while' loop with a counter; you can replace that with a 
'for' loop.
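
For example, a minimal sketch (reusing the names from the snippet above; 
untested against Twython):

for current_try in range(1, max_tries + 1):
    try:
        posted = twitter.update_status(status = message,
                                       in_reply_to_status_id = message_id,
                                       trim_user = True)
        return posted['id']
    except TwythonError as e:
        print('Failed on try: {0}'.format(current_try))
        if 'Temporary failure in name resolution' not in e.msg:
            raise
        if current_try == max_tries:
            raise
        time.sleep(60)

The bare 'raise' on the final attempt means the loop can never fall 
through without re-raising.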

--
https://mail.python.org/mailman/listinfo/python-list


Re: More CPUs doesn't equal more speed

2019-05-23 Thread Bob van der Poel
Thanks all! The sound you are hearing is my head smacking against my hand!
Or is it my hand against my head?

Anyway, yes the problem is that I was naively using command.getoutput()
which blocks until the command is finished. So, of course, only one process
was being run at one time! Bad me!

I guess I should be looking at subprocess.Popen(). Now, a more relevant
question ... if I do it this way I then need to poll through a list of saved
process IDs to see which have finished? Right? My initial thought is to
batch them up in small groups (say CPU_COUNT-1) and wait for that batch to
finish, etc. Would it be foolish to send a large number (1200 in this
case since this is the number of files) and let the OS worry about
scheduling and have my program poll 1200 IDs?
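
Something along those lines might work for the batching idea (a rough
sketch; some_command stands in for whatever doit() actually runs):

import subprocess

CPU_COUNT = 4

for i in range(0, len(filelist), CPU_COUNT):
    # start one batch without waiting...
    batch = [subprocess.Popen(["some_command", f])
             for f in filelist[i:i + CPU_COUNT]]
    # ...then wait for the whole batch before starting the next one
    for proc in batch:
        proc.wait()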

Someone mentioned the GIL. If I launch separate processes then I don't
encounter this issue? Right?


On Thu, May 23, 2019 at 4:24 PM MRAB  wrote:

> On 2019-05-23 22:41, Avi Gross via Python-list wrote:
> > Bob,
> >
> > As others have noted, you have not made it clear how what you are doing
> is
> > running "in parallel."
> >
> > I have a similar need where I have thousands of folders and need to do an
> > analysis based on the contents of one at a time and have 8 cores
> available
> > but the process may run for months if run linearly. The results are
> placed
> > within the same folder so each part can run independently as long as
> shared
> > resources like memory are not abused.
> >
> > Your need is conceptually simple. Break up the list of filenames into N
> > batches of about equal length. A simple approach might be to open N
> terminal
> > or command windows and in each one start a python interpreter by hand
> > running the same program which gets one of the file lists and works on
> it.
> > Some may finish way ahead of others, of course. If anything they do
> writes
> > to shared resources such as log files, you may want to be careful. And
> there
> > is no guarantee that several will not run on the same CPU. There is also
> > plenty of overhead associated with running full processes. I am not
> > suggesting this but it is fairly easy to do and may get you enough
> speedup.
> > But since you only seem to need a few minutes, this won't be much.
> >
> > Quite a few other solutions involve using some form of threads running
> > within a process perhaps using a queue manager. Python has multiple ways
> to
> > do this. You would simply feed all the info needed (file names in your
> case)
> > to a thread that manages a queue. It would allow up to N threads to be
> > started and whenever one finishes, would be woken to start a replacement
> > till done. Unless one such thread takes very long, they should all finish
> > reasonably close to each other. Again, lots of details to make sure the
> > threads do not conflict with each other. But, no guarantee which core
> they
> > get unless you use an underlying package that manages that.
> >
> [snip]
>
> Because of the GIL, only 1 Python thread will actually be running at any
> time, so if it's processor-intensive, it's better to use multiprocessing.
>
> Of course, if it's already maxing out the disk, then using more cores
> won't make it faster.
> --
> https://mail.python.org/mailman/listinfo/python-list
>


-- 

 Listen to my FREE CD at http://www.mellowood.ca/music/cedars 
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: b...@mellowood.ca
WWW:   http://www.mellowood.ca
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: More CPUs doesn't equal more speed

2019-05-23 Thread Chris Angelico
On Fri, May 24, 2019 at 10:07 AM Bob van der Poel  wrote:
>
> Thanks all! The sound you are hearing is my head smacking against my hand!
> Or is it my hand against my head?
>
> Anyway, yes the problem is that I was naively using command.getoutput()
> which blocks until the command is finished. So, of course, only one process
> was being run at one time! Bad me!
>
> I guess I should be looking at subprocess.Popen(). Now, a more relevant
> question ... if I do it this way I then need to poll through a list of saved
> process IDs to see which have finished? Right? My initial thought is to
> batch them up in small groups (say CPU_COUNT-1) and wait for that batch to
> finish, etc. Would it be foolish to send a large number (1200 in this
> case since this is the number of files) and let the OS worry about
> scheduling and have my program poll 1200 IDs?

That might create a lot of contention, resulting in poor performance.
But depending on what your tasks saturate on, that might not matter
all that much, and it _would_ be a simple and straight-forward
technique. In fact, you could basically just write your code like
this:

for job in jobs:
    start_process()
for process in processes:
    wait_for_process()

Once they're all started, you just wait for the first one to finish.
Then when that's finished, wait for the next, and the next, and the
next. If the first process started is actually the slowest to run, all
the others will be in the "done" state for a while, but that's not a
big deal.

> Someone mentioned the GIL. If I launch separate processes then I don't
> encounter this issue? Right?

The GIL is basically irrelevant here. Most of the work is being done
in subprocesses, so your code is spending all its time waiting.

What I'd recommend is a thread pool. Broadly speaking, it would look
something like this:

jobs = [...]

def run_jobs():
    while jobs:
        try: job = jobs.pop()
        except IndexError: break  # deal with race
        start_subprocess()
        wait_for_subprocess()

threads = [threading.Thread(target=run_jobs)
           for _ in range(THREAD_COUNT)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

Note that this has the same "start them all, then wait on them in
order" model. In this case, though, there won't be 1200 threads -
there'll be THREAD_COUNT of them (which may not be the same as your
CPU count, but you could use that same figure as an initial estimate).

Within each thread, the logic is also quite simple: take a job, do the
job, repeat till you run out of jobs. The GIL ensures that "job =
jobs.pop()" is a safe atomic operation that can't possibly corrupt
internal state, and will always retrieve a unique job every time. The
run_jobs function simply runs one job at a time, waiting for its
completion.

This kind of pattern keeps everything clean and simple, and is easy to
tweak for performance.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: More CPUs doesn't equal more speed

2019-05-23 Thread MRAB

On 2019-05-24 01:22, Chris Angelico wrote:

On Fri, May 24, 2019 at 10:07 AM Bob van der Poel  wrote:


Thanks all! The sound you are hearing is my head smacking against my hand!
Or is it my hand against my head?

Anyway, yes the problem is that I was naively using command.getoutput()
which blocks until the command is finished. So, of course, only one process
was being run at one time! Bad me!

I guess I should be looking at subprocess.Popen(). Now, a more relevant
question ... if I do it this way I then need to poll through a list of saved
process IDs to see which have finished? Right? My initial thought is to
batch them up in small groups (say CPU_COUNT-1) and wait for that batch to
finish, etc. Would it be foolish to send a large number (1200 in this
case since this is the number of files) and let the OS worry about
scheduling and have my program poll 1200 IDs?


That might create a lot of contention, resulting in poor performance.
But depending on what your tasks saturate on, that might not matter
all that much, and it _would_ be a simple and straight-forward
technique. In fact, you could basically just write your code like
this:

for job in jobs:
    start_process()
for process in processes:
    wait_for_process()

Once they're all started, you just wait for the first one to finish.
Then when that's finished, wait for the next, and the next, and the
next. If the first process started is actually the slowest to run, all
the others will be in the "done" state for a while, but that's not a
big deal.


Someone mentioned the GIL. If I launch separate processes then I don't
encounter this issue? Right?


The GIL is basically irrelevant here. Most of the work is being done
in subprocesses, so your code is spending all its time waiting.

What I'd recommend is a thread pool. Broadly speaking, it would look
something like this:

jobs = [...]

def run_jobs():
    while jobs:
        try: job = jobs.pop()
        except IndexError: break  # deal with race
        start_subprocess()
        wait_for_subprocess()

threads = [threading.Thread(target=run_jobs)
           for _ in range(THREAD_COUNT)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

Note that this has the same "start them all, then wait on them in
order" model. In this case, though, there won't be 1200 threads -
there'll be THREAD_COUNT of them (which may not be the same as your
CPU count, but you could use that same figure as an initial estimate).

Within each thread, the logic is also quite simple: take a job, do the
job, repeat till you run out of jobs. The GIL ensures that "job =
jobs.pop()" is a safe atomic operation that can't possibly corrupt
internal state, and will always retrieve a unique job every time. The
run_jobs function simply runs one job at a time, waiting for its
completion.

This kind of pattern keeps everything clean and simple, and is easy to
tweak for performance.

Personally, I'd use a queue (from the 'queue' module), instead of a 
list, for the job pool.
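
A minimal sketch of that variant (keeping the placeholder
start_subprocess()/wait_for_subprocess() calls from above):

import queue
import threading

jobs = queue.Queue()
for job in all_jobs:        # all_jobs: whatever list of work items you have
    jobs.put(job)

def run_jobs():
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break           # no more work
        start_subprocess()
        wait_for_subprocess()

threads = [threading.Thread(target=run_jobs) for _ in range(THREAD_COUNT)]
for t in threads:
    t.start()
for t in threads:
    t.join()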

--
https://mail.python.org/mailman/listinfo/python-list


Re: More CPUs doesn't equal more speed

2019-05-23 Thread Chris Angelico
On Fri, May 24, 2019 at 10:48 AM MRAB  wrote:
>
> On 2019-05-24 01:22, Chris Angelico wrote:
> > What I'd recommend is a thread pool. Broadly speaking, it would look
> > something like this:
> >
> > jobs = [...]
> >
> > def run_jobs():
> >     while jobs:
> >         try: job = jobs.pop()
> >         except IndexError: break  # deal with race
> >
> Personally, I'd use a queue (from the 'queue' module), instead of a
> list, for the job pool.

It's not going to be materially different, since there's nothing
adding more jobs part way. Either way works, and for the sake of a
simple demo, I stuck to a core data type that any Python programmer
will know, rather than potentially introducing another module :) But
yes, the queue is a common choice here.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: More CPUs doesn't equal more speed

2019-05-23 Thread Cameron Simpson

On 23May2019 17:04, bvdp  wrote:

Anyway, yes the problem is that I was naively using command.getoutput()
which blocks until the command is finished. So, of course, only one process
was being run at one time! Bad me!

I guess I should be looking at subprocess.Popen(). Now, a more relevant
question ... if I do it this way I then need to poll through a list of saved
process IDs to see which have finished? Right? My initial thought is to
batch them up in small groups (say CPU_COUNT-1) and wait for that batch to
finish, etc. Would it be foolish to send a large number (1200 in this
case since this is the number of files) and let the OS worry about
scheduling and have my program poll 1200 IDs?

Someone mentioned the GIL. If I launch separate processes then I don't
encounter this issue? Right?


Yes, but it becomes more painful to manage. If you're issuing distinct 
separate commands anyway, dispatch many or all and then wait for them as 
a distinct step.  If the commands start thrashing the rest of the OS 
resources (such as the disc) then you may want to do some capacity 
limitation, such as a counter or semaphore to limit how many go at once.


Now, waiting for a subcommand can be done in a few ways.

If you're the parent of all the processes you can keep a set() of the 
issued process ids and then call os.wait() repeatedly, which returns the 
pid of a completed child process. Check it against your set. If you need 
to act on the specific process, use a dict to map pids to some record of 
the subprocess.
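
A rough, POSIX-only sketch of that dict-of-pids pattern (the command lines 
are placeholders, not anything from the thread):

import os

running = {}                       # pid -> argv, so we know which one finished
for path in filelist:              # filelist as in the original post
    argv = ["some_command", path]  # placeholder command line
    pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
    running[pid] = argv

while running:
    pid, status = os.wait()        # blocks until any child exits
    argv = running.pop(pid)
    print(argv, "exit status", os.WEXITSTATUS(status))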


Alternatively, you can spawn a Python Thread for each subcommand, have 
the Thread dispatch the subcommand _and_ wait for it (i.e. keep your 
command.getoutput() method, but in a Thread). Main programme waits for 
the Threads by join()ing them.


Because a thread waiting for something external (the subprocess) doesn't 
hold the GIL, other stuff can proceed. Basically, if something is handed 
off to the OS and then Python waits for that (via an os.* call or a 
Popen.wait() call etc etc) then it will release the GIL while it is 
blocked, so other Threads _will_ get to work.


This is all efficient, and there's any number of variations on the wait 
step depending on what your needs are.


The GIL isn't the disaster most people seem to think. It can be a 
bottleneck for pure Python compute intensive work. But Python's 
interpreted - if you _really_ want performance the core compute will be 
compiled to something more efficient (eg a C extension) or handed to 
another process (transcode video in pure Python - argh! - but call the 
ffmpeg command as a subprocess - yes!); handed off, the GIL should be 
released, allowing other Python side work to continue.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Re: More CPUs doesn't equal more speed

2019-05-23 Thread Terry Reedy

On 5/23/2019 2:39 PM, Bob van der Poel wrote:

I've got a short script that loops through a number of files and processes
them one at a time. I had a bit of time today and figured I'd rewrite the
script to process the files 4 at a time by using 4 different instances of
python.


As others have said, you give no evidence that you are doing that.

The python test suite runner has an argument to use multiple cores.  For 
me, 'python -m test -j0' runs about 6 times faster than 'python -m 
test'.  The speedup would be greater except that there is one test file 
that takes over two minutes, and may run at least a minute after all 
other processes have quit.  (I have suggested that long-running files be 
split to even out the load, but that has not gained favor yet.)  For 
this use, more CPUs *does* equal more speed.


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: More CPUs doesn't equal more speed

2019-05-23 Thread Terry Reedy

On 5/23/2019 2:39 PM, Bob van der Poel wrote:


I'm processing about 1200 files and my total duration is around 2 minutes.


A followup to my previous response, which has not shown up yet.  The 
python test suite is over 400 files.  You might look at how 
test.regrtest runs them in parallel when -j? is passed.


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: Handling a connection error with Twython

2019-05-23 Thread Cecil Westerhof
MRAB  writes:

> On 2019-05-23 22:55, Cecil Westerhof wrote:
>> Cecil Westerhof  writes:
>>
>>> I am using Twython to post updates on Twitter. Lately there is now and
>>> then a problem with my internet connection. I am using:
>>> posted = twitter.update_status(status = message,
>>>                                in_reply_to_status_id = message_id,
>>>                                trim_user = True)
>>>
>>> What would be the best way to catch a connection error and try it (for
>>> example) again maximum three times with a delay of one minute?
>>
>> At the moment I solved it with the following:
>> max_tries   = 3
>> current_try = 1
>> while True:
>>     try:
>>         posted = twitter.update_status(status = message,
>>                                        in_reply_to_status_id = message_id,
>>                                        trim_user = True)
>>         return posted['id']
>>     except TwythonError as e:
>>         print('Failed on try: {0}'.format(current_try))
>>         if not 'Temporary failure in name resolution' in e.msg:
>>             raise
>>         if current_try == max_tries:
>>             raise
>>         current_try += 1
>>         time.sleep(60)
>>
>> Is this a good way to do it, or can it be improved on?
>>
>> When it goes OK I just return from the function.
>> If it goes wrong for something else as failure in the name resolution
>> I re-raise the exception.
>> When the maximum tries are done I re-raise the exception.
>> Otherwise I wait a minute to try it again.
>>
> You have a 'while' loop with a counter; you can replace that with a
> 'for' loop.

I did not do that consciously; it is because I have to keep trying until
either it is successful and I return, or I have reached the max tries and
re-raise the exception. With a for loop I could fall out of the loop
without re-raising the exception.

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
-- 
https://mail.python.org/mailman/listinfo/python-list