Re: [Tutor] Performance of Python loops vs multiple MySQL queries

2005-12-22 Thread Bernard Lebel
On 12/21/05, Kent Johnson [EMAIL PROTECTED] wrote:
 Bernard Lebel wrote:
  Hello,
 
  Finally, after a year and a half of learning and messing around with
  Python, I'm writing THE code that made me learn Python in the first
  place: a render farm client management software. I may have several
  questions regarding this, but for now I only have one.

 Congratulations Bernard, you have come a long way!

[Bernard] Thanks a lot Kent; without you and all the other gurus of
this list, I wouldn't have made it this far!

 
  The script I'm writing is the client script that runs on render nodes.
  It checks a database for a job to do, then based on some factors like
  pooling, priority, age and stuff, will determine what job it can work
  on. The client connects, performs evaluation of jobs, gets a job,
  updates the database, and starts the actual job.
 
  Now, there might be up to 80 clients connecting to the database at any
  moment to get a job to do. So understandably, I want the evaluation
  step to be as fast as possible.
 
  Right now, my script works this way: it selects a bunch of rows based
  on a few factors. From then on, everything else is done in the
  script, that is, without relying on further MySQL queries. The script builds
  a variety of sorted lists and dictionaries, to ultimately end up with
  a single job.
 
  So I am wondering: if there are only a handful of jobs to
  evaluate, I understand the script may run fast. But if it has to
  build lists and dictionaries for thousands of jobs, then I'm afraid
  it might become slower than simply running a series of queries to
  the database using various ordering schemes.
 
 
  Any advice on this?

 I haven't looked at your code closely so I will just offer some general 
 advice.

 - Don't assume there is going to be a problem.

[Bernard] Okay, perhaps by "problem" I was not very precise. I meant
sync problems. You see, when the script finds a job, it makes updates
in the database, that is, it adds an entry into another table and
updates a certain field in the main jobs table. Other clients testing
whether there is something to do rely on information that must be
totally up-to-date. I just wanted to make sure I would not run into
the case of multiple clients getting incorrect results because of
not-quite-up-to-date information. Perhaps I should investigate table locks?


 Python dicts are very fast - they are the data structure underlying 
 namespaces and they
 have been heavily optimized for years.

[Bernard] Okay, good to know!


 - Measure! The only way to truly answer your question is to try it both ways 
 and time it.

[Bernard] You are right.


 My guess is that the dictionary approach will be faster. I assume the 
 database is on a
 remote host since it is serving multiple clients. So at a minimum you will 
 have the
 network round-trip delay for each query.

 - Your getJob() code seems to use some variables before they are assigned, 
 such as
 tPoolIDs and aJob. Is this working code? Also it would be easier to read if 
 you broke it
 up into smaller functions that each do a small piece of the problem.

[Bernard] This is not working code. tPoolIDs is bound after the first
query of the function, but aJob is an error of mine.

Indeed I could break down the getJob() function into smaller
functions. It's just that the class already has a fair number of
methods and the code is getting long, so I wanted to keep everything
in a single function.

Also there was a performance consideration. I have one question on
the topic of breaking code into small functions and performance. I
have read somewhere that *any* call whatsoever, that is, to methods,
functions and such, involves a performance cost. Is that right?
If that is true, the performance deterioration would be
proportional to the number of calls being made, so the larger the
number of iterations and the more function calls, the slower the code
would run, is that correct?


Thanks
Bernard


 Kent
 
 
  Here is a link to the current code (that builds lists and
  dictionaries). Keep in mind it's in early alpha stage. The section to
  look for is the function getJob(), which starts at line 776. I have
  changed the extension to .txt for security purposes.
  http://www.bernardlebel.com/scripts/nonxsi/farmclient_2.0_beta03.txt
 
 
 
  Thanks in advance
  Bernard
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Performance of Python loops vs multiple MySQL queries

2005-12-22 Thread Kent Johnson
Bernard Lebel wrote:
 On 12/21/05, Kent Johnson [EMAIL PROTECTED] wrote:
- Don't assume there is going to be a problem.
 
 
 [Bernard] Okay, perhaps by "problem" I was not very precise. I meant
 sync problems. You see, when the script finds a job, it makes updates
 in the database, that is, it adds an entry into another table and
 updates a certain field in the main jobs table. Other clients testing
 whether there is something to do rely on information that must be
 totally up-to-date. I just wanted to make sure I would not run into
 the case of multiple clients getting incorrect results because of
 not-quite-up-to-date information. Perhaps I should investigate table locks?

Yes, you should have a plan to avoid that kind of problem. Does your
database support transactions? If so, that is the easy way to ensure
this - just put the whole getJob() into a transaction.
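A minimal sketch of the claim-a-job-in-a-transaction pattern Kent describes. Bernard's farm uses MySQL, where the equivalent would be InnoDB tables with SELECT ... FOR UPDATE inside a transaction; this sketch uses SQLite from the standard library just so it is runnable, and the jobs schema (id, priority, status) is invented for illustration:

```python
import sqlite3

def claim_job(conn):
    """Atomically claim the highest-priority waiting job; return its id or None.

    BEGIN IMMEDIATE takes the write lock up front, so two clients cannot
    both read the same job as 'waiting' and then both claim it.  (With
    MySQL the rough equivalent is an InnoDB transaction using
    SELECT ... FOR UPDATE.)
    """
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")
    cur.execute("SELECT id FROM jobs WHERE status = 'waiting' "
                "ORDER BY priority DESC LIMIT 1")
    row = cur.fetchone()
    if row is None:
        cur.execute("ROLLBACK")
        return None
    cur.execute("UPDATE jobs SET status = 'claimed' WHERE id = ?", (row[0],))
    cur.execute("COMMIT")
    return row[0]

# Demo with the hypothetical schema (autocommit mode; we manage
# transactions explicitly above).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, "
             "priority INTEGER, status TEXT)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)",
                 [(1, 5, "waiting"), (2, 9, "waiting")])

print(claim_job(conn))  # 2 - highest priority claimed first
print(claim_job(conn))  # 1
print(claim_job(conn))  # None - nothing left to do
```

Because the select-then-update happens under one lock, no table-level LOCK TABLES is needed for correctness.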

- Your getJob() code seems to use some variables before they are assigned, 
such as
tPoolIDs and aJob. Is this working code? Also it would be easier to read if 
you broke it
up into smaller functions that each do a small piece of the problem.
 
 
 [Bernard] This is not working code. tPoolIDs is bound after the first
 query of the function, but aJob is an error of mine.

 Indeed I could break down the getJob() function into smaller
 functions. It's just that the class already has a fair number of
 methods and the code is getting long, so I wanted to keep everything
 in a single function.

Hmm, not a good choice. The code will be more readable and maintainable
if it is broken up. If the class gets too big, think about making a
helper class for some of the code. For example, you might put the whole
getJob() function into a class or module whose job it is to talk to the
database and figure out the next job.

One hallmark of a good design is that each class or module has a single
responsibility. In your class you have several responsibilities that
could possibly be broken out into separate modules:
- maintain the state of a single client - this is the job of the Client class
- low-level database access - the query() function could be in a separate module
- details of a single job might fit well in a Job class; this would
greatly simplify Client.setJob()
- getJob() might move to a utility module that just accesses the
database and returns a Job object.

 
 Also there was a performance consideration. I have one question on
 the topic of breaking code into small functions and performance. I
 have read somewhere that *any* call whatsoever, that is, to methods,
 functions and such, involves a performance cost. Is that right?

Yes.

 If that is true, the performance deterioration would be
 proportional to the number of calls being made, so the larger the
 number of iterations and the more function calls, the slower the code
 would run, is that correct?

Yes, it is correct. Worrying about it at this point is gross premature
optimization. It will only be a problem if you have many, many function
calls in a performance-critical loop.
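To put a number on that overhead, a small sketch with the standard timeit module: the same running total computed once through a helper function and once inlined. The function names are made up for the comparison; the absolute times will vary by machine, which is exactly why measuring beats guessing.

```python
import timeit

def add(a, b):
    return a + b

def with_calls(n):
    """Sum 0..n-1 with one function call per iteration."""
    total = 0
    for i in range(n):
        total = add(total, i)
    return total

def inlined(n):
    """Same work, no per-iteration call."""
    total = 0
    for i in range(n):
        total = total + i
    return total

n = 100_000
t_calls = timeit.timeit(lambda: with_calls(n), number=20)
t_inline = timeit.timeit(lambda: inlined(n), number=20)
print(f"with calls: {t_calls:.3f}s   inlined: {t_inline:.3f}s")
```

The per-call cost is real but tiny; it only adds up when the call sits inside a hot loop like this one.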

First make working code whose design clearly expresses the intent of the code. 
If it is 
too slow, then profile to find the hot spots and address them.
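The "profile to find the hot spots" step can be done entirely with the standard library. A sketch (the two workload functions are invented stand-ins for real code): run the program under cProfile, then print the entries sorted by cumulative time so the expensive call stands out.

```python
import cProfile
import io
import pstats

def slow_part():
    # Deliberately heavy: this should dominate the profile.
    return sum(i * i for i in range(200_000))

def fast_part():
    return sum(range(100))

def main():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(10)  # top entries by cumulative time
print(out.getvalue())
```

In a real run you would optimize only the functions near the top of that listing and leave the rest alone.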

Kent

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Performance of Python loops vs multiple MySQL queries

2005-12-22 Thread Bernard Lebel
Thanks for all the advice Kent.


Bernard




___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Performance of Python loops vs multiple MySQL queries

2005-12-22 Thread Alan Gauld
 Also there was a performance consideration. I have one question on
 the topic of breaking code into small functions and performance. I
 have read somewhere that *any* call whatsoever, that is, to methods,
 functions and such, involves a performance cost. Is that right?

Yes it is, but it's not a huge cost. Unless you have a very time-critical
loop calling lots of functions that call other functions, don't worry
unduly. And if there is a problem, use the profiler to tune the bits
that need tuning.

The benefits of breaking code into functions, in terms of readability
and maintenance, far outweigh the performance hit in 99% of cases. Even
in the 1%, it's better to get it working slowly first, then optimise
later, exactly where needed.

 proportional to the number of calls being made, so the larger the
 number of iterations and the more function calls, the slower the code
 would run, is that correct?

More or less, but a badly designed loop or list comprehension will cancel
out any function call overhead very quickly. And disk reads or network
access will be orders of magnitude slower still. Worrying about low-level
performance tuning before you have a problem is usually wasted effort.
High-level performance tuning - getting the design clean - is another
matter. In all of the cases (bar one) where I've had major performance
issues to fix, they have been resolved at the architecture level
(minimising network or database accesses), not in the low-level code.

HTH,

Alan G
Author of the learn to program web tutor
http://www.freenetpages.co.uk/hp/alan.gauld


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Performance of Python loops vs multiple MySQL queries

2005-12-21 Thread Bernard Lebel
Hello,

Finally, after a year and a half of learning and messing around with
Python, I'm writing THE code that made me learn Python in the first
place: a render farm client management software. I may have several
questions regarding this, but for now I only have one.

The script I'm writing is the client script that runs on render nodes.
It checks a database for a job to do, then based on some factors like
pooling, priority, age and stuff, will determine what job it can work
on. The client connects, performs evaluation of jobs, gets a job,
updates the database, and starts the actual job.

Now, there might be up to 80 clients connecting to the database at any
moment to get a job to do. So understandably, I want the evaluation
step to be as fast as possible.

Right now, my script works this way: it selects a bunch of rows based
on a few factors. From then on, everything else is done in the script,
that is, without relying on further MySQL queries. The script builds
a variety of sorted lists and dictionaries, to ultimately end up with
a single job.
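The "sorted lists and dictionaries ending in a single job" step can often collapse into one pass with a sort key. A minimal sketch of that idea (the job fields and the ranking rule - highest priority, oldest job breaks ties - are invented for illustration, not taken from the actual farm code):

```python
# Hypothetical rows as fetched from MySQL in one query.
jobs = [
    {"id": 101, "priority": 5, "age": 12, "pool": "A"},
    {"id": 102, "priority": 9, "age": 3,  "pool": "A"},
    {"id": 103, "priority": 9, "age": 40, "pool": "B"},
]

def pick_job(jobs, pool):
    """Return the best job for this client's pool: highest priority
    first, with the oldest job winning ties - or None if nothing fits."""
    candidates = [j for j in jobs if j["pool"] == pool]
    if not candidates:
        return None
    return max(candidates, key=lambda j: (j["priority"], j["age"]))

print(pick_job(jobs, "A"))  # job 102: priority 9 beats priority 5
```

Filtering and ranking in plain Python like this stays fast well into the thousands of rows, since the whole evaluation is a single O(n) pass plus one comparison per candidate.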

So I am wondering: if there are only a handful of jobs to evaluate, I
understand the script may run fast. But if it has to build lists and
dictionaries for thousands of jobs, then I'm afraid it might become
slower than simply running a series of queries to the database using
various ordering schemes.


Any advice on this?


Here is a link to the current code (that builds lists and
dictionaries). Keep in mind it's in early alpha stage. The section to
look for is the function getJob(), which starts at line 776. I have
changed the extension to .txt for security purposes.
http://www.bernardlebel.com/scripts/nonxsi/farmclient_2.0_beta03.txt



Thanks in advance
Bernard
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Performance of Python loops vs multiple MySQL queries

2005-12-21 Thread Kent Johnson
Bernard Lebel wrote:
 Hello,
 
 Finally, after a year and a half of learning and messing around with
 Python, I'm writing THE code that made me learn Python in the first
 place: a render farm client management software. I may have several
 questions regarding this, but for now I only have one.

Congratulations Bernard, you have come a long way!
 
 The script I'm writing is the client script that runs on render nodes.
 It checks a database for a job to do, then based on some factors like
 pooling, priority, age and stuff, will determine what job it can work
 on. The client connects, performs evaluation of jobs, gets a job,
 updates the database, and starts the actual job.
 
 Now, there might be up to 80 clients connecting to the database at any
 moment to get a job to do. So understandably, I want the evaluation
 step to be as fast as possible.
 
 Right now, my script works this way: it selects a bunch of rows based
 on a few factors. From then on, everything else is done in the script,
 that is, without relying on further MySQL queries. The script builds
 a variety of sorted lists and dictionaries, to ultimately end up with
 a single job.
 
 So I am wondering: if there are only a handful of jobs to evaluate, I
 understand the script may run fast. But if it has to build lists and
 dictionaries for thousands of jobs, then I'm afraid it might become
 slower than simply running a series of queries to the database using
 various ordering schemes.
 
 
 Any advice on this?

I haven't looked at your code closely so I will just offer some general advice.

- Don't assume there is going to be a problem. Python dicts are very
fast - they are the data structure underlying namespaces and they have
been heavily optimized for years.

- Measure! The only way to truly answer your question is to try it both ways 
and time it.
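A sketch of the kind of measurement Kent suggests, using the standard timeit module. The row shape is invented (a few thousand (job_id, priority) pairs standing in for a MySQL result set); the point is the harness, not the numbers - the same wrapper would time the multiple-queries approach for comparison:

```python
import timeit

# Fake result set: a few thousand (job_id, priority) rows.
rows = [(i, i % 10) for i in range(5000)]

def build_index(rows):
    """The in-Python approach: index rows by job id in one pass."""
    return {job_id: priority for job_id, priority in rows}

seconds = timeit.timeit(lambda: build_index(rows), number=100)
print(f"built a 5000-entry dict 100 times in {seconds:.3f}s")
```

Timing both strategies on realistic data sizes answers the "handful vs thousands of jobs" question directly, instead of guessing.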

My guess is that the dictionary approach will be faster. I assume the
database is on a remote host since it is serving multiple clients. So
at a minimum you will have the network round-trip delay for each query.

- Your getJob() code seems to use some variables before they are assigned, such 
as 
tPoolIDs and aJob. Is this working code? Also it would be easier to read if you 
broke it 
up into smaller functions that each do a small piece of the problem.

Kent
 
 
 Here is a link to the current code (that builds lists and
 dictionaries). Keep in mind it's in early alpha stage. The section to
 look for is the function getJob(), which starts at line 776. I have
 changed the extension to .txt for security purposes.
 http://www.bernardlebel.com/scripts/nonxsi/farmclient_2.0_beta03.txt
 
 
 
 Thanks in advance
 Bernard

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor