Re: [Tutor] Performance of Python loops vs multiple MySQL queries
On 12/21/05, Kent Johnson [EMAIL PROTECTED] wrote:
> Bernard Lebel wrote:
>> Hello, Finally, after a year and a half of learning and messing around
>> with Python, I'm writing THE code that made me learn Python in the first
>> place: a render farm client management software. I may have several
>> questions regarding this, but for now I only have one.
> Congratulations Bernard, you have come a long way!

[Bernard] Thanks a lot Kent, without you and all the other gurus of this list, I wouldn't have made it that far!

>> The script I'm writing is the client script that runs on render nodes.
>> It checks a database for a job to do, then based on some factors like
>> pooling, priority, age and so on, determines what job it can work on.
>> The client connects, evaluates the jobs, gets a job, updates the
>> database, and starts the actual job. Now, there might be up to 80
>> clients connecting to the database at any moment to get a job to do, so
>> understandably I want the evaluation step to be as fast as possible.
>> Right now my script works this way: it selects a bunch of rows based on
>> a few factors. Everything else is then done in the script, that is,
>> without relying on a MySQL query. The script builds a variety of sorted
>> lists and dictionaries, to ultimately end up with a single job. So I am
>> wondering this: if there are only a handful of jobs to evaluate, I
>> understand the script may run fast. But if it has to build lists and
>> dictionaries for thousands of jobs, then I'm afraid it might become
>> slower than simply running a series of queries against the database
>> using various ordering schemes. Any advice on this?
> I haven't looked at your code closely so I will just offer some general
> advice.
> - Don't assume there is going to be a problem.

[Bernard] Okay, perhaps by "problem" I have not been very accurate. I meant sync problems.
You see, when the script finds a job, it makes updates in the database: it adds an entry to another table and updates a certain field in the main jobs table. Other clients testing whether there is something to do rely on information that must be totally up to date. I just want to make sure I don't run into the case of multiple clients getting incorrect results because of stale information. Perhaps I should investigate table locks?

> Python dicts are very fast - they are the data structure underlying
> namespaces and they have been heavily optimized for years.

[Bernard] Okay, good to know!

> - Measure! The only way to truly answer your question is to try it both
> ways and time it.

[Bernard] You are right.

> My guess is that the dictionary approach will be faster. I assume the
> database is on a remote host since it is serving multiple clients. So at
> a minimum you will have the network round-trip delay for each query.
> - Your getJob() code seems to use some variables before they are
> assigned, such as tPoolIDs and aJob. Is this working code? Also it would
> be easier to read if you broke it up into smaller functions that each do
> a small piece of the problem.

[Bernard] This is not working code. tPoolIDs is bound after the first query of the function, but aJob is an error of mine. Indeed I could break down the getJob() function into smaller functions. It's just that since the class already has a fair number of methods and the code is getting long, I wanted to keep everything in a single function. There was also a performance consideration. I have one question on the topic of breaking code into small functions and performance: I have read somewhere that *any* call whatsoever - methods, functions and such - involves a performance cost. Is that right?
In the case it is true, the performance deterioration would be proportional to the number of calls being made, so the more iterations and the more function calls, the slower the code would run. Is that correct?

Thanks
Bernard

> Kent
>> Here is a link to the current code (that builds the lists and
>> dictionaries). Keep in mind it's in early alpha stage. The section to
>> look for is the function getJob(), which starts at line 776. I have
>> changed the extension to txt for security purposes.
>> http://www.bernardlebel.com/scripts/nonxsi/farmclient_2.0_beta03.txt
>> Thanks in advance
>> Bernard

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
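Kent's "Measure!" advice is easy to act on. Below is a minimal sketch of that kind of measurement, using made-up job tuples (the field layout is illustrative, not Bernard's actual schema): it times how long the pure-Python evaluation of a few thousand jobs takes, which is the side of the comparison that worried Bernard.

```python
import random
import time

# Hypothetical job records: (job_id, priority, age_seconds, pool).
# These names are illustrative only; they are not from Bernard's database.
jobs = [(i, random.randint(1, 10), random.randint(0, 86400), i % 4)
        for i in range(5000)]

start = time.perf_counter()
# Pick the "best" job: highest priority first, oldest first within a priority.
best = min(jobs, key=lambda job: (-job[1], -job[2]))
elapsed = time.perf_counter() - start

print("evaluated %d jobs in %.5f s" % (len(jobs), elapsed))
```

On any modern machine this scans thousands of jobs in a fraction of a millisecond, which is usually far below the round-trip cost of even one extra query to a remote MySQL server; timing the multi-query variant against the real database completes the comparison.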
Re: [Tutor] Performance of Python loops vs multiple MySQL queries
Bernard Lebel wrote:
> On 12/21/05, Kent Johnson [EMAIL PROTECTED] wrote:
>> - Don't assume there is going to be a problem.
> [Bernard] Okay, perhaps by "problem" I have not been very accurate. I
> meant sync problems. You see, when the script finds a job, it makes
> updates in the database: it adds an entry to another table and updates
> a certain field in the main jobs table. Other clients testing whether
> there is something to do rely on information that must be totally up to
> date. I just want to make sure I don't run into the case of multiple
> clients getting incorrect results because of stale information. Perhaps
> I should investigate table locks?

Yes, you should have a plan to avoid that kind of problem. Does your database support transactions? If so that is the easy way to ensure this - just put the whole getJob() into a transaction.

>> - Your getJob() code seems to use some variables before they are
>> assigned, such as tPoolIDs and aJob. Is this working code? Also it
>> would be easier to read if you broke it up into smaller functions that
>> each do a small piece of the problem.
> [Bernard] This is not working code. tPoolIDs is bound after the first
> query of the function, but aJob is an error of mine. Indeed I could
> break down the getJob() function into smaller functions. It's just that
> since the class already has a fair number of methods and the code is
> getting long, I wanted to keep everything in a single function.

Hmm, not a good choice. The code will be more readable and maintainable if it is broken up. If the class gets too big, think about making a helper class for some of the code. For example you might put the whole getJob() function into a class or module whose job it is to talk to the database and figure out the next job. One hallmark of a good design is that each class or module has a single responsibility.
In your class you have several responsibilities that could be broken out into separate modules:
- maintaining the state of a single client - this is the job of the Client class
- low-level database access - the query() function could be in a separate module
- the details of a single job might fit well in a Job class; this would greatly simplify Client.setJob()
- getJob() might move to a utility module that just accesses the database and returns a Job object

> Also there was a consideration of performance. I have one question on
> the topic of breaking code into small functions and performance. I have
> read somewhere that *any* call whatsoever - methods, functions and such
> - involves a performance cost. Is that right?

Yes.

> In the case it is true, the performance deterioration would be
> proportional to the number of calls being made, so the more iterations
> and the more function calls, the slower the code would run. Is that
> correct?

Yes, that is correct. Worrying about it at this point is gross premature optimization. It will only be a problem if you have many, many function calls in a performance-critical loop. First write working code whose design clearly expresses its intent. If it is too slow, then profile to find the hot spots and address them.

Kent
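Kent's suggestion - put the whole claim inside a transaction so two clients can never take the same job - can be sketched in a few lines. The thread uses MySQL, but the sketch below uses the standard-library sqlite3 module purely so it is self-contained; the table and column names (`jobs`, `status`, `client`) are invented for illustration. The key idea is the conditional UPDATE: the WHERE clause re-checks the status, so a client that loses the race sees rowcount 0 and moves on to the next job.

```python
import sqlite3

# In-memory stand-in for the farm database; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, client TEXT)")
conn.executemany("INSERT INTO jobs (id, status, client) VALUES (?, 'pending', NULL)",
                 [(1,), (2,), (3,)])
conn.commit()

def claim_job(conn, client_name):
    """Atomically claim one pending job; return its id, or None if none left."""
    cur = conn.cursor()
    row = cur.execute(
        "SELECT id FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The 'AND status = pending' re-check makes the claim atomic: if another
    # client grabbed this job between our SELECT and UPDATE, rowcount is 0.
    cur.execute(
        "UPDATE jobs SET status = 'taken', client = ? "
        "WHERE id = ? AND status = 'pending'",
        (client_name, row[0]))
    conn.commit()
    if cur.rowcount == 1:
        return row[0]
    return claim_job(conn, client_name)  # lost the race; try the next job

print(claim_job(conn, "node01"))  # claims job 1
print(claim_job(conn, "node02"))  # claims job 2
```

With a transactional MySQL table (InnoDB rather than the then-default MyISAM) the same pattern works across 80 clients without table locks, because the conditional UPDATE is the serialization point.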
Re: [Tutor] Performance of Python loops vs multiple MySQL queries
Thanks for all the advice Kent.

Bernard

On 12/22/05, Kent Johnson [EMAIL PROTECTED] wrote:
> [snip]
Re: [Tutor] Performance of Python loops vs multiple MySQL queries
> Also there was a consideration of performance. I have one question on
> the topic of breaking code into small functions and performance. I have
> read somewhere that *any* call whatsoever - methods, functions and such
> - involves a performance cost. Is that right?

Yes it is, but it's not a huge cost. Unless you have a very time-critical loop calling lots of functions calling functions, don't worry unduly. And if there is a problem, use the profiler to tune the bits that need tuning. The benefits of breaking code into functions, in terms of readability and maintenance, far outweigh the performance hit in 99% of cases. Even in the 1% it's better to get it working slowly first, then optimise later, exactly where needed.

> proportional to the number of calls being made, so the more iterations
> and the more function calls, the slower the code would run, is that
> correct?

More or less, but a badly designed loop or list comprehension will cancel out any function call overhead very quickly. And disk reads or network accesses will be orders of magnitude slower still. Worrying about low-level performance tuning before you have a problem is usually wasted effort. High-level performance tuning - getting the design clean - is another matter. In all of the cases (bar one) where I've had major performance issues to fix, they were resolved at the architecture level (minimising network or database accesses), not in the low-level code.

HTH,
Alan G
Author of the learn to program web tutor
http://www.freenetpages.co.uk/hp/alan.gauld
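Alan's two points - function calls do cost something, but the cost is small and easy to measure - can be checked directly with the timeit module. This sketch compares a loop that makes one function call per iteration against an inlined version doing the same arithmetic (the function names are invented for the demonstration):

```python
import timeit

def add(a, b):
    return a + b

def with_calls(n):
    """Sum 0..n-1 with one function call per iteration."""
    total = 0
    for i in range(n):
        total = add(total, i)
    return total

def inlined(n):
    """Same work, no per-iteration call."""
    total = 0
    for i in range(n):
        total = total + i
    return total

n = 100_000
t_calls = timeit.timeit(lambda: with_calls(n), number=10)
t_inline = timeit.timeit(lambda: inlined(n), number=10)
print("with calls: %.4fs  inlined: %.4fs" % (t_calls, t_inline))
```

The call-heavy version is measurably slower, but both finish in milliseconds; a single network round trip to the database would dwarf either. When a real hot spot is suspected, `python -m cProfile farmclient.py` shows exactly which functions the time goes to, which is the profiler-driven tuning Alan recommends.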
[Tutor] Performance of Python loops vs multiple MySQL queries
Hello,

Finally, after a year and a half of learning and messing around with Python, I'm writing THE code that made me learn Python in the first place: a render farm client management software. I may have several questions regarding this, but for now I only have one.

The script I'm writing is the client script that runs on render nodes. It checks a database for a job to do, then based on some factors like pooling, priority, age and so on, determines what job it can work on. The client connects, evaluates the jobs, gets a job, updates the database, and starts the actual job. Now, there might be up to 80 clients connecting to the database at any moment to get a job to do, so understandably I want the evaluation step to be as fast as possible.

Right now my script works this way: it selects a bunch of rows based on a few factors. Everything else is then done in the script, that is, without relying on a MySQL query. The script builds a variety of sorted lists and dictionaries, to ultimately end up with a single job.

So I am wondering this: if there are only a handful of jobs to evaluate, I understand the script may run fast. But if it has to build lists and dictionaries for thousands of jobs, then I'm afraid it might become slower than simply running a series of queries against the database using various ordering schemes. Any advice on this?

Here is a link to the current code (that builds the lists and dictionaries). Keep in mind it's in early alpha stage. The section to look for is the function getJob(), which starts at line 776. I have changed the extension to txt for security purposes.

http://www.bernardlebel.com/scripts/nonxsi/farmclient_2.0_beta03.txt

Thanks in advance
Bernard
Re: [Tutor] Performance of Python loops vs multiple MySQL queries
Bernard Lebel wrote:
> Hello, Finally, after a year and a half of learning and messing around
> with Python, I'm writing THE code that made me learn Python in the
> first place: a render farm client management software. I may have
> several questions regarding this, but for now I only have one.

Congratulations Bernard, you have come a long way!

> The script I'm writing is the client script that runs on render nodes.
> It checks a database for a job to do, then based on some factors like
> pooling, priority, age and so on, determines what job it can work on.
> The client connects, evaluates the jobs, gets a job, updates the
> database, and starts the actual job. Now, there might be up to 80
> clients connecting to the database at any moment to get a job to do, so
> understandably I want the evaluation step to be as fast as possible.
> Right now my script works this way: it selects a bunch of rows based on
> a few factors. Everything else is then done in the script, that is,
> without relying on a MySQL query. The script builds a variety of sorted
> lists and dictionaries, to ultimately end up with a single job. So I am
> wondering this: if there are only a handful of jobs to evaluate, I
> understand the script may run fast. But if it has to build lists and
> dictionaries for thousands of jobs, then I'm afraid it might become
> slower than simply running a series of queries against the database
> using various ordering schemes. Any advice on this?

I haven't looked at your code closely so I will just offer some general advice.

- Don't assume there is going to be a problem. Python dicts are very fast - they are the data structure underlying namespaces and they have been heavily optimized for years.
- Measure! The only way to truly answer your question is to try it both ways and time it. My guess is that the dictionary approach will be faster. I assume the database is on a remote host since it is serving multiple clients.
So at a minimum you will have the network round-trip delay for each query.
- Your getJob() code seems to use some variables before they are assigned, such as tPoolIDs and aJob. Is this working code? Also it would be easier to read if you broke it up into smaller functions that each do a small piece of the problem.

Kent

> Here is a link to the current code (that builds the lists and
> dictionaries). Keep in mind it's in early alpha stage. The section to
> look for is the function getJob(), which starts at line 776. I have
> changed the extension to txt for security purposes.
> http://www.bernardlebel.com/scripts/nonxsi/farmclient_2.0_beta03.txt
> Thanks in advance
> Bernard