Re: [Tutor] question about run time
Ertl, John wrote:
> I have been using Python for some time... and occasionally I noticed a
> significant delay before the code would run, but until now I have been
> able to write it off to other things. Now I have a short script that I
> wrote to check some files and print out a few lines. I have noticed
> that usually the first time I fire it up in the morning, or after a
> long time of not running it, it takes 10-15 seconds to run and the
> output to the screen is very slow... maybe 1 second per line. If I run
> it soon after that, it runs and the output is on the screen in less
> than a second. I would think this has to do with compiling, but I am
> not sure. Any ideas how to speed this up? I am running Python 2.4 on a
> RHE3.0 cluster.
>
> Thanks,
> John Ertl

I have written scripts that work on many files (sometimes just a few tens of them), and it appears that filesystem structure caching in Linux is very efficient; that is why the script runs fast on later invocations. I've seen this on Slackware, Debian, and RH, so I guess it's just a Linux/filesystem/disk thing rather than anything Python-specific. Try 'find' combined with md5sum, or something else disk-intensive but unrelated to Python, to see if it exhibits the same first-run slowness.

Hugo
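A minimal Python sketch of that experiment (the directory path is a placeholder, and the md5 module is used because the thread is about Python 2.4, which predates hashlib). Run it twice in a row: if the first run is slow and the second is fast, the filesystem cache is the likely culprit:

    import md5
    import os
    import time

    start = time.time()
    count = 0
    # '/path/to/your/files' is a placeholder for the directory the script reads
    for dirpath, dirnames, filenames in os.walk('/path/to/your/files'):
        for name in filenames:
            try:
                data = open(os.path.join(dirpath, name), 'rb').read()
                md5.new(data).hexdigest()   # hash the contents to force a real disk read
                count += 1
            except IOError:
                pass   # skip unreadable files
    print "hashed %d files in %.2f seconds" % (count, time.time() - start)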
Re: [Tutor] question about run time
Ertl, John wrote:
> I have noticed that usually the first time I fire it up in the morning,
> or after a long time of not running it, it takes 10-15 seconds to run
> and the output to the screen is very slow... maybe 1 second per line.
> If I run it soon after that, the output is on the screen in less than a
> second. I would think this has to do with compiling, but I am not sure.

Compiling is not that slow. Are your files huge? Possibly they are in the disk cache after the first run.

Kent
Re: [Tutor] question about run time
Kent,

The files are very small (a few hundred lines). Maybe it is a network issue? But then why is it always slow the first time in the morning? I don't know network stuff, but that seems a bit strange.

Thanks,
John Ertl

Kent Johnson wrote:
> Compiling is not that slow. Are your files huge? Possibly they are in
> the disk cache after the first run.
Re: [Tutor] question about run time
Ertl, John wrote:
> The files are very small (a few hundred lines). Maybe it is a network
> issue? But then why is it always slow the first time in the morning?

Maybe the network access is slow and the files are cached locally after the first access? I think Windows does this...

Some things you might want to try:

- Open one of the files in a text editor. Close it and open it again. Is it faster the second time?
- Write a simple Python program to open one of the files and read it. Is it faster the second time you run it? (A sketch of such a program follows below.)

HTH, I'm guessing here. I have definitely seen scripts that run faster the second time, and I attribute it to file caching somewhere... though I haven't seen as significant a difference as you describe.

Kent
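A minimal sketch of that second experiment, with a hypothetical file path; run it twice in a row and compare the printed times:

    import time

    start = time.time()
    f = open('/path/to/one/of/the/files')   # hypothetical path to one of the small files
    data = f.read()
    f.close()
    print "read %d bytes in %.3f seconds" % (len(data), time.time() - start)

If the second run is dramatically faster, something (the local filesystem or a network-filesystem client) is caching the file after the first access.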
Re: [Tutor] question about run time
Kent,

I will check with the systems guys... and with the Perl guys down the hall, to see if they have the same problem. Thanks for the help.

John Ertl
Re: [Tutor] question about run time
Ertl, John wrote:
> [...] I would think this has to do with compiling, but I am not sure.
> Any ideas how to speed this up? I am running Python 2.4 on a RHE3.0 cluster.
                                                                 ^^^^^^^^^^^^^^

Hi John,

One thing to check is whether the program is spending the majority of its time doing input and output (I/O bound), or whether it's really doing heavy computations (CPU bound). Knowing this might provide clues as to why you're seeing this kind of jerky performance. (A quick sketch of one way to check follows below.)

Also, you may want to check with your cluster folks on the possible effects the cluster's architecture may have on program startup. You're running on a slightly specialized platform, so I wouldn't be surprised if the cluster architecture is contributing something special.

Finally, if you want to share that script for people to comment on, that might help.

Good luck!
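One rough way to tell the two apart, sketched here under the assumption that the script's work can be wrapped in a single function (do_the_file_checks is a hypothetical stand-in): compare wall-clock time against CPU time. If wall-clock time is much larger than CPU time, the script is mostly waiting on I/O rather than computing:

    import time

    def do_the_file_checks():
        # hypothetical stand-in for the script's real work
        open('/etc/hosts').read()

    wall_start = time.time()
    cpu_start = time.clock()    # on Unix, time.clock() reports CPU time

    do_the_file_checks()

    wall = time.time() - wall_start
    cpu = time.clock() - cpu_start
    print "wall-clock: %.3fs, CPU: %.3fs" % (wall, cpu)
    # wall >> cpu suggests an I/O-bound program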
Re: [Tutor] question about run time
Danny,

It checks some text files for a user name and collects memory and inode usage, then adds them together and checks against a set limit... if the limit is reached it calls a mail script to send an email warning. Here is the relevant part:

    (memAmount, inodeAmount) = myUse.extractUserData()
    myUse.add(memAmount, inodeAmount)
    print "Your memory usage is %s KB and your inode usage is %s" % \
          (myUse.memTotal, myUse.inodeTotal)
    print "Your memory limit is %s KB and your inode limit is %s" % \
          (myUse.memLimit, myUse.inodeLimit)
    if myUse.memLimit < myUse.memTotal or myUse.inodeLimit < myUse.inodeTotal:
        print "You have exceeded your limit"
        myUse.sendEmail("%s memory/inode limit reached on gpfs" % myUse.userName)

Danny Yoo wrote:
> One thing to check is whether the program is spending the majority of
> its time doing input and output (I/O bound), or really doing heavy
> computations (CPU bound). [...] Finally, if you want to share that
> script for people to comment on, that might help.
Re: [Tutor] question about run time
Hi John,

You can try something like the profiler, which will say where most of the program's time is being spent. We can find documentation on the Python profiler here:

http://www.python.org/doc/lib/profile.html

From a rough, low-level standpoint, there are tools like 'top' on Linux that let you see if a program is idling. Another low-level program --- strace --- allows one to watch for system calls, and can be very useful for understanding the low-level performance of a program:

http://www.liacs.nl/~wichert/strace/

I'm using Solaris on one of my systems, and it comes with a marvelous tool called 'dtrace':

http://www.sun.com/bigadmin/content/dtrace/

So there are good tools for measuring performance from both a high-level and a low-level perspective, and we often need to jump between these levels to understand program performance.

Let's do some code review.

> It checks some text files for a user name and collects memory and inode
> usage, then adds them together and checks against a set limit... if the
> limit is reached it calls a mail script to send an email warning.

From a cursory look at your program, I see one place which seems to be the tight inner loop of your program, in extractUserData():

    def extractUserData(self):
        print self.filePath
        fullList = open(self.filePath, "r").readlines()
        for line in fullList:
            #print line
            singleList = line.split()
            try:
                if singleList[1] == self.userName:
                    print line
                    return singleList[2], singleList[3]
            except:
                pass
        return 0, 0

This function is called in another loop in your main program, and it itself does lots of looping, so let's spend some time looking at it; I believe this will be worthwhile.

One small improvement you might want to make here is to avoid reading in the whole file at once. That is, rather than:

    lines = open(filename).readlines()
    for line in lines:
        ...

it's often better to do:

    myfile = open(filename)
    for line in myfile:
        ...

This is a relatively minor detail, and a low-level one. But a bigger payoff can occur if we take a higher-level look at what's happening.

From a high level, the premise of the program is that there's a set of text files, and for any particular user, some auxiliary information (inode and memory usage) is being stored in these files. This is really crying out to be a database. *grin* I don't know how much freedom you have to change things around, but if you can use a database to centralize all this information, that will be a very good thing.

If we really must keep things this way, I'd strongly recommend that we reconsider doing all the file opening/reading/scanning in the inner loop. I suspect that doing all that file opening and linear scanning in the inner loop is what strongly influences the program's performance. This is certainly I/O bound, and we want to get I/O out of tight loops like this.

Instead, we can do some preprocessing work up front. If we read all the records at the very beginning and store those records in an in-memory dictionary, then extractUserData() can be a very simple lookup rather than a filesystem-wide hunt. It's the conceptual difference between:

    ## Pseudocode; in reality, we'd strip line endings too
    while True:
        word = raw_input("enter a word")
        for other in open('/usr/share/dict/words'):
            if other == word:
                print "It's in"
                break

vs:

    all_words = {}
    for word in open('/usr/share/dict/words'):
        all_words[word] = True

    while True:
        word = raw_input("enter a word")
        if word in all_words:
            print "It's in"

The former opens and scans the file for each input we get.
The latter does a bit of work up front, but it makes up for it if we go through the inner loop more than once. If we were to do the scan for words just once, the former might be preferable, since it might not need to read the whole file to give an answer. But if we're going to do the scan for several people, the latter is probably the way to go (two sketches below flesh this out).

Good luck to you!
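To make the dictionary idea concrete for this program, here is a rough sketch, assuming the column layout that extractUserData() implies (user name in field 1, memory in field 2, inodes in field 3); the file paths are placeholders:

    # Preprocessing: read every file once, up front.
    usage = {}   # maps user name -> (memory, inodes) from the first matching line
    for path in ['/path/to/usage1.txt', '/path/to/usage2.txt']:   # placeholder paths
        f = open(path)
        for line in f:
            fields = line.split()
            if len(fields) >= 4 and fields[1] not in usage:
                usage[fields[1]] = (fields[2], fields[3])
        f.close()

    # Each lookup is now a fast dictionary access instead of a file scan.
    def extractUserData(userName):
        return usage.get(userName, (0, 0))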
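And for the profiler suggestion at the top of this review, a minimal invocation sketch, assuming the script's entry point is wrapped in a main() function (main here is a hypothetical stand-in):

    import profile

    def main():
        # hypothetical stand-in for the script's real work
        open('/etc/hosts').read()

    profile.run('main()')   # runs main() and prints time spent in each function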