Re: [Tutor] question about run time

2006-05-03 Thread Hugo González Monteverde
I have made scripts that work on many files (sometimes just a few dozen), 
and it appears that filesystem structure caching in Linux is very 
efficient.  That's why it runs much faster later.

I've seen this on Slackware, Debian, and RH, so I guess it's just a 
Linux/filesystem/disk thing.

Try running 'find' combined with md5sum, or something else disk-intensive 
but unrelated to Python, to see if it exhibits the same behavior.
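For example, a quick cache test along these lines (the data directory is a
placeholder) forces the kernel to read every file under it:

 time find /path/to/data -type f -exec md5sum {} \; > /dev/null

Run it twice in a row; if the second pass is dramatically faster, the
filesystem cache is the likely explanation.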

Hugo

Ertl, John wrote:
 I have been using python for some time...and occasionally I noticed a
 significant delay before the code would run, but until now I have been able
 to write it off to other things.  Now I have a short script that I wrote to
 check some files and print out a few lines.
 
 I have noticed that usually the first time I fire it up in the morning or
 after a long time of not running it, it takes 10-15 seconds to run and the
 output to the screen is very slow...maybe 1 second per line.  If I run it
 soon after that, it runs and the output is on the screen in less than a
 second.  I would think this has to do with compiling, but I am not sure.  Any
 ideas how to speed this up?
 
 I am running python 2.4 on a RHE3.0 cluster. 
 
 Thanks,
 
 John Ertl
 
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] question about run time

2006-05-02 Thread Kent Johnson
Ertl, John wrote:
 I have been using python for some time...and occasionally I noticed a
 significant delay before the code would run, but until now I have been able
 to write it off to other things.  Now I have a short script that I wrote to
 check some files and print out a few lines.
 
 I have noticed that usually the first time I fire it up in the morning or
 after a long time of not running it, it takes 10-15 seconds to run and the
 output to the screen is very slow...maybe 1 second per line.  If I run it
 soon after that, it runs and the output is on the screen in less than a
 second.  I would think this has to do with compiling, but I am not sure.  Any
 ideas how to speed this up?

Compiling is not that slow. Are your files huge? Possibly they are in the 
disk cache after the first run.

Kent


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] question about run time

2006-05-02 Thread Ertl, John
Kent,

The files are very small (a few hundred lines).  Maybe it is a network
issue? But then why is it always slow the first time in the morning?  I
don't know network stuff, but that seems a bit strange.

Thanks,

John Ertl 

 -Original Message-
From:   Kent Johnson [mailto:[EMAIL PROTECTED] 
Sent:   Tuesday, May 02, 2006 12:06 PM
To: Ertl, John
Cc: tutor@python.org
Subject:    Re: [Tutor] question about run time

Ertl, John wrote:
 I have been using python for some time...and occasionally I noticed a
 significant delay before the code would run, but until now I have been able
 to write it off to other things.  Now I have a short script that I wrote to
 check some files and print out a few lines.
 
 I have noticed that usually the first time I fire it up in the morning or
 after a long time of not running it, it takes 10-15 seconds to run and the
 output to the screen is very slow...maybe 1 second per line.  If I run it
 soon after that, it runs and the output is on the screen in less than a
 second.  I would think this has to do with compiling, but I am not sure.  Any
 ideas how to speed this up?

Compiling is not that slow. Are your files huge? Possibly they are in the 
disk cache after the first run.

Kent

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] question about run time

2006-05-02 Thread Kent Johnson
Ertl, John wrote:
 Kent,
 
 The files are very small (a few hundred lines).  Maybe it is a network
 issue? But then why is it always slow the first time in the morning?  I
 don't know network stuff, but that seems a bit strange.

Maybe the network access is slow and the files are cached locally after 
the first access? I think Windows does this...

Some things you might want to try:
- Open one of the files in a text editor. Close it and open it again. Is 
it faster the second time?
- Write a simple Python program to open one of the files and read it. Is 
it faster the second time you run it? (A sketch follows.)
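A minimal sketch of such a test, in the Python 2 used in this thread (the
file name is a placeholder):

 import time

 start = time.time()
 data = open("/path/to/one/of/the/files").read()   # pull the whole file in
 print "read %d bytes in %.3f seconds" % (len(data), time.time() - start)

Run the script twice in a row; if the second run reports a much shorter
time, the delay is coming from the disk or the network, not from Python
itself.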

HTH, I'm guessing here. I have definitely seen scripts that run faster 
the second time and attribute it to file caching somewhere...though I 
haven't seen as significant a difference as you describe.

Kent

 
 Thanks,
 
 John Ertl 
 
  -Original Message-
 From: Kent Johnson [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, May 02, 2006 12:06 PM
 To:   Ertl, John
 Cc:   tutor@python.org
 Subject:  Re: [Tutor] question about run time
 
 Ertl, John wrote:
 I have been using python for some time...and occasionally I noticed a
 significant delay before the code would run, but until now I have been able
 to write it off to other things.  Now I have a short script that I wrote to
 check some files and print out a few lines.

 I have noticed that usually the first time I fire it up in the morning or
 after a long time of not running it, it takes 10-15 seconds to run and the
 output to the screen is very slow...maybe 1 second per line.  If I run it
 soon after that, it runs and the output is on the screen in less than a
 second.  I would think this has to do with compiling, but I am not sure.
 Any ideas how to speed this up?
 
 Compiling is not that slow. Are your files huge? Possibly they are in the 
 disk cache after the first run.
 
 Kent
 
 
 


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] question about run time

2006-05-02 Thread Ertl, John
Kent,

I will check with the systems guys...and the Perl guys down the hall to see
if they have the same problem.  

Thanks for the help.

John Ertl 


 -Original Message-
From:   [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]  On
Behalf Of Kent Johnson
Sent:   Tuesday, May 02, 2006 12:27 PM
Cc: tutor@python.org
Subject:    Re: [Tutor] question about run time

Ertl, John wrote:
 Kent,
 
 The files are very small (a few hundred lines).  Maybe it is a network
 issue? But then why is it always slow the first time in the morning?  I
 don't know network stuff, but that seems a bit strange.

Maybe the network access is slow and the files are cached locally after 
the first access? I think Windows does this...

Some things you might want to try:
- Open one of the files in a text editor. Close it and open it again. Is 
it faster the second time?
- Write a simple python program to open one of the files and read it. Is 
it faster the second time you run it?

HTH, I'm guessing here. I have definitely seen scripts that run faster 
the second time and attribute it to file caching somewhere...though I 
haven't seen as significant a difference as you describe.

Kent

 
 Thanks,
 
 John Ertl 
 
  -Original Message-
 From: Kent Johnson [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, May 02, 2006 12:06 PM
 To:   Ertl, John
 Cc:   tutor@python.org
 Subject:  Re: [Tutor] question about run time
 
 Ertl, John wrote:
 I have been using python for some time...and occasionally I noticed a
 significant delay before the code would run, but until now I have been able
 to write it off to other things.  Now I have a short script that I wrote to
 check some files and print out a few lines.

 I have noticed that usually the first time I fire it up in the morning or
 after a long time of not running it, it takes 10-15 seconds to run and the
 output to the screen is very slow...maybe 1 second per line.  If I run it
 soon after that, it runs and the output is on the screen in less than a
 second.  I would think this has to do with compiling, but I am not sure.
 Any ideas how to speed this up?
 
 Compiling is not that slow. Are your files huge? Possibly they are in the 
 disk cache after the first run.
 
 Kent
 
 
 


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] question about run time

2006-05-02 Thread Danny Yoo


 I have been using python for some time...and occasionally I noticed a 
 significant delay before the code would run, but until now I have been 
 able to write it off to other things.  Now I have a short script that I 
 wrote to check some files and print out a few lines.

 I have noticed that usually the first time I fire it up in the morning 
 or after a long time of not running it, it takes 10-15 seconds to run 
 and the output to the screen is very slow...maybe 1 second per line. 
 If I run it soon after that, it runs and the output is on the screen in 
 less than a second.  I would think this has to do with compiling, but I 
 am not sure.  Any ideas how to speed this up?

 I am running python 2.4 on a RHE3.0 cluster.
^^

Hi John,

One thing to check is whether the program is spending the majority of 
its time doing input and output (I/O-bound), or really doing heavy 
computation (CPU-bound).  Knowing this might provide clues as to why 
you're seeing this kind of jerky performance.
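One rough way to tell the two apart, sketched here for Python 2 on Unix
(main() stands in for whatever the script actually does):

 import time

 wall_start = time.time()
 cpu_start = time.clock()       # on Unix, clock() reports CPU time

 main()                         # placeholder for the script's real work

 print "wall: %.2fs  cpu: %.2fs" % (time.time() - wall_start,
                                    time.clock() - cpu_start)

If the wall-clock figure is much larger than the CPU figure, the program is
mostly waiting on I/O rather than computing.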

Also, you may want to check with your cluster folks on the possible 
effects the cluster's architecture may have on program startup.  You're 
running on a slightly specialized platform, so I wouldn't be surprised if 
the cluster architecture is contributing something special.

Finally, if you want to share that script for people to comment on, that 
might help.


Good luck!
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] question about run time

2006-05-02 Thread Ertl, John
(memAmount, inodeAmount) = myUse.extractUserData()
myUse.add(memAmount, inodeAmount)

print "Your memory usage is %s KB and your inode usage is %s" % \
    (myUse.memTotal, myUse.inodeTotal)
print "Your memory limit is %s KB and your inode limit is %s" % \
    (myUse.memLimit, myUse.inodeLimit)

if myUse.memLimit < myUse.memTotal or myUse.inodeLimit < myUse.inodeTotal:
    print "You have exceeded your limit"
    myUse.sendEmail("%s memory/inode limit reached on gpfs" % myUse.userName)


 -Original Message-
From:   Danny Yoo [mailto:[EMAIL PROTECTED] 
Sent:   Tuesday, May 02, 2006 1:32 PM
To: Ertl, John
Cc: tutor@python.org
Subject:    Re: [Tutor] question about run time



 I have been using python for some time...and occasionally I noticed a 
 significant delay before the code would run, but until now I have been 
 able to write it off to other things.  Now I have a short script that I 
 wrote to check some files and print out a few lines.

 I have noticed that usually the first time I fire it up in the morning 
 or after a long time of not running it, it takes 10-15 seconds to run 
 and the output to the screen is very slow...maybe 1 second per line. 
 If I run it soon after that, it runs and the output is on the screen in 
 less than a second.  I would think this has to do with compiling, but I 
 am not sure.  Any ideas how to speed this up?

 I am running python 2.4 on a RHE3.0 cluster.
^^

Hi John,

One thing to check is whether the program is spending the majority of 
its time doing input and output (I/O-bound), or really doing heavy 
computation (CPU-bound).  Knowing this might provide clues as to why 
you're seeing this kind of jerky performance.

Also, you may want to check with your cluster folks on the possible 
effects the cluster's architecture may have on program startup.  You're 
running on a slightly specialized platform, so I wouldn't be surprised if 
the cluster architecture is contributing something special.

Finally, if you want to share that script for people to comment on, that 
might help.


Good luck!
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] question about run time

2006-05-02 Thread Danny Yoo


Hi John,

You can try something like the profiler, which will show where most of the 
program's time is being spent.  We can find documentation on the Python 
profiler here:

 http://www.python.org/doc/lib/profile.html
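A minimal way to drive it, assuming the script's work is wrapped in a
function called main():

 import profile, pstats

 profile.run('main()', 'prof.out')       # save the stats to a file
 stats = pstats.Stats('prof.out')
 stats.sort_stats('cumulative').print_stats(10)   # ten most expensive calls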

From a rough, low-level standpoint, there are tools like 'top' on Linux 
that let you see if a program is idling.  Another low-level program --- 
strace --- allows one to watch for system calls, and can be very useful 
for understanding the low-level performance of a program.

 http://www.liacs.nl/~wichert/strace/
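For instance, a per-syscall summary of where the time goes (myscript.py is a
placeholder name):

 strace -c python myscript.py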

I'm using Solaris on one of my systems, and it comes with a marvelous tool 
called 'dtrace':

 http://www.sun.com/bigadmin/content/dtrace/

So there are good tools for measuring performance from both a high-level 
and a low-level perspective, and we often need to jump between these 
levels to understand program performance.


Let's do some code review.

 It checks some text files for a user name and collects memory and inode 
 usage, then adds them together and checks against a set limit...if the 
 limit is reached, it calls a mail script to send an email warning.

From a cursory look at your program, I see one place which seems to be the 
tight inner loop of your program, in extractUserData().

#
 def extractUserData(self):
     print self.filePath
     fullList = open(self.filePath, "r").readlines()
     for line in fullList:
         #print "line", line
         singleList = line.split()
         try:
             if singleList[1] == self.userName:
                 print line
                 return singleList[2], singleList[3]
         except:
             pass
     return 0, 0
#

This function is called in another loop in your main program, and it 
itself does lots of looping, so let's spend some time looking at this: I 
believe this will be worthwhile.


One small improvement you might want to make here is to avoid reading in 
the whole file at once.  That is, rather than:

 lines = open(filename).readlines()
 for line in lines:
 ...

it's often better to do:

 myfile = open(filename)
 for line in myfile:
 ...

This is a relatively minor detail, and a low-level one.


But a bigger payoff can occur if we take a higher-level look at what's 
happening.  From a high level, the premise of the program is that there's 
a set of text files.  For any particular user, some auxiliary information 
(inode and memory usage) is being stored in these files.

This is really crying out to be a database.  *grin* I don't know how much 
freedom you have to change things around, but if you can use a database to 
centralize all this information, that will be a very good thing.

If we really must keep things this way, I'd strongly recommend that we 
reconsider doing all the file opening/reading/scanning in the inner loop. 
I suspect that doing all that file opening and linear scanning in the 
inner loop is what strongly influences the program's performance.  This is 
certainly I/O bound, and we want to get I/O out of tight loops like this.

Instead, we can do some preprocessing work up front.  If we read all the 
records at the very beginning and store those records in an in-memory 
dictionary, then extractUserData() can be a very simple lookup rather than 
a filesystem-wide hunt.


It's the conceptual difference between:


## Pseudocode; in reality, we'd strip line endings too

while True:
    word = raw_input("enter a word")
    for other in open('/usr/share/dict/words'):
        if other == word:
            print "It's in"
            break


vs:

#
all_words = {}
for word in open('/usr/share/dict/words'):
    all_words[word] = True
while True:
    word = raw_input("enter a word")
    if word in all_words:
        print "It's in"
#

The former opens and scans the file for each input we get.  The latter 
does a bit of work up front, but it makes up for it if we go through the 
inner loop more than once.

If we were to do a scan for words just once, the former might be 
preferable since it might not need to read the whole file to give an 
answer.  But if we're going to do the scan for several people, the latter 
is probably the way to go.
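Applied to the script in this thread, the idea might look something like the
sketch below.  It is only a guess at the surroundings, since we have seen
just extractUserData(): filePaths, userName, and the column layout are
assumed from the posted code.

 # Read every file once, up front, into a dictionary keyed by user name.
 def loadAllUserData(filePaths):
     usage = {}                       # user name -> (memory, inodes)
     for path in filePaths:
         f = open(path)
         for line in f:
             fields = line.split()
             if len(fields) >= 4:
                 usage[fields[1]] = (fields[2], fields[3])
         f.close()
     return usage

 usage = loadAllUserData(filePaths)   # one pass over the filesystem...
 # ...then each per-user lookup is a dictionary access, not a file scan:
 memAmount, inodeAmount = usage.get(userName, (0, 0))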

Good luck to you!
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor