[EMAIL PROTECTED] wrote:
> I am running two functions in a row that do the same thing.

1. I see no functions here. You should set out a script like this:

    def main():
        your_code_goes_here()

    if __name__ == '__main__':
        main()

for two reasons:
(a) your code will be referring to locals instead of globals; this is
    faster, which might appeal to you
(b) if somebody accidentally imports the script, nothing happens.

2. The two loops to which you refer do *not* do the same thing; see later.

> One runs
> in .14 seconds, the other 56. I'm confused. I wrote another version
> of the program and couldn't get the slow behavior again, only the fast.
> I'm not sure what is causing it. Can anyone figure it out?
>
> Here is my code (sorry it's a bit of a mess, but my cleaned up version
> isn't slow!). Just skim to the bottom where the timing is. The first
> time printed out is .14, the second is 56.56.
>
> [snip]

[following has extraneous blank lines and comments removed]

> t1 = time.clock()
> for j in r:
>     digest = m(jo(s[starts[j]+1:ends[j]])).hexdigest()
>     a[starts[j]+1:ends[j]] = digest
> t2 = time.clock()
> print "time is", round(t2-t1, 5)
>
> t1 = time.clock()
> for i in r:
>     data[starts[i]:ends[i]] = \
>         md5(join(s[starts[i]:ends[i]])).hexdigest()
> t2 = time.clock()
> print "second time is", round(t2-t1, 5)

General questions: what platform? what version of Python? how large is
the file? how much free memory do you have? how many passwords are
there? what is the average length of a password?

Ignoring the superficial-but-meaningless differences (i vs j, md5
[aarrgghh!!] vs m, jo vs join), these two loops differ in the following
respects:
(1) 'data' is a copy of 'a'
(2) the first loop's body is effectively:
        digest = RHS; LHS = digest
    whereas the 2nd loop's body is:
        LHS = RHS
(3) the first loop uses starts[j]+1 whereas the second loop uses
    starts[i]

Item (1) may affect the timing if the file is large compared with
available memory -- it could be that 'a' has to be swapped out, and
'data' swapped in.

Item (2) should make the 2nd loop very slightly faster, so we'll ignore
that :-)

Item (3) means you are not comparing like with like. The 1st loop has
less work to do: it hashes and replaces one character fewer per
password. So this could make an observable difference for very short
passwords -- but still nothing like 0.14 compared with 56.

So, some more questions:

The 56.56 is suspiciously precise -- you ran it a few times and it
printed exactly 56.56 each time?

Did you try putting the 2nd loop first [refer to Item (1) above]?

Did you try putting in a switch so that your script runs either the 1st
loop or the 2nd loop but not both?

Note that each loop is making its target list expand in situ; this may
after a while (like inside loop 2) cause the memory arena to become so
fragmented that swapping will occur. This of course can vary wildly
depending on the platform; Win95 used to be the most usual suspect but
you're obviously not running on that.

Some observations:

(1) 's' is already a string, so ''.join(s[x:y]) is a slow way of doing
    s[x:y]

(2) 'a' ends up as a list of one-byte strings, via a very circuitous
    process:
        a = array.array('c', s).tolist()
    A shorter route would be:
        a = list(s)
    However, what's wrong with what you presumably tried out first, i.e.
        a = array.array('c', s)
    ?? It doesn't need the final ''.join() before writing to disk, and
    it takes up less memory. NOTE: the array variety takes up 1 byte per
    character. The list variety takes up at least 4 bytes per character
    (on a machine where sizeof(PyObject *) == 4); to the extent that the
    file contains characters that are not interned (i.e. not [A-Za-z_]
    AFAIK), much more memory is required, as a separate object will be
    created for each such character. Was it consistently slower?

(3) If memory is your problem, you could rewrite the whole thing to
    simply do one write per password; that way you only need 1.x copies
    of the file contents in memory, not 2.x.

Hoping some of this helps,
John
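
P.S. A minimal sketch of observations (1) and (3), under several
assumptions: the snipped part of the script is not shown, so the record
layout, the sample data, and the names write_hashed() and demo() are all
hypothetical. It is written for Python 3 with hashlib and bytes rather
than the Python 2 md5 module used in the quoted code. The idea is only
that s[start:end] can be hashed directly (no join) and that the output
can be written as you go, so no second full-size copy of the file is
built in memory.

```python
import hashlib
import io


def write_hashed(s, starts, ends, out):
    """Copy s to out, replacing each s[start:end] with its MD5 hex digest.

    Assumption: s is a bytes object and starts/ends are parallel lists of
    sorted, non-overlapping offsets marking the password fields.
    """
    pos = 0  # offset of the first byte of s not yet written
    for start, end in zip(starts, ends):
        out.write(s[pos:start])  # unchanged text before this password
        # s[start:end] is already a bytes slice, so no join() is needed
        out.write(hashlib.md5(s[start:end]).hexdigest().encode("ascii"))
        pos = end
    out.write(s[pos:])  # unchanged text after the last password


def demo():
    # Made-up sample data: "name:password" records, one per line.
    s = b"alice:secret\nbob:hunter2\n"
    starts = [6, 17]   # offsets where each password begins
    ends = [12, 24]    # offsets just past each password
    out = io.BytesIO()  # stands in for a file opened in binary mode
    write_hashed(s, starts, ends, out)
    return out.getvalue()


if __name__ == "__main__":
    print(demo().decode("ascii"))
```

With a real file you would pass a file object opened in 'wb' mode
instead of the BytesIO. Only 's' plus one digest at a time is held in
memory, which is the "1.x copies" case.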