Re: Newbie completely confused
Jeroen Hegeman schreef:

Thanks for the comments, (First, I had to add timing code to ReadClasses: the code you posted doesn't include it, and only shows timings for ReadLines.)

Your program uses quite a bit of memory. I guess it gets harder and harder to allocate the required amounts of memory.

Well, I guess there could be something in that, but why is there a significant increase after the first time? And after that, single-trip time pretty much flattens out. No more obvious increases.

Sorry, I have no idea.

If I change this line in ReadClasses:

    built_classes[len(built_classes)] = HugeClass(long_line)

to

    dummy = HugeClass(long_line)

then both times the files are read and your data structures are built, but after each run the data structure is freed. The result is that both runs are equally fast.

Isn't the 'del LINES' supposed to achieve the same thing? And really, reading 30 MB files should not be such a problem, right? (I'm also running with 1 GB of RAM.)

'del LINES' deletes the lines that are read from the file, but not all of the data structures that you created out of them. Now, indeed, reading 30 MB files should not be a problem. And I am confident that just reading the data is not a problem. To make sure, I created a simple test:

    import time

    input_files = ['./test_file0.txt', './test_file1.txt']

    total_start = time.time()
    data = {}
    for input_fn in input_files:
        file_start = time.time()
        f = file(input_fn, 'r')
        data[input_fn] = f.read()
        f.close()
        file_done = time.time()
        print '%s: %f to read %d bytes' % (input_fn, file_done - file_start, len(data))
    total_done = time.time()
    print 'all done in %f' % (total_done - total_start)

When I run that with test_file0.txt and test_file1.txt as you described (each 30 MB), I get this output:

    ./test_file0.txt: 0.26 to read 1 bytes
    ./test_file1.txt: 0.251000 to read 2 bytes
    all done in 0.521000

Therefore I think the problem is not in reading the data, but in processing it and creating the data structures.
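The same read test can be sketched in modern Python 3, made self-contained by generating its own (smaller) temporary files; note the per-file size is len(data[path]), not len(data), which only counts the files read so far:

```python
import os
import tempfile
import time

# Generate two throwaway ~3 MB test files (smaller than the 30 MB files
# in the thread, but enough to time); the names are placeholders.
tmpdir = tempfile.mkdtemp()
paths = [os.path.join(tmpdir, 'test_file%d.txt' % i) for i in range(2)]
line = 'ABC ' * 75 + '\n'   # 301 bytes per line
for p in paths:
    with open(p, 'w') as f:
        f.writelines(line for _ in range(10000))

total_start = time.perf_counter()
data = {}
for path in paths:
    file_start = time.perf_counter()
    with open(path) as f:
        data[path] = f.read()
    elapsed = time.perf_counter() - file_start
    # len(data[path]) is the file's size; len(data) would just count files.
    print('%s: %f to read %d bytes' % (path, elapsed, len(data[path])))
print('all done in %f' % (time.perf_counter() - total_start))
```

As in the original test, both reads should take about the same time: plain sequential reading does not slow down on the second file.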
You read the files, but don't use the contents; instead you use long_line over and over. I suppose you do that because this is a test, not your actual code?

Yeah ;-) (Do I notice a lack of trust in the responses I get? Should I not mention 'newbie'?)

I didn't mean to attack you; it's just that the program reads 30 MB of data, twice, but doesn't do anything with it. It only uses the data that was stored in long_line, which is never replaced. That is very strange for real code, but as a test it can have its uses. That's why I asked.

Let's get a couple of things out of the way:

- I do know about meaningful variable names and case conventions, but... First of all, I also have to live with inherited code (I don't like people shouting in their code either), and secondly (all the itemx) most of these members normally _have_ descriptive names, but I'm not supposed to copy-paste the original code to any newsgroups.

Ok.

- I also know that a plain 'return' in Python does not do anything, but I happen to like them. The same holds for the sys.exit() call.

Ok.

- The __init__ methods normally actually do something: they initialise some member variables to meaningful values (by calling the clear() method, actually).
- The __clear__ method normally brings objects back into a well-defined 'empty' state.
- The __del__ methods are actually needed in this case (well, in the _real_ code anyway). The Python code loads a module written in C++, and some of the member variables actually point to C++ objects created dynamically, so one actually has to call their destructors before unbinding the Python variable.

That sounds a bit weird to me; I would think such explicit memory management belongs in the C++ code instead of in the Python code, but I must admit that I know next to nothing about extending Python, so I assume you are right.

All right, thanks for the tips. I guess the issue itself is still open, though.

I'm afraid so. Sorry I can't help.
One thing that helped me in the past to speed up input is using memory mapped I/O instead of stream I/O. But that was in C++ on Windows; I don't know if the same applies to Python on Linux. -- The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom. -- Isaac Asimov Roel Schroeven -- http://mail.python.org/mailman/listinfo/python-list
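The memory-mapped I/O suggestion above does carry over to Python on Linux (and elsewhere) via the standard mmap module. A minimal sketch, with a placeholder file standing in for the 30 MB inputs:

```python
import mmap
import os
import tempfile

# Write a small sample file to map (a stand-in for the 30 MB inputs).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'wb') as f:
    f.write(b'event one\nevent two\nevent three\n')

with open(path, 'rb') as f:
    # Map the whole file read-only; the OS pages it in on demand,
    # so "reading" becomes memory access instead of stream I/O calls.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        first_line = mm.readline()   # mappings support file-like reads
        head = mm[:9]                # ... and byte slicing
    finally:
        mm.close()

print(first_line)   # b'event one\n'
print(head)         # b'event one'
```

Whether this actually beats buffered reads depends on the access pattern; for a single sequential pass over each file the difference is usually small.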
Re: Newbie completely confused
Roel Schroeven schreef:

    import time

    input_files = ['./test_file0.txt', './test_file1.txt']

    total_start = time.time()
    data = {}
    for input_fn in input_files:
        file_start = time.time()
        f = file(input_fn, 'r')
        data[input_fn] = f.read()
        f.close()
        file_done = time.time()
        print '%s: %f to read %d bytes' % (input_fn, file_done - file_start, len(data))

... that should of course be len(data[input_fn]) ...

    total_done = time.time()
    print 'all done in %f' % (total_done - total_start)

When I run that with test_file0.txt and test_file1.txt as you described (each 30 MB), I get this output:

    ./test_file0.txt: 0.26 to read 1 bytes
    ./test_file1.txt: 0.251000 to read 2 bytes
    all done in 0.521000

... and then that becomes:

    ./test_file0.txt: 0.29 to read 3317 bytes
    ./test_file1.txt: 0.231000 to read 3317 bytes
    all done in 0.521000

-- The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom. -- Isaac Asimov Roel Schroeven -- http://mail.python.org/mailman/listinfo/python-list
Re: Newbie completely confused
Thanks for the comments, (First, I had to add timing code to ReadClasses: the code you posted doesn't include it, and only shows timings for ReadLines.)

Your program uses quite a bit of memory. I guess it gets harder and harder to allocate the required amounts of memory.

Well, I guess there could be something in that, but why is there a significant increase after the first time? And after that, single-trip time pretty much flattens out. No more obvious increases.

If I change this line in ReadClasses:

    built_classes[len(built_classes)] = HugeClass(long_line)

to

    dummy = HugeClass(long_line)

then both times the files are read and your data structures are built, but after each run the data structure is freed. The result is that both runs are equally fast.

Isn't the 'del LINES' supposed to achieve the same thing? And really, reading 30 MB files should not be such a problem, right? (I'm also running with 1 GB of RAM.)

I'm not sure how to speed things up here... you're doing much processing on a lot of small chunks of data. I have a number of observations and possible improvements though, and some might even speed things up a bit.

Cool, thanks, let's go over them.

You read the files, but don't use the contents; instead you use long_line over and over. I suppose you do that because this is a test, not your actual code?

Yeah ;-) (Do I notice a lack of trust in the responses I get? Should I not mention 'newbie'?)

Let's get a couple of things out of the way:

- I do know about meaningful variable names and case conventions, but... First of all, I also have to live with inherited code (I don't like people shouting in their code either), and secondly (all the itemx) most of these members normally _have_ descriptive names, but I'm not supposed to copy-paste the original code to any newsgroups.
- I also know that a plain 'return' in Python does not do anything, but I happen to like them. The same holds for the sys.exit() call.
- The __init__ methods normally actually do something: they initialise some member variables to meaningful values (by calling the clear() method, actually).
- The __clear__ method normally brings objects back into a well-defined 'empty' state.
- The __del__ methods are actually needed in this case (well, in the _real_ code anyway). The Python code loads a module written in C++, and some of the member variables actually point to C++ objects created dynamically, so one actually has to call their destructors before unbinding the Python variable.

I tried to get things down to as small as possible, but when I found out that the size of the classes seems to contribute to the issue (removing enough member variables brings you to a point where all of a sudden the speed increases by a factor of ten; there seems to be some breakpoint depending on the size of the classes), I could not simply remove all members, but had to give them funky names. I kept the main structure of things, though, to see if that would solicit comments. (And it did...)

In a number of cases, you use a dict like this:

    built_classes = {}
    for i in LINES:
        built_classes[len(built_classes)] = ...

So you're using the indices 0, 1, 2, ... as the keys. That's not what dictionaries are made for; lists are much better for that:

    built_classes = []
    for i in LINES:
        built_classes.append(...)

Yeah, I inherited that part...

Your readLines() function reads a whole file into memory. If you're working with large files, that's not such a good idea. It's better to load one line at a time into memory and work on that. I would even completely remove readLines() and restructure ReadClasses() like this:

Actually, part of what I removed was the real reason why readLines() is there at all: it reads files in blocks of (at most) some_number lines, and keeps track of the line offset in the file. I kept this structure hoping that someone would point out something obvious, like some internal buffer going out of scope or whatever.
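The dict-versus-list point above can be checked with a tiny self-contained comparison (Python 3 syntax; the line contents are placeholders). Both patterns produce the same sequence, but the list needs no key bookkeeping or hashing:

```python
lines = ['line %d' % n for n in range(5)]   # stand-in for file contents

# Inherited pattern: a dict keyed by consecutive integers 0, 1, 2, ...
built_dict = {}
for line in lines:
    built_dict[len(built_dict)] = line.upper()

# Suggested pattern: a plain list, appending in order.
built_list = []
for line in lines:
    built_list.append(line.upper())

# Same contents, in the same order.
assert [built_dict[i] for i in range(len(built_dict))] == built_list
print(built_list[0], built_list[-1])   # LINE 0 LINE 4
```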
All right, thanks for the tips. I guess the issue itself is still open, though. Cheers, Jeroen Jeroen Hegeman jeroen DOT hegeman AT gmail DOT com WARNING: This message may contain classified information. Immediately burn this message after reading. -- http://mail.python.org/mailman/listinfo/python-list
Re: Newbie completely confused
Your code does NOT include any statements that could have produced the above line of output -- IOW, you have not posted the code that you actually ran.

Oh my, I must have cleaned it up a bit too much, hoping that people would focus on the issue instead of the formatting of the output strings! Did you miss your morning coffee???

Your code is already needlessly monstrously large.

Which I realised and apologised for beforehand.

And Python 2.5.1 does what? Strike 3.

Hmm, I must have missed where it said that you can only ask for help if you're using the latest version... In case you're wondering, 2.5.1 is not _really_ as widespread as most of the older versions.

For handling the bit extraction stuff, either [snip] (b) do a loop over the bit positions

Now that sounds more useful. I'll give that a try.

Thanks, Jeroen

Jeroen Hegeman jeroen DOT hegeman AT gmail DOT com WARNING: This message may contain classified information. Immediately burn this message after reading. -- http://mail.python.org/mailman/listinfo/python-list
Re: Newbie completely confused
Two comments,

    ...
    self.item3 = float(foo[c]); c+=1
    self.item4 = float(foo[c]); c+=1
    self.item5 = float(foo[c]); c+=1
    self.item6 = float(foo[c]); c+=1
    ...

this here (and your code in general) is mind-boggling, and not in a good way. As for your original question, I don't think that reading in files of the size you mention can cause any substantial problems; I think the problem is somewhere else. You can run the code below to see that the read times are unaffected by the order of processing.

    import timeit

    # make a big file
    NUM = 10**5
    fp = open('bigfile.txt', 'wt')
    longline = ' ABC ' * 60 + '\n'
    for count in xrange(NUM):
        fp.write(longline)
    fp.close()

    setup1 = """
    def readLines():
        data = []
        for line in file('bigfile.txt'):
            data.append(line)
        return data
    """

    stmt1 = "data = readLines()"

    stmt2 = """
    data = readLines()
    data = readLines()
    """

    stmt3 = "data = file('bigfile.txt').readlines()"

    def run(setup, stmt, N=5):
        t = timeit.Timer(stmt=stmt, setup=setup)
        msec = 1000 * t.timeit(number=N) / N
        print "%f msec/pass" % msec

    if __name__ == '__main__':
        for stmt in (stmt1, stmt2, stmt3):
            run(setup=setup1, stmt=stmt)

-- http://mail.python.org/mailman/listinfo/python-list
Re: Newbie completely confused
On Sep 25, 1:51 am, Jeroen Hegeman [EMAIL PROTECTED] wrote:

Your code does NOT include any statements that could have produced the above line of output -- IOW, you have not posted the code that you actually ran.

Oh my, I must have cleaned it up a bit too much, hoping that people would focus on the issue instead of the formatting of the output strings! Did you miss your morning coffee???

The difference was not a formatting difference; it was the complete absence of a statement, raising the question of what other non-obvious differences there might be. You miss the point: if it is obvious that the posted code did not produce the posted output (common when newbies are thrashing around trying to solve a problem), some of the audience may not bother trying to help with the main issue -- they may attempt to help with side issues (as I did with the fugly code bloat), or just ignore you altogether.

Your code is already needlessly monstrously large.

Which I realised and apologised for beforehand.

An apology does not change the fact that the code was needlessly large (AND needed careful post-linefolding reformatting just to make it runnable), and so some may not have bothered to read it.

And Python 2.5.1 does what? Strike 3.

Hmm, I must have missed where it said that you can only ask for help if you're using the latest version...

You missed the point again: your problem may be fixed in a later version.

In case you're wondering, 2.5.1 is not _really_ as widespread as most of the older versions.

I wasn't wondering. I know. I maintain a package (xlrd) which works on Python 2.5 all the way back to 2.1. It occasionally has possibly similar 'second iteration goes funny' issues (e.g. when reading 120 MB Excel spreadsheet files one after the other). You mention that removing some attributes from a class may make your code stop exhibiting cliff-face behaviour.
If you can produce two versions of your code that actually demonstrate the abrupt change, I'd be quite interested in digging into it, to our possible mutual benefit. For handling the bit extraction stuff, either [snip] (b) do a loop over the bit positions Now that sounds more useful. I'll give that a try. I'm glad you found something possibly more useful in my posting :-) Cheers, John -- http://mail.python.org/mailman/listinfo/python-list
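The bit-position loop mentioned above (option (b)) might look something like this sketch; the flag names here are invented for illustration, not taken from the real code:

```python
# Unpack a packed integer field into individual boolean flags by
# looping over bit positions -- option (b) from the suggestion above.
# The flag names are illustrative placeholders.
FLAG_NAMES = ['is_valid', 'is_matched', 'is_isolated', 'has_track']

def extract_flags(packed):
    """Return a {flag_name: bool} dict, one entry per bit position."""
    flags = {}
    for bit, name in enumerate(FLAG_NAMES):
        flags[name] = bool((packed >> bit) & 1)
    return flags

# 0b1010: bit 0 clear, bit 1 set, bit 2 clear, bit 3 set.
print(extract_flags(0b1010))
```

This replaces a long run of hand-written mask-and-test lines with one loop over a table of names, in the same spirit as the converter table for the float/int/bool fields.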
Re: Newbie completely confused
In message [EMAIL PROTECTED], Gabriel Genellina wrote:

En Fri, 21 Sep 2007 13:34:40 -0300, Jeroen Hegeman [EMAIL PROTECTED] escribió:

    class ModerateClass:
        def __init__(self):
            return
        def __del__(self):
            pass
            return

    class HugeClass:
        def __init__(self, line):
            self.clear()
            self.input(line)
            return
        def __del__(self):
            del self.B4v
            return
        def clear(self):
            self.long_classes = {}
            self.B4v = {}
            return

(BTW, all those return statements are redundant and useless)

The OP could be trying to use them as some kind of textual indicator of the end of the function. Myself, I prefer end-comments, e.g.

    class HugeClass :
        ...
        def clear(self) :
            ...
        #end clear
    #end HugeClass

-- http://mail.python.org/mailman/listinfo/python-list
Re: Newbie completely confused
Jeroen Hegeman schreef:

    ...processing all 2 files found
    -- 1/2: ./test_file0.txt
    Now reading ...
    DEBUG readLines A took 0.093 s
    ...took 8.85717201233 seconds
    -- 2/2: ./test_file0.txt
    Now reading ...
    DEBUG readLines A took 3.917 s
    ...took 12.8725550175 seconds

So the first time around the file gets read in ~0.1 seconds; the second time around it needs almost four seconds! As far as I can see this is related to 'something in memory being copied around', since if I replace 'alternative 1' by 'alternative 2', basically making sure that my classes are not used, reading time the second time around drops back to normal (= roughly what it is the first pass).

(First, I had to add timing code to ReadClasses: the code you posted doesn't include it, and only shows timings for ReadLines.)

Your program uses quite a bit of memory. I guess it gets harder and harder to allocate the required amounts of memory. If I change this line in ReadClasses:

    built_classes[len(built_classes)] = HugeClass(long_line)

to

    dummy = HugeClass(long_line)

then both times the files are read and your data structures are built, but after each run the data structure is freed. The result is that both runs are equally fast. Also, if I run the first version (without the dummy) on a computer with a bit more memory (1 GiB), it seems there is no problem allocating memory: both runs are equally fast.

I'm not sure how to speed things up here... you're doing much processing on a lot of small chunks of data. I have a number of observations and possible improvements though, and some might even speed things up a bit.

You read the files, but don't use the contents; instead you use long_line over and over. I suppose you do that because this is a test, not your actual code?

An __init__() with nothing (or only return) in it is not useful; better to just leave it out.

You have a number of return statements that don't do anything (i.e. they return nothing (None, actually) at the end of the function).
A function without return automatically returns None at the end, so it's better to leave them out. Similarly, you don't need to call sys.exit(): the script will terminate anyway when it reaches the end. Better to leave it out.

LongClass.clear() doesn't do anything and isn't called anyway; leave it out.

ModerateClass.__del__() doesn't do anything either. I'm not sure how it affects what happens when ModerateClass instances get freed, but I suggest you don't start messing with __del__() until you have more Python knowledge and experience. I'm not sure why you think you need to implement that method.

The same goes for HugeClass.__del__(). It does delete self.B4v, but the default behavior will do that too. Again, I don't get why you want to override the default behavior.

In a number of cases, you use a dict like this:

    built_classes = {}
    for i in LINES:
        built_classes[len(built_classes)] = ...

So you're using the indices 0, 1, 2, ... as the keys. That's not what dictionaries are made for; lists are much better for that:

    built_classes = []
    for i in LINES:
        built_classes.append(...)

HugeClass.B4v isn't used, so you can safely remove it.

Your readLines() function reads a whole file into memory. If you're working with large files, that's not such a good idea. It's better to load one line at a time into memory and work on that. I would even completely remove readLines() and restructure ReadClasses() like this:

    def ReadClasses(filename):
        print 'Now reading ...'
        built_classes = []
        # Open file
        in_file = open(filename, 'r')
        # Read lines and interpret them.
        time_a = time.time()
        for i in in_file:
            ## This is alternative 1.
            built_classes.append(HugeClass(long_line))
            ## The next line is alternative 2.
            ##built_classes[len(built_classes)] = long_line
        in_file.close()
        time_b = time.time()
        print 'DEBUG readClasses took %.3f s' % (time_b - time_a)

Personally, I only use 'i' for integer indices (as in 'for i in range(10)'); for other uses I prefer more descriptive names:

    for line in in_file:
        ...
But I guess that's up to personal preference.

Also, you used LINES to store the file contents; the convention is that names in all capitals are used for constants, not for things that change.

In ProcessList(), you keep the index in a separate variable. Python has a trick so you don't have to do that yourself:

    nfiles = len(input_files)
    for file_index, i in enumerate(input_files):
        print '-- %i/%i: %s' % (file_index + 1, nfiles, i)
        ReadClasses(i)

Instead of item0, item1, ..., it's generally better to use a list, so you can use item[0], item[1], ...

And finally, John Machin's suggestion looks like a good way to restructure that long sequence of conversions and assignments in HugeClass.

-- The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom. -- Isaac Asimov Roel Schroeven
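The enumerate() pattern recommended above, as a tiny runnable check (Python 3 print syntax; the file names are placeholders):

```python
input_files = ['./a.txt', './b.txt', './c.txt']   # placeholder names

# enumerate() yields (index, item) pairs, so no hand-maintained counter.
nfiles = len(input_files)
progress = []
for file_index, name in enumerate(input_files):
    msg = '-- %i/%i: %s' % (file_index + 1, nfiles, name)
    progress.append(msg)
    print(msg)
```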
Newbie completely confused
Dear Pythoneers,

I'm moderately new to Python, and it has got me completely lost already.

I've got a bunch of large (30 MB) txt files containing one 'event' per line. I open the files one after the other, read them line by line, and from each line build a 'data structure' of a main class (HugeClass) containing some simple information as well as several instances of some other classes. No problem so far, but I noticed that the first file was always faster than the others, whereas I would expect it to be slower, if anything. Testing with two copies of the same file shows the same behaviour.

Below is a (rather large, I'll explain) chunk of code. I ran this in a directory with two test files called 'test_file0.txt' and 'test_file1.txt', each containing 10k lines of the same information as the 'long_line' variable in the code. This shows the following timing (consistently) for the little piece of code that reads all lines from file:

    ...processing all 2 files found
    -- 1/2: ./test_file0.txt
    Now reading ...
    DEBUG readLines A took 0.093 s
    ...took 8.85717201233 seconds
    -- 2/2: ./test_file0.txt
    Now reading ...
    DEBUG readLines A took 3.917 s
    ...took 12.8725550175 seconds

So the first time around the file gets read in ~0.1 seconds; the second time around it needs almost four seconds! As far as I can see this is related to 'something in memory being copied around', since if I replace 'alternative 1' by 'alternative 2', basically making sure that my classes are not used, reading time the second time around drops back to normal (= roughly what it is the first pass).

I already want to apologise for the size of the code chunk below. I know about 'minimal reproducible examples' and such, but I found out that if I commented out the filling (and thus binding) of some of the member variables in the lower-level classes, the problem (sometimes) also disappears. That also points to some magic happening in memory? I probably mucked something up, but I'm really lost as to where.
Any help would be appreciated.

The original problem showed up using Python 2.4.3 under Linux (Fedora Core 1). Python 2.3.5 on OS X 10.4.10 (PPC) appears not to show this issue(?).

Thanks, Jeroen

P.S. Any ideas on optimising the input to the classes would be welcome too ;-)

Jeroen Hegeman jeroen DOT hegeman AT gmail DOT com

===Start of code chunk=

    #!/usr/bin/env python

    import time
    import sys
    import os
    import gzip
    import pdb

long_line = 1,31905,0,174501,46152419,2117961,143,-1.,51,2,-19.9139,42,-19.9140 , 6.6002,0,0,0,46713.1484,2,0.,-1,1.4203220606,0.3876158297,147.121017 4561,147.1284120973,-2,0.,-1,1.5887237787,-2.4011900425,-319.7776794 434,319.7906836817,4,21,0.,-1,-0.5672637224,2.2052443027,-43.2842369 080,43.3440905719,21,0.,-1,-0.8540721536,0.0770076364,-22.7033920288 , 22.7195827425,21,0.,-1,0.1623233557,0.5845987201,-28.0794525146,28.0 860084170,21,0.,-1,0.1943928897,-0.2195242196,-22.0666370392,22.0685 899391,6,0.,-1,-40.1810989380,-127.0743789673,-104.9231948853,239.74 36794163,-6,0.,-1,43.2013626099,125.0640945435,-67.7339172363,227.17 53587387,24,0.,-1,-57.9123306274,-17.3483123779,-71.8334121704,123.4 397648033,-24,0.,-1,84.0985488892,54.4542312622,-62.4525032043,144.5 299239704,5,0.,-1,17.7312316895,-109.7260665894,-33.0897827148,116.3 039146130,-5,0.,-1,-40.8971862793,70.6098632812,-5.2814140320,82.645 4347683,4,0.,-1,-6.2859884724,-17.9586020410,-58.9464384913,69.40294 68585,-3,0.,-1,-51.6263811588,0.6104701459,-12.8869901896,54.0368221 571,3,0.,-1,16.4690684490,48.0271777511,-51.7867884636,74.5327484701 ,-4,0.,-1,67.6295298338,6.4269350171,-10.6658525467,69.9971834876,7, 7,1.0345464706e+01,-7.0800781250e+01,-2.0385742187e+01,7.5256346272e +01,1.3148,0.0072,0.0072,1.3148,0.0072,0.0072,1.0255,1.0413,0.0,0.0,0.0, 0.0,-1.0,-4.2383,49.5276,13,0.1537,0.5156,0,0.9982,0.0034,1.,7,1,0.9 566,0.0062,1,0,2,1.2736,1,7.8407,1,0,2,1.2736,1,7.8407,0,0,-1.0,-1.0,5,1 ,-2.4047853470e+01,4.0832519531e+01,-3.8452150822e+00,4.7851562559e
+01,1.3383,0.0051,0.0051,1.3383,0.0051,0.0051,0.9340,0.9541,0.0,0.0,0.0, 0.0,-1.0,-2.4609,21.3916,7,0.1166,0.5977,0,0.,0.0052,1.,9,1,0.99 47,0.0063,1,0,2,0.7735,1,74.7937,1,0,2,0.7735,1,74.7937,0,0,-1.0,-1.0,5, 1,-4.4067382812e+01,2.5634796619e+00,-1.1138916016e+01,4.6203614579e +01,1.3533,0.0054,0.0054,1.3533,0.0054,0.0054,1.0486,1.0903,0.0,0.0,0.0, 0.0,-1.0,-3.9648,31.3733,13,0.1767,0.5508,100,0.9977,0.0040,1.,9,1,0 . ,0.4349,0,0,0,0.,0,-1000.,0,0,0,0.,0,-1000.,0,0,-1.0 ,-1.0,0,1,3.7200927734e+01,2.7465817928e+00,-5.5847163200e +00,3.7994386563e +01,1.3634,0.0062,0.0062,1.6488,0.0385,0.0385,0.7141,0.9013,5.3986899118 e+00,6.6766492833e-01,-2.3780213181e-01,5.4460399892e +00,0.5504,-3.1445,0.7776,9,0.1169,0.7734,0,0.9977,0.0040,1.,7,1,0.0
Re: Newbie completely confused
On Sep 22, 2:34 am, Jeroen Hegeman [EMAIL PROTECTED] wrote: [snip]

    ...processing all 2 files found
    -- 1/2: ./test_file0.txt
    Now reading ...
    DEBUG readLines A took 0.093 s
    ...took 8.85717201233 seconds

Your code does NOT include any statements that could have produced the above line of output -- IOW, you have not posted the code that you actually ran. Your code is already needlessly monstrously large. That's two strikes against anyone bothering to try to nut out what's going wrong, if indeed anything is going wrong. [snip]

The original problem showed up using Python 2.4.3 under linux (Fedora Core 1). Python 2.3.5 on OS X 10.4.10 (PPC) appears not to show this issue(?).

And Python 2.5.1 does what? Strike 3.

P.S. Any ideas on optimising the input to the classes would be welcome too ;-)

1. What is the point of having a do-nothing __init__ method? I'd suggest making the __init__ method do the input.
2. See below. [snip]

    class LongClass:
        def __init__(self):
            return
        def clear(self):
            return
        def input(self, foo, c):
            self.item0 = float(foo[c]); c += 1
            self.item1 = float(foo[c]); c += 1
            [multiple snips ahead]
            self.item18 = float(foo[c]); c += 1
            self.item19 = int(foo[c]); c += 1
            self.item20 = float(foo[c]); c += 1
            self.item27 = bool(int(foo[c])); c += 1
            self.item30 = (foo[c] == '1'); c += 1
            self.item31 = (foo[c] == '1'); c += 1
            self.item47 = bool(int(foo[c])); c += 1
            return c

At global level:

    converters = [float] * 48
    cvlist = [
        (int, (19, 22, 26, 34, 40, 46)),
        (lambda z: bool(int(z)), (27, 47)),
        (lambda z: z == '1', (30, 31, 36, 37, 42)),
    ]
    for func, indexes in cvlist:
        for x in indexes:
            converters[x] = func
    enumerated_converters = list(enumerate(converters))

Then:

    def input(self, foo, c):
        self.item = [func(foo[c+x]) for x, func in enumerated_converters]
        return c + 48

which requires you to refer to obj.item[19] instead of obj.item19. If you *must* use item19 etc, then try this:

    for x, func in enumerated_converters:
        setattr(self, 'item%d' % x, func(foo[c+x]))

You could also (shock, horror) use meaningful names for the attributes ... include a list of attribute names in the global stuff, and put the relevant name in as the 2nd arg of setattr() instead of 'itemxx'.

For handling the bit extraction stuff, either (a) conversion functions have a 2nd arg which defaults to None and whose usage depends on the function itself ... it would be a mask or bit position (or could be e.g. a scale factor for implied-decimal-point input), or (b) do a loop over the bit positions.

HTH, John -- http://mail.python.org/mailman/listinfo/python-list
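The converter-table idea above can be sketched as a runnable miniature (Python 3; the 6-field layout and the converter positions are invented for illustration, not the real 48-field record):

```python
# A miniature converter table: one conversion function per field,
# starting from all-floats and overriding the few exceptions.
# The 6-field layout here is an illustrative stand-in.
NFIELDS = 6
converters = [float] * NFIELDS
cvlist = [
    (int, (1,)),                       # field 1 is an integer
    (lambda z: bool(int(z)), (3,)),    # field 3 is a 0/1 flag
    (lambda z: z == '1', (5,)),        # field 5 compares against '1'
]
for func, indexes in cvlist:
    for x in indexes:
        converters[x] = func
enumerated_converters = list(enumerate(converters))

def parse(fields, c=0):
    """Convert fields[c:c+NFIELDS] in one pass; return (values, new offset)."""
    values = [func(fields[c + x]) for x, func in enumerated_converters]
    return values, c + NFIELDS

values, offset = parse('3.14,42,-1.0,1,2.5,0'.split(','))
print(values, offset)   # [3.14, 42, -1.0, True, 2.5, False] 6
```

The table replaces dozens of near-identical `self.itemN = conv(foo[c]); c += 1` lines with one data structure and one loop, which is both shorter and harder to get out of sync.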
Re: Newbie completely confused
En Fri, 21 Sep 2007 13:34:40 -0300, Jeroen Hegeman [EMAIL PROTECTED] escribió:

So the first time around the file gets read in ~0.1 seconds; the second time around it needs almost four seconds! As far as I can see this is related to 'something in memory being copied around', since if I replace 'alternative 1' by 'alternative 2', basically making sure that my classes are not used, reading time the second time around drops back to normal (= roughly what it is the first pass).

    class ModerateClass:
        def __init__(self):
            return
        def __del__(self):
            pass
            return

    class HugeClass:
        def __init__(self, line):
            self.clear()
            self.input(line)
            return
        def __del__(self):
            del self.B4v
            return
        def clear(self):
            self.long_classes = {}
            self.B4v = {}
            return

Don't use __del__ unless it's absolutely necessary. ModerateClass.__del__ does nothing, but its mere existence does not allow the garbage collector to work efficiently. If you explicitly call clear() from HugeClass, you can avoid using __del__ too. And if B4v is not involved in cycles, clearing it is not even necessary.

(BTW, all those return statements are redundant and useless)

-- Gabriel Genellina -- http://mail.python.org/mailman/listinfo/python-list
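The point above about __del__ and the garbage collector can be probed directly. On the Python 2 interpreters discussed in this thread, objects with a __del__ method that sat in a reference cycle were never collected and piled up in gc.garbage; since Python 3.4 (PEP 442) such cycles are collected and the finalizers run. A quick check, in Python 3:

```python
import gc

collected = []

class Node:
    """A node with a __del__ method, deliberately placed in a cycle."""
    def __init__(self, name):
        self.name = name
        self.other = None
    def __del__(self):
        collected.append(self.name)

# Build a two-node reference cycle, then drop all outside references.
a, b = Node('a'), Node('b')
a.other, b.other = b, a
del a, b

gc.collect()   # force a collection pass

# On Python >= 3.4 the cycle is freed and both finalizers run;
# on the Python 2 versions in this thread the two nodes would have
# been stuck uncollectable in gc.garbage instead.
print(sorted(collected))   # ['a', 'b']
print(gc.garbage)          # []
```

Either way, the practical advice stands: a do-nothing __del__ buys nothing and (on old interpreters) could actively leak memory, which is one plausible contributor to a program that slows down on its second pass.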