On 2008-08-13, Daniel Lenski <[EMAIL PROTECTED]> wrote:
> On Wed, 13 Aug 2008 16:57:32 -0400, Zachary Pincus wrote:
>> Your approach generates numerous large temporary arrays and lists. If
>> the files are large, the slowdown could be because all that memory
>> allocation is causing some VM thrashing. I've run into that at times
>> parsing large text files.
>
> Thanks, Zach. I do think you have the right explanation for what was
> wrong with my code.
>
> I thought the slowdown was due to the overhead of interpreted code. So I
> tried to do everything in list comprehensions and array statements rather
> than explicit Python loops. But you were definitely right; the slowdown
> was due to memory use, not interpreted code.
>
>> Perhaps better would be to iterate through the file, building up your
>> cells dictionary incrementally. Finally, once the file is read in
>> fully, you could convert what you can to arrays...
>>
>> import numpy
>>
>> f = open('big_file')
>> header = f.readline()
>> cells = {'tet':[], 'hex':[], 'quad':[]}
>> for line in f:
>>     vals = line.split()
>>     index_property = vals[:2]
>>     type = vals[2]      # the cell type is the third field
>>     nodes = vals[3:]
>>     cells[type].append(index_property + nodes)
>> for type, vals in cells.items():
>>     cells[type] = numpy.array(vals, dtype=int)
>
> This is similar to what I tried originally! Unfortunately, repeatedly
> appending to a list seems to be very slow... I guess Python keeps
> reallocating and copying the list as it grows. (It would be nice to be
> able to tune the increments by which the list size increases.)
The list reallocation schedule is actually fairly well-tuned as it is.
Appending to a list object should be amortized O(1) time.

>> I'm not sure if this is exactly what you want, but you get the idea...
>> Anyhow, the above only uses about twice as much RAM as the size of the
>> file. Your approach looks like it uses several times more than that.
>>
>> Also you could see if:
>> cells[type].append(numpy.array([index, property]+nodes, dtype=int))
>>
>> is faster than what's above... it's worth testing.
>
> Repeatedly concatenating arrays with numpy.append or numpy.concatenate is
> also quite slow, unfortunately. :-(

Yes. There is no preallocation here.

>> If even that's too slow, maybe you'll need to do this in C? That
>> shouldn't be too hard, really.
>
> Yeah, I eventually came up with a decent Python solution: preallocate
> the arrays to the maximum size that might be needed. Trim them down
> afterwards. This is very wasteful of memory when there may be many cell
> types (less so if the OS does lazy allocation), but in the typical case
> of only a few cell types it works great:

Another approach would be to preallocate a substantial chunk at a time,
then concatenate all of the chunks.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
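
For illustration, a minimal sketch of that chunk-then-concatenate idea. It
assumes whitespace-separated integer columns; the function name, column
count, and chunk size below are illustrative choices, not anything taken
from this thread:

    import numpy as np

    def read_rows_chunked(f, ncols, chunk_rows=65536):
        """Accumulate rows in fixed-size preallocated chunks, then join them once."""
        chunks = []                                       # completed chunks
        current = np.empty((chunk_rows, ncols), dtype=int)
        used = 0                                          # rows filled in `current`
        for line in f:
            if used == chunk_rows:                        # current chunk is full
                chunks.append(current)
                current = np.empty((chunk_rows, ncols), dtype=int)
                used = 0
            current[used] = [int(v) for v in line.split()]
            used += 1
        chunks.append(current[:used])                     # trim the last, partial chunk
        return np.concatenate(chunks, axis=0)             # a single copy at the end

One accumulator like this could be kept per cell type; the memory overhead
is bounded by one partially filled chunk, and the only concatenation happens
once, after the whole file has been read.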