On 2008-08-13, Daniel Lenski <[EMAIL PROTECTED]> wrote:
> On Wed, 13 Aug 2008 16:57:32 -0400, Zachary Pincus wrote:
>> Your approach generates numerous large temporary arrays and lists. If
>> the files are large, the slowdown could be because all that memory
>> allocation is causing some VM thrashing. I've run into that at times
>> parsing large text files.
>
> Thanks, Zach. I do think you have the right explanation for what was
> wrong with my code.
>
> I thought the slowdown was due to the overhead of interpreted code. So I
> tried to do everything in list comprehensions and array statements rather
> than explicit Python loops. But you were definitely right: the slowdown
> was due to memory use, not interpreted code.
>
>> Perhaps better would be to iterate through the file, building up your
>> cells dictionary incrementally. Finally, once the file is read in
>> fully, you could convert what you can to arrays...
>>
>> import numpy
>>
>> f = open('big_file')
>> header = f.readline()
>> cells = {'tet': [], 'hex': [], 'quad': []}
>> for line in f:
>>     vals = line.split()
>>     index_property = vals[:2]
>>     type = vals[2]
>>     nodes = vals[3:]
>>     cells[type].append(index_property + nodes)
>> for type, vals in cells.items():
>>     cells[type] = numpy.array(vals, dtype=int)
>
> This is similar to what I tried originally! Unfortunately, repeatedly
> appending to a list seems to be very slow... I guess Python keeps
> reallocating and copying the list as it grows. (It would be nice to be
> able to tune the increments by which the list size increases.)
The list reallocation schedule is actually fairly well-tuned as it is.
Appending to a list object should be amortized O(1) time.
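You can watch the over-allocation happen with sys.getsizeof(). A quick sketch (the exact growth schedule is a CPython implementation detail, so the precise numbers will vary):

```python
import sys

# Append many items and record each time the list's allocated size changes.
# CPython over-allocates geometrically, so reallocations become increasingly
# rare and append() is amortized O(1).
lst = []
resizes = []
for i in range(10000):
    lst.append(i)
    size = sys.getsizeof(lst)
    if not resizes or size != resizes[-1]:
        resizes.append(size)

# Only a few dozen reallocations for ten thousand appends.
print(len(resizes))
```
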
>> I'm not sure if this is exactly what you want, but you get the idea...
>> Anyhow, the above only uses about twice as much RAM as the size of the
>> file. Your approach looks like it uses several times more than that.
>>
>> Also you could see if:
>> cells[type].append(numpy.array(index_property + nodes, dtype=int))
>>
>> is faster than what's above... it's worth testing.
>
> Repeatedly concatenating arrays with numpy.append or numpy.concatenate is
> also quite slow, unfortunately. :-(
Yes. There is no preallocation here.
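Each concatenate call allocates a fresh array and copies everything accumulated so far, so the total work is quadratic in the number of rows; collecting rows in a Python list and converting once at the end copies each value only a bounded number of times. A minimal sketch with generic data (not Daniel's mesh format):

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Slow pattern: grow an array by repeated concatenation.
# Every iteration copies the whole accumulated array -> O(n**2) overall.
grown = np.empty((0, 3), dtype=int)
for row in rows:
    grown = np.concatenate([grown, np.array([row], dtype=int)])

# Fast pattern: accumulate in a Python list, convert once at the end.
collected = np.array(rows, dtype=int)

assert np.array_equal(grown, collected)
```
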
>> If even that's too slow, maybe you'll need to do this in C? That
>> shouldn't be too hard, really.
>
> Yeah, I eventually came up with a decent Python solution:
> preallocate the arrays to the maximum size that might be needed. Trim
> them down afterwards. This is very wasteful of memory when there may be
> many cell types (less so if the OS does lazy allocation), but in the
> typical case of only a few cell types it works great:
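Since Daniel's actual code wasn't included, here is a sketch of the preallocate-then-trim idea under an assumed file format (two leading index/property columns, a cell-type token, then node columns; the widths are made up for illustration):

```python
import numpy as np

lines = [
    "0 10 tet 1 2 3 4",
    "1 11 hex 1 2 3 4 5 6 7 8",
    "2 12 tet 5 6 7 8",
]

n_lines = len(lines)
# Assumed row widths: index + property + node columns for each cell type.
widths = {"tet": 6, "hex": 10}

# Preallocate each cell-type array to the maximum rows it could ever need
# (every line might turn out to be of this type), then trim afterwards.
arrays = {t: np.empty((n_lines, w), dtype=int) for t, w in widths.items()}
counts = {t: 0 for t in widths}

for line in lines:
    vals = line.split()
    t = vals[2]
    row = vals[:2] + vals[3:]
    arrays[t][counts[t], :] = [int(v) for v in row]
    counts[t] += 1

# Trim each array down to the rows actually filled.
for t in arrays:
    arrays[t] = arrays[t][:counts[t]]
```
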
Another approach would be to preallocate a substantial chunk at a
time, then concatenate all of the chunks.
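Sketched out, the chunked approach looks something like this (the chunk size and the homogeneous-rows assumption are illustrative, not from Daniel's code):

```python
import numpy as np

CHUNK = 4  # rows per preallocated chunk; a tuning parameter

def read_rows(lines, width):
    """Accumulate int rows in fixed-size chunks, concatenating once at the end."""
    chunks = []
    buf = np.empty((CHUNK, width), dtype=int)
    n = 0
    for line in lines:
        buf[n, :] = [int(v) for v in line.split()]
        n += 1
        if n == CHUNK:          # chunk full: bank it, start a fresh one
            chunks.append(buf)
            buf = np.empty((CHUNK, width), dtype=int)
            n = 0
    chunks.append(buf[:n])      # partial final chunk
    return np.concatenate(chunks)

rows = read_rows(["1 2 3", "4 5 6", "7 8 9", "10 11 12", "13 14 15"], width=3)
```

This bounds the wasted memory to one chunk per array while still doing only one concatenation pass at the end.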
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
_______________________________________________
Numpy-discussion mailing list
[email protected]
http://projects.scipy.org/mailman/listinfo/numpy-discussion