My (not very educated) guess is that each SubString object gets its own memory allocated. In the past, I've dealt with this kind of problem by using raw byte buffers and working on them directly, since you can keep reusing a single buffer for every line and avoid per-line memory allocation entirely.
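A minimal sketch of the raw-byte-buffer approach, written for current Julia 1.x rather than the 2014 syntax used in this thread. The helper names `readline_into!`, `parse_simple_float` and `parse_line!` are hypothetical. The parser is deliberately simplified. It assumes `\n` line endings, non-empty comma-separated fields, and plain decimals with an optional leading minus sign, no exponent, and at most 15 significant digits. Under those assumptions each value is one division of two exactly representable integers, which rounds correctly. Anything else needs a full float parser.

```julia
# Hypothetical helper: read one line (without the trailing '\n') from `io`
# into `buf`, reusing its storage. Returns the line length, or -1 at EOF.
function readline_into!(buf::Vector{UInt8}, io::IO)
    eof(io) && return -1
    n = 0
    while !eof(io)
        b = read(io, UInt8)
        b == UInt8('\n') && break
        n += 1
        if n > length(buf)
            resize!(buf, 2 * length(buf) + 1)  # grow only when strictly necessary
        end
        @inbounds buf[n] = b
    end
    return n
end

# Hypothetical simplified parser for plain decimals such as "1.1" or "-4.25"
# stored in buf[lo:hi]. Assumes <= 15 significant digits and no exponent.
function parse_simple_float(buf::Vector{UInt8}, lo::Int, hi::Int)
    neg = buf[lo] == UInt8('-')
    i = neg ? lo + 1 : lo
    mant = 0    # all digits read so far, as an integer
    den = 1     # 10^(number of digits after the decimal point)
    seen_dot = false
    while i <= hi
        b = buf[i]
        if b == UInt8('.')
            seen_dot = true
        elseif UInt8('0') <= b <= UInt8('9')
            mant = 10 * mant + Int(b - UInt8('0'))
            seen_dot && (den *= 10)
        else
            throw(ArgumentError("unexpected byte $(repr(Char(b)))"))
        end
        i += 1
    end
    x = mant / den  # both operands are exact in Float64, so this rounds correctly
    return neg ? -x : x
end

# Parse each comma-separated field of the line buf[1:n] into the
# preallocated `out`; returns the number of fields parsed.
function parse_line!(out::Vector{Float64}, buf::Vector{UInt8}, n::Int)
    k = 0
    lo = 1
    for i in 1:n+1
        if i == n + 1 || buf[i] == UInt8(',')
            k += 1
            out[k] = parse_simple_float(buf, lo, i - 1)
            lo = i + 1
        end
    end
    return k
end

io = IOBuffer("1.1,2.2\n3.5,-4.25\n")
buf = Vector{UInt8}(undef, 4)    # deliberately small; grows on the first line
out = Vector{Float64}(undef, 2)  # preallocated output, as in the quoted example
n = readline_into!(buf, io)
parse_line!(out, buf, n)         # out now holds [1.1, 2.2]
```

Because `buf` and `out` are allocated once and `buf` only grows when a line is longer than any seen before, the parsing functions themselves do not allocate on the normal path; whether `read(io, UInt8)` allocates for a particular `IO` type is worth checking with `@allocated`.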
I'm not clear what needs to change in Julia to keep code like this from allocating new memory on every call. In an ideal world, I'd be able to process a file by:

(1) allocating a single string object backed by a large buffer of bytes;
(2) reading each new line from the file into this string object without allocating more bytes unless strictly necessary;
(3) running parse functions on this string object without allocating new memory, except for the bytes needed to store a float.

My sense is that this is a little hard in Julia at the moment.

-- John

On Mar 10, 2014, at 1:30 PM, Keith Campbell <keithcc1...@gmail.com> wrote:

> Hi all,
>
> I'm trying to minimize memory allocation while doing line-oriented processing
> on a fairly large set of text files. SubString and pre-allocated outputs
> have helped, but I'm still getting memory allocations proportional to the
> size of the input set and looking for new ideas.
>
> The toy example below illustrates how the allocations grow.
> Am I right to suspect that float() is the culprit? Any thoughts on how to
> cut out the remaining allocations?
>
> thanks,
> Keith
>
> julia> function str_with_sub(N)
>            mystr = ascii("1.1,2.2")
>            fs=Array(Float64,2)
>
>            for i in 1:N
>                dostr!(mystr, fs)
>            end
>        end
> str_with_sub (generic function with 1 method)
>
> julia> function dostr!(mystr, fs)
>            fs[1] = float(SubString(mystr,1,3))
>            fs[2] = float(SubString(mystr,5,7))
>        end
> dostr! (generic function with 1 method)
>
> julia> @time str_with_sub(4)
> elapsed time: 0.008214327 seconds (190612 bytes allocated)
>
> julia> @time str_with_sub(4)
> elapsed time: 8.441e-6 seconds (496 bytes allocated)
>
> julia> @time str_with_sub(4)
> elapsed time: 6.493e-6 seconds (496 bytes allocated)
>
> julia> @time str_with_sub(6)
> elapsed time: 7.074e-6 seconds (688 bytes allocated)
>
> julia> @time str_with_sub(8)
> elapsed time: 7.437e-6 seconds (880 bytes allocated)
>
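For anyone rerunning the quoted toy example today: in Julia 1.x, `Array(Float64,2)` and `float` applied to a string are no longer accepted, and the equivalents are `Vector{Float64}(undef, 2)` and `parse(Float64, ...)`. The sketch below is a current-Julia translation (an assumption about how one would port it, not part of the original thread). How many bytes it still allocates depends on the Julia version, so it measures with `@allocated` rather than asserting a number.

```julia
# Current-Julia (1.x) translation of the quoted toy example.
function dostr!(mystr::String, fs::Vector{Float64})
    # SubString views into mystr without copying its bytes
    fs[1] = parse(Float64, SubString(mystr, 1, 3))
    fs[2] = parse(Float64, SubString(mystr, 5, 7))
    return fs
end

function str_with_sub(N)
    mystr = "1.1,2.2"
    fs = Vector{Float64}(undef, 2)  # preallocated output
    for i in 1:N
        dostr!(mystr, fs)
    end
    return fs
end

str_with_sub(1)                         # warm up so compilation is not measured
bytes = @allocated str_with_sub(1000)
println("bytes allocated for N = 1000: ", bytes)
```

If `bytes` stays flat as `N` grows, the per-iteration allocations seen in 2014 are gone on that version; if it grows with `N`, the loop body is still allocating.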