Jeff and I were just discussing a plan to massively reduce the overhead of strings and eliminate substrings. It's a bit early to go into much detail, but it should help with a lot of string-related work that is pretty inefficient right now. Unfortunately, that's not much help at the moment.
On Mon, Mar 10, 2014 at 4:52 PM, John Myles White <johnmyleswh...@gmail.com> wrote:

> My (not very educated) guess is that each SubString object gets its own
> memory allocated. In the past, I've dealt with these problems by using
> raw-byte buffers and working with those, since you can keep reusing a
> single buffer for every line and avoid all memory allocation.
>
> I'm not clear what changes need to happen in Julia to make sure something
> like this doesn't keep allocating new memory. In an ideal world, I'd be
> able to process a file by:
>
> (1) Allocating a single string object that has, at its backend, a large
> buffer of bytes.
> (2) Read in a new line from the file into this string object without
> allocating more bytes unless strictly necessary.
> (3) Run parse functions on this string object without allocating new
> memory except for the bytes needed to store a float.
>
> My sense is that this is a little hard in Julia at the moment.
>
> -- John
>
>
> On Mar 10, 2014, at 1:30 PM, Keith Campbell <keithcc1...@gmail.com> wrote:
>
> Hi all,
>
> I'm trying to minimize memory allocation while doing line-oriented
> processing on a fairly large set of text files. SubString and
> pre-allocated outputs have helped, but I'm still getting memory allocations
> proportional to the size of the input set and looking for new ideas.
>
> The toy example below illustrates how the allocations grow.
> Am I right to suspect that float() is the culprit. Any thoughts for how
> to cut out the remaining allocations?
>
> thanks,
> Keith
>
> julia> function str_with_sub(N)
>            mystr = ascii("1.1,2.2")
>            fs=Array(Float64,2)
>
>            for i in 1:N
>                dostr!(mystr, fs)
>            end
>        end
> str_with_sub (generic function with 1 method)
>
> julia> function dostr!(mystr, fs)
>            fs[1] = float(SubString(mystr,1,3))
>            fs[2] = float(SubString(mystr,5,7))
>        end
> dostr! (generic function with 1 method)
>
> julia> @time str_with_sub(4)
> elapsed time: 0.008214327 seconds (190612 bytes allocated)
>
> julia> @time str_with_sub(4)
> elapsed time: 8.441e-6 seconds (496 bytes allocated)
>
> julia> @time str_with_sub(4)
> elapsed time: 6.493e-6 seconds (496 bytes allocated)
>
> julia> @time str_with_sub(6)
> elapsed time: 7.074e-6 seconds (688 bytes allocated)
>
> julia> @time str_with_sub(8)
> elapsed time: 7.437e-6 seconds (880 bytes allocated)
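In the meantime, the raw-byte-buffer workaround John describes in steps (1) to (3) above could look roughly like the sketch below. It is only an illustration, not the planned string redesign. It assumes a recent Julia (1.x), so it uses `Vector{Float64}(undef, 2)` rather than `Array(Float64,2)`. The helpers `parse_decimal`, `fill_line!` and `parse_fields!` are hypothetical names, not Base functions, and the parser handles only plain decimals such as `1.1` or `-4.25`, with no exponents and no input validation.

```julia
# Sketch only: all three helpers are hypothetical, not part of Base.

# Parse a plain decimal (optional sign, digits, optional fraction) from
# buf[lo:hi] without creating any string objects.
function parse_decimal(buf::Vector{UInt8}, lo::Int, hi::Int)
    i = lo
    neg = false
    if i <= hi && (buf[i] == UInt8('-') || buf[i] == UInt8('+'))
        neg = buf[i] == UInt8('-')
        i += 1
    end
    mant = 0         # all digits collected as one integer
    scale = 1.0      # 10^(number of fractional digits)
    seen_dot = false
    while i <= hi
        b = buf[i]
        if b == UInt8('.')
            seen_dot = true
        else
            mant = 10 * mant + Int(b - UInt8('0'))
            seen_dot && (scale *= 10.0)
        end
        i += 1
    end
    x = mant / scale
    return neg ? -x : x
end

# Read one line from io into buf, reusing its storage; returns the byte count.
# The buffer grows only when a line is longer than the current buffer.
function fill_line!(buf::Vector{UInt8}, io::IO)
    n = 0
    while !eof(io)
        b = read(io, UInt8)
        b == UInt8('\n') && break
        n += 1
        n > length(buf) && resize!(buf, 2n)
        buf[n] = b
    end
    return n
end

# Split buf[1:n] on commas and store each field in the preallocated fs.
# Assumes fs has room for every field on the line.
function parse_fields!(fs::Vector{Float64}, buf::Vector{UInt8}, n::Int)
    k = 0
    lo = 1
    for i in 1:n+1
        if i > n || buf[i] == UInt8(',')
            k += 1
            fs[k] = parse_decimal(buf, lo, i - 1)
            lo = i + 1
        end
    end
    return k
end

io  = IOBuffer("1.1,2.2\n3.5,-4.25\n")   # stands in for an open file
buf = Vector{UInt8}(undef, 64)           # step (1): one reusable byte buffer
fs  = Vector{Float64}(undef, 2)          # preallocated output
n = fill_line!(buf, io)                  # step (2): read a line into buf
parse_fields!(fs, buf, n)                # step (3): parse fields in place
```

Once `buf` is large enough for the longest line, the per-line work here should not need fresh allocations, though that is worth confirming with `@time` on your own data. The price is giving up `float()` and its full number syntax in favor of a hand-rolled parser.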