On Friday, May 1, 2015 at 9:16:43 AM UTC-4, Tim Holy wrote: > > On Friday, May 01, 2015 03:19:03 AM Scott Jones wrote: > > As the string grows, Julia's internals end up having to reallocate the > > memory and sometimes copy it to a new location, hence the O(n^2) nature > of > > the code. > > Small correction: push! is not O(n^2), it's O(nlogn). Internally, the > storage > array grows by factors of 2 [1]; after one allocation of size 2n you can > add n > more elements without reallocating. >
Good to know, I hate to say it, but the performance looked so bad to me, I didn't bother to see if it even had that optimization (which is exactly what I did for strings for the language I used to develop) Does it always grow by factors of 2? That might not be so good... we found that after a certain point, it was better to increase in chunks, say of 64K, or 1M, because increasing the size that way of large LOBs could make you run out of memory fairly quickly... > That said, O(nlogn) can be pretty easily beat by O(2n): make one pass > through > and count how many you'll need, allocate the whole thing, and then stuff > in > elements. As you seem to be planning to do. > Yes, and have very nice performance improvements to show for it (most were around 4-10x faster, go look at what I put in my gist), and that's even with my pure Julia version... :-) > > --Tim > > [1] Last I looked, that is; there was some discussion about switching it > to > something like 1.5 because of various discussions of memory fragmentation > and > reuse. > Still, same issue as I described above... probably better to increase by 2x up to a point, and then by chunk sizes, where the chunk sizes might slowly get larger...