Serializing strings was indeed unreasonably slow. I just pushed a commit that should be a significant improvement.

On Wed, May 21, 2014 at 7:17 AM, Tim Holy <tim.h...@gmail.com> wrote:
> Samuel, rewriting in C is almost never necessary, because Julia is as fast as
> C. It's just that some implementations are faster, and some slower.
>
> I don't want to dissuade you in any way from improving the performance of
> serialize, but you might consider looking at HDF5/JLD.
>
> --Tim
>
> On Tuesday, May 20, 2014 07:34:11 PM Samuel Colvin wrote:
>> Thanks for that, yes you're right, I was being dumb.
>>
>> Just to give an example of how slooooow: reading a 48 MB CSV file with a
>> mixture of strings and numbers using DataFrames' readtable, then writing it
>> back out, gives:
>>
>> julia> @time writetable("data.csv", r)
>> elapsed time: 3.539380236 seconds (77981180 bytes allocated)
>>
>> julia> @time serialize(open("data.dat", "w"), r)
>> elapsed time: 83.743085747 seconds (3439332792 bytes allocated)
>>
>> Lots of time seems to be spent in print_escaped in string.jl.
>>
>> Surely there is some improvement that can be made to this? If print_escaped
>> is the bottleneck, couldn't it be rewritten in C?