[julia-users] Re: Writing a subset DataFrame to file is 220 times slower than saving the whole DataFrame

Alex Mellnik Thu, 27 Oct 2016 09:37:43 -0700

I'm not sure what's wrong with sub, but don't use it -- it's definitely 
worse than just making a copy of the subset you want to write.


s = df[df[:rank_PV] .<= r_max, :]
@time write_results(s, name, "significant", sep, h)
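
For anyone who wants to try the copy-based approach in isolation, here is a minimal sketch. It assumes the same DataFrames API used elsewhere in this thread (Julia 0.4/0.5 era, with `df[:col]` column access and `writetable`). The toy table, the `r_max` value, and the output path are made up for illustration and are not from the original program.

```julia
using DataFrames

# Hypothetical stand-in for the real table; the values are invented.
df = DataFrame(id = [1, 2, 3, 4, 5], rank_PV = [3, 1, 5, 2, 4])
r_max = 2

# Boolean mask: true for every row whose rank is within the cutoff.
keep = df[:rank_PV] .<= r_max

# Indexing with the mask and `:` builds a new, independent DataFrame
# (a copy of the selected rows) rather than the view that `sub` returns.
s = df[keep, :]

# Write the copy with the same writetable keywords as write_results above.
# The path is an arbitrary temporary location chosen for this sketch.
outfile = joinpath(tempdir(), "significant_Stat.csv")
writetable(outfile, s, separator = ',', header = true)
```

The copy costs some extra memory for the selected rows, but the writer then walks an ordinary DataFrame instead of going through the subset view for every element.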



On Thursday, October 27, 2016 at 5:07:31 AM UTC-7, Fred wrote:
>
> Hi,
>
> In the same program, I save a DataFrame "df" to one file and a subset of 
> this DataFrame to another file. The problem is that saving the subset is 
> much slower than saving the entire DataFrame: about 220 times slower. 
> That is too slow, and I don't know what my mistake is.
>
> Thank you for your advice!
>
> In Julia 0.4.5:
>
> Saving the entire DataFrame
> Saving... results/Stat.csv
> 1.115944 seconds (13.78 M allocations: 319.534 MB, 2.59% gc time)
>
>
> Saving the subset of the DataFrame 
> Saving... significant/Stat.csv
> 246.099835 seconds (41.79 M allocations: 376.189 GB, 4.77% gc time)
> elapsed time: 251.581459853 seconds
>
>
> In Julia 0.5:
>
> Saving the entire DataFrame
> Saving... results/Stat.csv
> 1.060365 seconds (7.08 M allocations: 116.025 MB, 0.73% gc time)
>
> Saving the subset of the DataFrame 
> Saving... significant/Stat.csv
> 226.813587 seconds (37.40 M allocations: 376.268 GB, 2.42% gc time)
> elapsed time: 232.95933586 seconds
>
> ################################################
> # my function to save the results to a file
>
> function write_results(x, name, dir, sep, h)
>   outfile = "$dir/$name"
>   println("Saving...\t", outfile)
>   writetable( outfile, x, separator = sep, header = h)
> end
>
>
> # save my DataFrame df : very fast
> @time write_results(df, name, "results", sep, h)
>
>
> # subset DataFrame s
> s = sub(df, (df[:rank_PV] .<= r_max))
>
> # save my subset DataFrame s : incredibly slow !
>
> @time write_results(s, name, "significant", sep, h)
>
>
>
