Same here on a 12 GB RAM machine.

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-dev+429 (2015-09-29 09:47 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit f71e449 (14 days old master)
|__/                   |  x86_64-w64-mingw32

julia> using DataFrames

julia> train = readtable("./test.csv");
ERROR: OutOfMemoryError()
 in resize! at array.jl:452
 in readnrows! at C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:164
 in readtable! at C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:767
 in readtable at C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:847
 in readtable at C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:893
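
One thing that may be worth trying in the meantime: readtable accepts an eltypes keyword, so supplying the column types up front can avoid some of the repeated resize!/widening the trace above points at. A minimal sketch, assuming a hypothetical three-column file (the real Springleaf csv has 1934 columns, so the vector would have to be built programmatically, e.g. with fill):

```julia
using DataFrames

# Hypothetical schema -- replace with the file's actual column types.
# Passing eltypes lets readtable allocate correctly typed columns up
# front instead of inferring and reallocating them while parsing.
coltypes = [Int, Float64, UTF8String]
train = readtable("./test.csv", eltypes = coltypes)
```

For a wide, mostly numeric file, something like eltypes = fill(Float64, 1934), adjusted for the ID/string columns, would be the starting point. No idea whether it avoids the OutOfMemoryError here, but it narrows down whether type inference is the culprit.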





On Tuesday, October 13, 2015 at 3:47:58 PM UTC-4, Yichao Yu wrote:
>
>
> On Oct 13, 2015 2:47 PM, "Grey Marsh" <kd.k...@gmail.com> wrote:
>
> Which julia version are you using? There's some GC tweak on 0.4 for that.
>
> >
> > I was trying to load the training dataset from Springleaf Marketing 
> Response on Kaggle. The csv is 921 MB, with 145321 rows and 1934 columns. My 
> machine has 8 GB RAM, and Julia ate 5.8 GB+ of memory before I stopped it, 
> as there was barely any memory left for the OS to function properly. The 
> incomplete operation had run for about 5-6 minutes by then. I'm on Windows 8 
> 64-bit. I used the following code to read the csv into Julia:
> >
> > using DataFrames
> > train = readtable("C:\\train.csv")
> >
> > Next I tried to load the same file in python: 
> >
> > import pandas as pd
> > train = pd.read_csv("C:\\train.csv")
> >
> > This took ~2.4 GB of memory and about a minute.
> >
> > Checking the same in R again:
> > df = read.csv('E:/Libraries/train.csv', as.is = T)
> >
> > This took 2-3 minutes and consumed 3.5 GB of memory on the same machine. 
> >
> > Why such a discrepancy, and why does Julia run out of memory before it 
> even finishes loading the csv? Is there a better way to get the file loaded 
> into Julia?
> >
> >
>
