Hi Milan,

I used the validate command of the tool csvfix 1.6 by Neil Butterworth on 
the csv file (only for csv, not for wsv) and discovered two corrupted lines; 
with those cleaned up, it worked.

The link to the manual: 
http://neilb.bitbucket.org/csvfix/manual/csvfix16/csvfix.html

As a feature request, it would be very useful if either readtable or its 
findcorruption function returned the corrupted line number and/or the 
corrupted line of the csv file.
If performance is the design priority, a separate dedicated findcorruption 
function could be provided instead, to help with cleaning up csv files 
should readtable fail.
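
To illustrate the kind of report I mean, here is a minimal sketch. It is written in Python as a standalone script, not in Julia, and it is not how readtable or findcorruption work internally. The function name find_bad_rows and the sample rows are made up for illustration; the only assumption is that every good record has the same number of fields (4 in my file).

```python
import csv
import io

def find_bad_rows(lines, expected_fields, delimiter=","):
    """Return (line_number, field_count, fields) for every record whose
    field count differs from expected_fields.

    `lines` is any iterable of text lines, e.g. an open file.
    Hypothetical helper for illustration; not part of DataFrames or csvfix.
    """
    bad = []
    reader = csv.reader(lines, delimiter=delimiter)
    for fields in reader:
        if len(fields) != expected_fields:
            # reader.line_num is the number of physical lines read so far,
            # so it points at the line on which the bad record ends.
            bad.append((reader.line_num, len(fields), fields))
    return bad

# Made-up sample: line 2 has too many fields, line 3 too few.
sample = "a,b,c,d\n1,2,3,4,5,6\n1,2,3\n\"x,y\",2,3,4\n"
for line_no, count, fields in find_bad_rows(io.StringIO(sample), 4):
    print(f"line {line_no}: {count} fields")
# prints:
# line 2: 6 fields
# line 3: 3 fields
```

For a real file one would pass an open file handle (opened with newline="") instead of the in-memory sample. Blank lines are reported as records with 0 fields.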

Keith 




On Saturday, 7 February 2015 08:17:51 UTC-8, Milan Bouchet-Valat wrote:
>
> On Friday, 6 February 2015 at 15:01 -0800, Keith Kee wrote: 
> > Hi Milan, 
> > 
> > 
> > Thanks for your advice. 
> > 
> > 
> > I spotted one corruption in a smaller sample of 3000 lines and then it 
> > worked. 
> > 
> > 
> > Then I tried a larger sample of 10000 lines and it gave the following: 
> > Saw 10000 rows, 4 columns (correct) and 40022 fields 
> > * Line 1 has 6 columns 
> > (I am not sure where "Line 1" starts, but the first line was fine in 
> > the 3000-line file.) 
> > 
> > 
> > How do I find the corruptions using the above message? Clearly it 
> > detected 6 columns in some "Line 1", but it is not the first line. 
> > 
> > 
> > Are there any Julia functions or packages I can use to clean up the 
> > data, or that will highlight corrupted lines in the data? 
> > 
> > 
> > I did try loading the 15,000-line csv file into Excel and it worked 
> > fine there. 
> > 
> > 
> > Looking forward to your expert advice. 
> Sorry, I'm not really an expert on that function. Can't you identify the 
> problematic line by continuing to split the file into halves? 
>
> Anyway, you should file a bug against the DataFrames package on GitHub, 
> where people will be more knowledgeable. There is apparently a bug at 
> least in the line number that is being reported. 
>
>
> Regards 
>
> > Thanks. 
> > 
> > 
> > Keith   
> > 
> > On Friday, 6 February 2015 12:19:55 UTC-8, Milan Bouchet-Valat wrote: 
> >         On Friday, 6 February 2015 at 11:12 -0800, Keith Kee wrote: 
> >         > Hi 
> >         > 
> >         > 
> >         > Using DataFrames ( v"0.6.0" ) and Win32 julia 0.3.5 
> >         > 
> >         > 
> >         > ds = readtable("EURUSD.CSV", header=false) 
> >         > 
> >         > 
> >         > 
> >         > results in 
> >         > 
> >         > 
> >         > 
> >         > BoundsError() 
> >         > in findcorruption at io.jl:698 
> >         > in readtable! at io.jl:779 
> >         > in readtable at io.jl:893 
> >         > 
> >         > 
> >         > The original file has 15000 lines, works when I cut it down 
> >         > to 10 lines. 
> >         > 
> >         > 
> >         > Please advise as to whether there are limits to readtable on 
> >         > win32 setups? 
> >         15000 sounds quite small even for 32-bit. More likely, the file 
> >         contains something readtable() doesn't like, and which does not 
> >         appear in the first 10 lines. You could try removing half of the 
> >         file, see if it works, and go on like that until you (possibly) 
> >         find out which line creates a bug. 
> >         
> >         
> >         Regards 
>
>
>
>
