Hi Milan,
Thanks for your advice.
I found one corrupted line in a smaller sample of 3,000 lines, and after I
fixed it the sample loaded.
Then I tried a larger sample of 10,000 lines and got the following error:
Saw 10000 rows, 4 columns and 40022 fields
 * Line 1 has 6 columns
(4 columns is correct. I'm not sure where this "line 1" is counted from,
since line 1 was fine when I used only the 3,000-line file.)
How can I locate the corrupted lines from this message? It clearly found
6 columns on some "Line 1", but that is not the first line of the file.
Are there any Julia functions or packages I can use to clean up the data,
or at least to highlight the corrupted lines?
I did try loading the full 15,000-line CSV file into Excel, and it opened
fine there.
Looking forward to your expert advice.
Thanks.
Keith
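A minimal sketch of one way to highlight suspect lines (not a DataFrames feature, and written for a current Julia 1.x release rather than the 0.3.5 used here): count the comma-separated fields on every line and report those that differ from the expected 4. The error message points at this kind of problem, since 10000 rows × 4 columns would give 40000 fields, but 40022 were seen. The function name `find_bad_lines` is hypothetical, and the sketch assumes plain comma-delimited data with no quoted fields containing commas.

```julia
# Hypothetical helper: report every line whose comma-separated field count
# differs from `expected`. Assumes plain comma-delimited data with no quoted
# fields that contain commas.
function find_bad_lines(io::IO, expected::Int)
    bad = Tuple{Int,Int,String}[]           # (line number, field count, line text)
    for (lineno, line) in enumerate(eachline(io))
        isempty(strip(line)) && continue    # skip blank lines, but keep counting them
        nfields = count(==(','), line) + 1  # fields = commas + 1
        if nfields != expected
            push!(bad, (lineno, nfields, String(line)))
        end
    end
    return bad
end

# Convenience method that opens the file by path and closes it afterwards.
find_bad_lines(path::AbstractString, expected::Int) =
    open(io -> find_bad_lines(io, expected), path)
```

Calling `find_bad_lines("EURUSD.CSV", 4)` would return the line number, field count and text of each offending line (blank lines still count toward the numbering), so the culprits can be fixed or removed before calling readtable again.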
On Friday, 6 February 2015 12:19:55 UTC-8, Milan Bouchet-Valat wrote:
>
> On Friday 06 February 2015 at 11:12 -0800, Keith Kee wrote:
> > Hi
> >
> > Using DataFrames (v"0.6.0") and Julia 0.3.5 on Win32,
> >
> > ds = readtable("EURUSD.CSV", header=false)
> >
> > results in
> >
> > BoundsError()
> > in findcorruption at io.jl:698
> > in readtable! at io.jl:779
> > in readtable at io.jl:893
> >
> > The original file has 15000 lines; it works when I cut it down to 10
> > lines.
> >
> > Could you advise whether readtable has any limits on Win32 setups?
> 15,000 lines sounds quite small, even for 32-bit. More likely, the file
> contains something readtable() doesn't like that does not appear in the
> first 10 lines. You could try removing half of the file, checking whether
> it loads, and continuing like that until you (possibly) find which line
> triggers the bug.
>
> Regards
>
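The halving approach suggested above can be automated as a binary search over prefixes of the file. This is a minimal sketch for a current Julia 1.x release: `first_failing_line` is a hypothetical helper, and `loads` is any check you supply that returns `true` when a given prefix of the lines parses. It assumes that once a bad line is included, every longer prefix also fails.

```julia
# Hypothetical helper: binary-search for the first line that makes `loads` fail.
# `loads(lines)` is a caller-supplied check returning true when that prefix parses.
# Assumes monotonic failure: if a prefix fails, every longer prefix fails too.
function first_failing_line(loads::Function, lines::Vector{String})
    loads(lines) && return nothing          # the whole file parses; nothing to find
    lo, hi = 0, length(lines)               # invariant: prefix lo parses, prefix hi fails
    while hi - lo > 1
        mid = (lo + hi) ÷ 2
        if loads(lines[1:mid])
            lo = mid                        # culprit lies after mid
        else
            hi = mid                        # culprit lies at or before mid
        end
    end
    return hi                               # 1-based number of the first bad line
end
```

In practice `loads` could write the prefix to a temporary file and report whether `readtable` on that file completes without throwing, using `try`/`catch`. For 15,000 lines that is about 15 attempts instead of manual halving.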