Hi Angelo,

I think the problem may be that your integer column contains space
characters, which cannot be converted to integers.

I created what could be a reproducible example of your problem at:
https://gist.github.com/thisisnic/af265166d5cd1ebce605cf3e478ee6d8

In short, can you try adding the argument `null_values = c("", " ", "NA")`
to your call to `open_dataset()`?  By default, empty strings are set to NA
values, but spaces are not, so this could be the source of your error.
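
For reference, here is a minimal sketch of what I mean. The temporary folder, the file name `part1.csv`, and the column names are made up for illustration, so substitute your own folder of CSVs. It assumes the arrow and dplyr packages are installed.

```r
library(arrow)
library(dplyr)

# Hypothetical toy data: an integer column where one cell holds a single space
csv_dir <- tempfile("csv_example_")
dir.create(csv_dir)
writeLines(
  c("id,count", "1,10", "2, ", "3,30"),
  file.path(csv_dir, "part1.csv")
)

# Treat empty strings, single spaces, and "NA" as missing values
ds <- open_dataset(
  sources = csv_dir,
  format = "csv",
  null_values = c("", " ", "NA")
)

# Reading the data is where the conversion error would otherwise appear
result <- ds %>% collect()
print(result)
```

With the extra `null_values` entry, the cell containing a space should come back as NA instead of triggering the conversion error.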

Nic

On Thu, 9 Feb 2023 at 05:06, Angelo Casalan <acasalan...@gmail.com> wrote:

> Hi Everyone,
>
> Thanks for the responses. I hope you are all well.
>
> Hi Dewey. As to the problematic column error message: Invalid: Could not
> open CSV input source 'folder/name.CSV': Invalid: In CSV column #30: Row
> #5: CSV conversion error to int32: invalid value ''
>
> I manually opened the csv and saw the cells are empty or blanks along with
> integers on the same column 30. Also present in some other columns.
>
> I tried manually setting via schema() the columns as utf8()/character
> equivalent in R, or string().
>
> I still get the same error message.
>
> disk.frame read these columns mixing integers with spaces/blanks as
> integers smoothly with no error messages at all. I think disk.frame read
> the spaces/blanks as null values/NA in R studio.
>
> I am scripting all of these in RMarkdown if that might be a factor.
>
> Questions:
> 1. Is there a way in open_dataset() to automatically set all blanks as null
> values across the multiple csvs which I'm trying to load into R? Similar in
> logic to pandas.read_csv('test.csv',na_values=['nan'])
>
> Manual re-encoding is not feasible because I'm dealing with millions of data
> points. I am also just a secondary user of this data, and my goal is to
> automate this in R for my organization.
>
> 2.  Are there other arrow functions/commands that can load multiple csvs
> from my local folder as an arrow object?
>
> Regards,
>
> On Tue, Jan 31, 2023 at 8:50 AM Angelo Casalan <acasalan...@gmail.com>
> wrote:
>
> > Hi Jacob,
> >
> > Thanks. To provide some specifics on my query:
> >
> > 1.which version of arrow are you running?
> > - 10.0.1
> >
> > 2. The error message provides an exact col,row position, have you checked
> > the value there?
> > Yes. It is int64. This is after running open_dataset without specifying
> > schema:
> > '''
> > arrow<-open_dataset(
> > sources="location of csv files",
> > format="csv"
> > )
> > '''
> >
> >  3. I have to correct the exact error message:
> > CSV conversion error to int64:invalid value ' '
> > I think arrow tells me the invalid value present is ' '
> >
> >  4. This reminds me of cases where scientific notation is used for
> > integers
> >  which causes an error but that usually shows the value e.g. "1e6".
> > the invalid value is: ' '
> >
> > 5. I am really confused because using disk.frame() function, on the same
> > csvs, I have not encountered this problem on this column because it was
> > cleanly encoded as a numeric variable.
> >
> > Regards,
> >
> >
> >
> > On Fri, Jan 27, 2023 at 9:43 AM Angelo Casalan <acasalan...@gmail.com>
> > wrote:
> >
> >> Hi ,
> >>
> >> I hope you are well. I wish to ask how I can resolve this error:
> >>
> >> "CSV conversion error to int64: invalid value"
> >>
> >>
> >> To give an idea of my dataset. I have 4 csvs all placed in a local
> >> folder.
> >>
> >>
> >> The code below worked when importing:
> >>
> >>
> >> arrow<-open_dataset(
> >> sources="csv location",
> >> format="csv")
> >>
> >>
> >> However, when I run:
> >>
> >>
> >> arrow %>% count(column) %>% collect()
> >> nrow(arrow %>% collect)
> >>
> >> head(arrow %>% collect(),10 )
> >>
> >> I always get the same  error message: "Invalid: In CSV column #12: Row
> >> #580. CSV conversion error to int64: invalid value"
> >>
> >> I tried going back to open_dataset(, schema()), where the column that is
> >> giving me problems is set as utf8 or sometimes str in the schema
> >> argument.
> >>
> >> schema(
> >> col=utf8(),
> >> other nth columns
> >> )
> >>
> >> But I still encounter the same problem.
> >>
> >> The code below does not work either.
> >>
> >> arrow2<-arrow_table(arrow)
> >>
> >> Thanks in advance if you can help me.
> >>
> >> --
> >> Regards,
> >>
> >> Angelo Casalan
> >> Statistical Methodology Unit
> >>
> >
> >
> > --
> > Regards,
> >
> > Angelo Casalan
> > Statistical Methodology Unit
> >
>
>
> --
> Regards,
>
> Angelo Casalan
>
