Hi Angelo,

I think what might be happening here is that your integer column contains space characters, and those are causing the conversion error.
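Here is a minimal sketch of what that could look like. It assumes the arrow and dplyr R packages are installed. The folder, the file names (part-1.csv, part-2.csv) and the columns (id, value) are made-up toy data, not your files. The first file holds only integers, so `value` should be inferred as int64. The second file has a single space in that column:

```r
library(arrow)
library(dplyr)

# Hypothetical toy data: two small CSV files in a temporary folder.
csv_dir <- file.path(tempdir(), "space_example")
dir.create(csv_dir, showWarnings = FALSE)

# First file: integers only, so `value` is inferred as int64
# (assumption: by default arrow infers the schema from the first file only).
writeLines(c("id,value", "1,10", "2,20"), file.path(csv_dir, "part-1.csv"))

# Second file: a single space where a number should be ("3, ").
writeLines(c("id,value", "3, ", "4,40"), file.path(csv_dir, "part-2.csv"))

# Default null_values: a lone space is not treated as missing, so collecting
# may fail with "CSV conversion error to int64". Any error is caught here.
default_ds <- open_dataset(csv_dir, format = "csv")
default_result <- tryCatch(collect(default_ds), error = function(e) e)
print(inherits(default_result, "error"))

# Treat "", " " and "NA" as null so the space becomes NA in an integer column.
fixed_ds <- open_dataset(
  csv_dir,
  format = "csv",
  null_values = c("", " ", "NA")
)
fixed <- collect(fixed_ds)
print(fixed)
```

As far as I understand it, `open_dataset()` infers the schema from the first file it finds by default. That is why a stray space further along may only show up as an error once you call `collect()`.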
I created what could be a reproducible example of your problem here:
https://gist.github.com/thisisnic/af265166d5cd1ebce605cf3e478ee6d8

In short, could you try passing `null_values = c("", " ", "NA")` in your call to `open_dataset()`? By default, empty strings are read as NA values but spaces are not, so this could be the source of your error.

Nic

On Thu, 9 Feb 2023 at 05:06, Angelo Casalan <acasalan...@gmail.com> wrote:

> Hi Everyone,
>
> Thanks for the responses. I hope you are all well.
>
> Hi Dewey. As to the problematic column error message: Invalid: Could not
> open CSV input source 'folder/name.CSV': Invalid: In CSV column #30: Row
> #5: CSV conversion error to int32: invalid value ''
>
> I manually opened the csv and saw the cells are empty or blanks along with
> integers in the same column 30. This is also present in some other columns.
>
> I tried manually setting the columns via schema() as utf8()/character
> equivalent in R, or string().
>
> I still get the same error message.
>
> disk.frame read these columns mixing integers with spaces/blanks as
> integers smoothly with no error messages at all. I think disk.frame read
> the spaces/blanks as null values/NA in RStudio.
>
> I am scripting all of these in RMarkdown if that might be a factor.
>
> Questions:
> 1. Is there a way in open_dataset() to automatically set all blanks as null
> values across the multiple csvs I'm trying to load into R? Similar in
> logic to pandas.read_csv('test.csv',na_values=['nan'])
>
> Manual re-encoding is not feasible because I'm dealing with millions of data
> points, I am also just a secondary user of this data, and my goal is to
> automate this in R for my organization.
>
> 2. Are there other arrow functions/commands that can load multiple csvs
> from my local folder as an arrow object?
>
> Regards,
>
> On Tue, Jan 31, 2023 at 8:50 AM Angelo Casalan <acasalan...@gmail.com>
> wrote:
>
> > Hi Jacob,
> >
> > Thanks.
> > To provide some specifics on my query:
> >
> > 1. Which version of arrow are you running?
> > - 10.0.1
> >
> > 2. The error message provides an exact col,row position, have you checked
> > the value there?
> > Yes. It is int64. This is after running open_dataset without specifying
> > schema:
> > '''
> > arrow<-open_dataset(
> > sources="location of csv files",
> > format="csv"
> > )
> > '''
> >
> > 3. I have to correct the exact error message:
> > CSV conversion error to int64:invalid value ' '
> > I think arrow tells me the invalid value present is ' '
> >
> > 4. This reminds me of cases where scientific notation is used for
> > integers, which causes an error, but that usually shows the value, e.g. "1e6".
> > The invalid value here is: ' '
> >
> > 5. I am really confused because using the disk.frame() function on the same
> > csvs, I have not encountered this problem on this column because it was
> > cleanly encoded as a numeric variable.
> >
> > Regards,
> >
> > On Fri, Jan 27, 2023 at 9:43 AM Angelo Casalan <acasalan...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I hope you are well. I wish to ask how I can resolve this error:
> >>
> >> "CSV conversion error to int64: invalid value"
> >>
> >> To give an idea of my dataset: I have 4 csvs all placed in a local folder.
> >>
> >> The code below worked when importing:
> >>
> >> arrow<-open_dataset(
> >> sources="csv location",
> >> format="csv")
> >>
> >> However, when I run:
> >>
> >> arrow %>% count(column) %>% collect()
> >> nrow(arrow %>% collect)
> >>
> >> head(arrow %>% collect(),10 )
> >>
> >> I always get the same error message: "Invalid: In CSV column #12: Row
> >> #580. CSV conversion error to int64: invalid value"
> >>
> >> I tried going back to open_dataset(,schema() ), where the column that is
> >> giving me problems is set as utf8 or sometimes str in the schema argument.
> >>
> >> schema(
> >> col=utf8(),
> >> other nth columns
> >> )
> >>
> >> But I still encounter the same problem.
> >>
> >> Using the code below doesn't work either.
> >>
> >> arrow2<-arrow_table(arrow)
> >>
> >> Thanks in advance if you can help me.
> >>
> >> --
> >> Regards,
> >>
> >> Angelo Casalan
> >> Statistical Methodology Unit
> >>
> >
> > --
> > Regards,
> >
> > Angelo Casalan
> > Statistical Methodology Unit
> >
>
> --
> Regards,
>
> Angelo Casalan