Hi everyone,

Thanks for the responses. I hope you are all well.

Hi Dewey,

As to the problematic column, the error message is:

Invalid: Could not open CSV input source 'folder/name.CSV': Invalid: In CSV column #30: Row #5: CSV conversion error to int32: invalid value ''

I manually opened the CSV and saw that column 30 contains empty or blank cells alongside integers. The same is true of some other columns.

I tried manually setting these columns via schema() to utf8() (the character equivalent in R) or string(). I still get the same error message.

disk.frame read these columns, which mix integers with spaces/blanks, as integers with no error messages at all. I think disk.frame read the spaces/blanks as null values (NA) in RStudio.

I am scripting all of this in RMarkdown, if that might be a factor.

Questions:

1. Is there a way in open_dataset() to automatically set all blanks as null values across the multiple CSVs which I'm trying to load into R? Similar in logic to pandas.read_csv('test.csv',na_values=['nan']). Manual re-encoding is not feasible because I'm dealing with millions of data points. I am also just a secondary user of this data, and my goal is to automate this in R for my organization.

2. Are there other arrow functions/commands that can load multiple CSVs from my local folder as an arrow object?

Regards,

On Tue, Jan 31, 2023 at 8:50 AM Angelo Casalan <acasalan...@gmail.com> wrote:

> Hi Jacob,
>
> Thanks. To provide some specifics on my query:
>
> 1. which version of arrow are you running?
> - 10.0.1
>
> 2. The error message provides an exact col,row position, have you checked
> the value there?
> Yes. It is int64. This is after running open_dataset without specifying
> schema:
> '''
> arrow<-open_dataset(
> sources="location of csv files",
> format="csv"
> )
> '''
>
> 3. I have to correct the exact error message:
> CSV conversion error to int64:invalid value ' '
> I think arrow tells me the invalid value present is ' '
>
> 4. This reminds me of cases where scientific notation is used for integers
> which causes an error but that usually shows the value e.g. "1e6".
> The invalid value is: ' '
>
> 5. I am really confused because using the disk.frame() function on the same
> CSVs, I have not encountered this problem on this column because it was
> cleanly encoded as a numeric variable.
>
> Regards,
>
>
>
> On Fri, Jan 27, 2023 at 9:43 AM Angelo Casalan <acasalan...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I hope you are well. I wish to ask how I can resolve this error:
>>
>> "CSV conversion error to int64: invalid value"
>>
>>
>> To give an idea of my dataset: I have 4 CSVs, all placed in a local folder.
>>
>>
>> The code below worked when importing:
>>
>>
>> arrow<-open_dataset(
>> sources="csv location",
>> format="csv")
>>
>>
>> However, when I run:
>>
>>
>> arrow %>% count(column) %>% collect()
>> nrow(arrow %>% collect)
>>
>> head(arrow %>% collect(),10 )
>>
>> I always get the same error message: "Invalid: In CSV column #12: Row
>> #580. CSV conversion error to int64: invalid value"
>>
>> I tried going back to open_dataset(,schema() ), where the column that is
>> giving me problems is set as utf8 or sometimes str in the schema argument.
>>
>> schema(
>> col=utf8(),
>> other nth columns
>> )
>>
>> But I still encounter the same problem.
>>
>> Using the code below did not work either.
>>
>> arrow2<-arrow_table(arrow)
>>
>> Thanks in advance if you can help me.
>>
>> --
>> Regards,
>>
>> Angelo Casalan
>> Statistical Methodology Unit
>>
>
>
> --
> Regards,
>
> Angelo Casalan
> Statistical Methodology Unit
>

--
Regards,

Angelo Casalan
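
For reference, one possible approach to question 1 (treating blanks as nulls in open_dataset()) is sketched below. It is not a confirmed fix for the error in this thread. It assumes that open_dataset() with format="csv" forwards a convert_options argument, and that CsvConvertOptions$create() accepts null_values and strings_can_be_null. The temporary folder, the sample file, and the list of null strings are hypothetical and only there to make the sketch self-contained.

```r
library(arrow)
library(dplyr)

# Hypothetical demo folder and file (assumption): stands in for the real CSV folder.
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(
  c("id,score", "1,10", "2, ", "3,"),  # row 2 holds a single space, row 3 is empty
  file.path(csv_dir, "a.csv")
)

# Treat empty strings, single spaces and "NA" as nulls when converting columns.
# The exact set of null strings is an assumption; adjust it to the data.
ds <- open_dataset(
  sources = csv_dir,
  format = "csv",
  convert_options = CsvConvertOptions$create(
    null_values = c("", " ", "NA"),
    strings_can_be_null = TRUE
  )
)

# Materialise the dataset; blank cells should come back as NA.
result <- ds %>% collect()
print(result)
```

Note that the second error message reports ' ' (a single space) as the invalid value, which is why the sketch lists a space explicitly among the null strings.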