Incidentally, "didn't work" is not very useful information. Please tell us exactly what error message or apparently aberrant result you received. Also, what do you get from:
sapply(your_dataframe, "class")
nrow(your_dataframe)

(as I suspect what you think it is, isn't).

Cheers,
Bert

On Tue, Jan 30, 2024 at 11:01 AM Bert Gunter <bgunter.4...@gmail.com> wrote:

> "I cannot change the data type from boolean to numeric. I tried doing
> dataset$my_field = as.numeric(dataset$my_field), I also tried to do
> dataset <- dataset[complete.cases(dataset), ], didn't work either."
>
> Sorry, but all I can say is: huh?
>
> > dt <- data.frame(a = c(NA, NA, FALSE, TRUE), b = 1:4)
> > dt
>       a b
> 1    NA 1
> 2    NA 2
> 3 FALSE 3
> 4  TRUE 4
> > sapply(dt, class)
>         a         b
> "logical" "integer"
> > dt$a <- as.numeric(dt$a)
> > dt
>    a b
> 1 NA 1
> 2 NA 2
> 3  0 3
> 4  1 4
> > sapply(dt, class)
>         a         b
> "numeric" "integer"
>
> So either I'm missing something or you are. Happy to be corrected and
> chastised if the former.
>
> Cheers,
> Bert
>
> On Tue, Jan 30, 2024 at 10:41 AM Paul Bernal <paulberna...@gmail.com>
> wrote:
>
>> Dear friend Duncan,
>>
>> Thank you so much for your kind reply. Yes, that is exactly what is
>> happening: there are a lot of NA values at the start, so R assumes that
>> the field is of type boolean. The challenge I am facing is that I want
>> to read into R an Excel file that has many sheets (46 in this case) and
>> combine all 46 sheets into a single dataframe (since the columns are
>> exactly the same for all 46 sheets). The rio package does this nicely;
>> the problem is that, once I have the full dataframe (roughly 2.98
>> million rows total), I cannot change the data type from boolean to
>> numeric. I tried dataset$my_field = as.numeric(dataset$my_field), and I
>> also tried dataset <- dataset[complete.cases(dataset), ]; neither
>> worked.
>>
>> The only thing that worked for me was to take a single sheet and,
>> through the read_excel function, use the guess_max parameter set to a
>> sufficiently large number (a number >= the total row count of the full
>> merged dataset).
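If the merged data frame is already loaded, every column that was mis-guessed as logical can be coerced back to numeric in one pass. A minimal base-R sketch; the toy data frame and the StoreCharges/Qty columns below are illustrative stand-ins, not the poster's actual data:

```r
# Toy stand-in for the merged dataset: StoreCharges was guessed as logical.
dat <- data.frame(StoreCharges = c(NA, NA, FALSE, TRUE), Qty = 1:4)

# Coerce every logical column to numeric in one pass (FALSE -> 0, TRUE -> 1).
is_log <- vapply(dat, is.logical, logical(1))
dat[is_log] <- lapply(dat[is_log], as.numeric)

sapply(dat, class)
# StoreCharges          Qty
#    "numeric"    "integer"
```

Note this only fixes the column's type: any nonzero amounts that were already collapsed to TRUE/FALSE when the file was read cannot be recovered this way, which is why the durable fix is at the read step (guess_max or explicit column types).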
>> I want to automate the merging of the N Excel sheets so that I don't
>> have to do it manually, unless there is a way to accomplish something
>> similar to what rio's import_list function does while keeping the
>> fields' numeric data type.
>>
>> Cheers,
>> Paul
>>
>> On Tue, Jan 30, 2024 at 12:23 PM, Duncan Murdoch
>> (<murdoch.dun...@gmail.com>) wrote:
>>
>> > On 30/01/2024 11:10 a.m., Paul Bernal wrote:
>> > > Dear friends,
>> > >
>> > > Hope you are doing well. I am currently using R version 4.3.2, and
>> > > I have a .xlsx file that has 46 sheets in it. I basically combined
>> > > all 46 sheets and read them as a single dataframe in R using
>> > > package rio.
>> > >
>> > > I read a solution using package readxl, as suggested in a
>> > > StackOverflow discussion, as follows:
>> > > df <- read_excel(path = filepath, sheet = sheet_name, guess_max = 100000)
>> > > Now, when you have so many sheets (46 in my case) in an Excel
>> > > file, the rio methodology is more practical.
>> > >
>> > > This is what I did:
>> > > path = "C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI (4).xlsx"
>> > > figidat = import_list(path, rbind = TRUE)  # figidat refers to my dataset
>> > >
>> > > Now, it successfully imports and merges all records; however, some
>> > > fields (despite being numeric) R interprets as boolean fields.
>> > >
>> > > Here is the structure of the field that is causing me problems (I
>> > > apologize for the length):
>> > > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > ...
>> > > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, NA,
>> > > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
>> > > FALSE, FALSE, FALSE)), class = c("tbl_df", "tbl", "data.frame"
>> > > ), row.names = c(NA, -7033L))
>> > >
>> > > As you can see, when I do the dput, it gives me a bunch of TRUE and
>> > > FALSE values, when in reality I have records with value $0, records
>> > > with amounts > $0, and also a bunch of blank records.
>> > >
>> > > Any help will be greatly appreciated.
>> >
>> > I don't know how read_excel() determines column types, but some
>> > functions look only at the first n rows to guess the type. It appears
>> > you have a lot of NA values at the start. NA is a logical value, so
>> > that might be what is going wrong.
>> >
>> > In read.table() and related functions, you can specify the types of
>> > columns explicitly. It sounds as though that's what you should do if
>> > read_excel() offers that as a possibility.
>> >
>> > Duncan Murdoch
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
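The automation Paul asks for and the explicit column types Duncan suggests can be combined in a small helper. A sketch assuming the readxl package; the function name, the path argument, the col_types vector, and the guess_max default are placeholders, not values tested against the poster's file:

```r
# Read every sheet of a workbook whose sheets share the same columns,
# then row-bind them into one data frame. Passing col_types skips
# per-sheet type guessing entirely; otherwise a large guess_max makes
# readxl scan many rows before deciding on each column's type.
read_all_sheets <- function(path, col_types = NULL, guess_max = 10^5) {
  sheets <- readxl::excel_sheets(path)
  one <- function(s) readxl::read_excel(path, sheet = s,
                                        col_types = col_types,
                                        guess_max = guess_max)
  do.call(rbind, lapply(sheets, one))
}

# Hypothetical usage, e.g. for a three-column workbook:
# figidat <- read_all_sheets(path, col_types = c("numeric", "text", "numeric"))
```

Specifying col_types once here fixes the root cause: each sheet's leading block of NA cells can no longer push readxl into guessing logical for a numeric column.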