On 16 April 2024 at 10:46, jing hua zhao wrote:
| Dear R-developers,
| 
| I came across a somewhat unexpected behaviour of read.csv() which is trivial
| but worthwhile to note -- my data involves a protein named "1433E" but to save
| space I dropped the quotes so it becomes,
| 
| Gene,SNP,prot,log10p
| YWHAE,13:62129097_C_T,1433E,7.35
| YWHAE,4:72617557_T_TA,1433E,7.73
| 
| Both read.csv() and readr::read_csv() read the prot(ein) name as numeric 1433
| (possibly confused by scientific notation), which I only noticed when I tried
| to combine data,
| 
| all_data <- data.frame()
| for (protein in proteins[1:7])
| {
|    cat(protein,":\n")
|    f <- paste0(protein,".csv")
|    if(file.exists(f))
|    {
|      p <- read.csv(f)
|      print(p)
|      if(nrow(p)>0) all_data  <- bind_rows(all_data,p)
|    }
| }
| 
| proteins[1:7]
| [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
| 
| dplyr::bind_rows() failed to work due to incompatible types; nevertheless,
| rbind() went ahead without warnings.

You may need to consider aiding read.csv() (and alternate reading
functions) by supplying column-type info instead of relying on educated
heuristic guesses, which appear to fail here due to the nature of your data.
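
As a minimal sketch of what that could look like with base R, assuming the
two data rows quoted above are representative of your files (the inline
csv_text is only a stand-in for reading from "1433E.csv"):

```r
# Inline copy of the rows from the original post; a stand-in for a file on disk.
csv_text <- "Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73"

# Declare 'prot' as character so the type-guessing step leaves it alone.
# A named colClasses vector is matched against the column names; columns
# not named here are still guessed as usual.
typed <- read.csv(text = csv_text, colClasses = c(prot = "character"))

str(typed)
```

For readr::read_csv() the col_types argument plays the same role, for example
col_types = readr::cols(prot = readr::col_character()).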

Other storage formats can store type info. That is generally safer and may be
an option too.
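
One such format that ships with base R is RDS. A rough sketch, assuming the
data frame below mirrors your data and using a temporary file as a
placeholder path:

```r
# Hypothetical data frame mirroring the rows from the original post.
typed <- data.frame(
  Gene   = c("YWHAE", "YWHAE"),
  SNP    = c("13:62129097_C_T", "4:72617557_T_TA"),
  prot   = c("1433E", "1433E"),
  log10p = c(7.35, 7.73),
  stringsAsFactors = FALSE
)

# saveRDS() serializes the object together with its column types,
# so nothing has to be guessed when it is read back.
rds_file <- tempfile(fileext = ".rds")
saveRDS(typed, rds_file)
restored <- readRDS(rds_file)

str(restored)
```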

I think this was more of an email for r-help than r-devel.

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
