Re: [Rd] read.csv

2024-04-27 Thread Kevin Coombes
I was horrified when I saw John Weinstein's article about Excel turning gene names into dates. Mainly because I had been complaining about that phenomenon for years, and it never remotely occurred to me that you could get a publication out of it. I eventually rectified the situation by publishing

Re: [Rd] read.csv

2024-04-16 Thread Reed A. Cartwright
Gene names being misinterpreted by spreadsheet software (read.csv is no different) is a classic issue in bioinformatics. It seems like every practitioner ends up encountering this issue in due time. E.g. https://pubmed.ncbi.nlm.nih.gov/15214961/

Re: [Rd] read.csv

2024-04-16 Thread Ben Bolker
Tangentially, your code will be more efficient if you add the data files to a *list* one by one and then apply bind_rows or do.call(rbind,...) after you have accumulated all of the information (see chapter 2 of the _R Inferno_). This may or may not be practically important in your particular
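A minimal sketch of the list-then-bind pattern described above. It assumes the inputs are a handful of small CSV files; the file names (part1.csv and so on) and their contents are made up for illustration and are written to a temporary directory so the example is self-contained. Each file becomes one list element, and the pieces are combined with a single do.call(rbind, ...) at the end instead of growing a data frame inside the loop.

```r
# Hypothetical input: three small CSV files in a temporary directory
files <- file.path(tempdir(), sprintf("part%d.csv", 1:3))
for (i in seq_along(files)) {
  write.csv(data.frame(id = i, value = i * 10), files[i], row.names = FALSE)
}

# Read each file into its own list element (no repeated copying)
pieces <- lapply(files, read.csv)

# Bind everything once, after all pieces have been collected
combined <- do.call(rbind, pieces)
print(combined)
```

dplyr::bind_rows(pieces) would play the same role as do.call(rbind, pieces) if dplyr is already in use.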

Re: [Rd] read.csv

2024-04-16 Thread Dirk Eddelbuettel
As an aside, the odd format does not seem to bother data.table::fread(), which also happens to be my personally preferred workhorse for these tasks:

    > fname <- "/tmp/r/filename.csv"
    > read.csv(fname)
       Gene             SNP prot log10p
    1 YWHAE 13:62129097_C_T 1433   7.35
    2 YWHAE 4:72617557_T_TA

Re: [Rd] read.csv

2024-04-16 Thread Duncan Murdoch
On 16/04/2024 7:36 a.m., Rui Barradas wrote: At 11:46 on 16/04/2024, jing hua zhao wrote: Dear R-developers, I came across a somewhat unexpected behaviour of read.csv() which is trivial but worth noting -- my data involves a protein named "1433E", but to save space I dropped the quotes so it

Re: [Rd] read.csv

2024-04-16 Thread peter dalgaard
Hum... This boils down to

    > as.numeric("1.23e")
    [1] 1.23
    > as.numeric("1.23e-")
    [1] 1.23
    > as.numeric("1.23e+")
    [1] 1.23

which in turn comes from this code in src/main/util.c (function R_strtod):

    if (*p == 'e' || *p == 'E') {
        int expsign = 1;
        switch(*++p) {
        case '-':

Re: [Rd] read.csv

2024-04-16 Thread Rui Barradas
At 11:46 on 16/04/2024, jing hua zhao wrote: Dear R-developers, I came across a somewhat unexpected behaviour of read.csv() which is trivial but worth noting -- my data involves a protein named "1433E", but to save space I dropped the quotes so it became, Gene,SNP,prot,log10p

Re: [Rd] read.csv

2024-04-16 Thread Dirk Eddelbuettel
On 16 April 2024 at 10:46, jing hua zhao wrote: | Dear R-developers, | | I came across a somewhat unexpected behaviour of read.csv() which is trivial but worth noting -- my data involves a protein named "1433E", but to save space I dropped the quotes so it became, | | Gene,SNP,prot,log10p |

[Rd] read.csv

2024-04-16 Thread jing hua zhao
Dear R-developers, I came across a somewhat unexpected behaviour of read.csv() which is trivial but worth noting -- my data involves a protein named "1433E", but to save space I dropped the quotes so it became,

    Gene,SNP,prot,log10p
    YWHAE,13:62129097_C_T,1433E,7.35
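One way to keep an unquoted value such as 1433E as text is to declare the column type up front through read.csv's colClasses argument. Per the read.table help page, colClasses may be a named vector, and columns that are not named are still type-detected as usual. A minimal sketch using the data row from the post, passed inline through the text argument rather than read from a file:

```r
csv <- "Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35"

# Force only the 'prot' column to character; the remaining columns
# are still converted automatically
df <- read.csv(text = csv, colClasses = c(prot = "character"))
str(df)
```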

[Rd] read.csv quadratic time in number of columns

2023-03-29 Thread Toby Hocking
Dear R-devel, A number of people have observed anecdotally that read.csv is slow for a large number of columns, for example: https://stackoverflow.com/questions/7327851/read-csv-is-extremely-slow-in-reading-csv-files-with-large-numbers-of-columns I did a systematic comparison of read.csv with

Re: [Rd] read.csv, worrying behaviour?

2021-02-25 Thread Kevin R. Coombes
I believe this is documented behavior. The 'read.csv' function is a front-end to 'read.table' with different default values. In this particular case, read.csv sets fill = TRUE, which means that it is supposed to fill incomplete lines with NAs. It also sets header=TRUE, which is presumably
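Because read.csv sets fill = TRUE and, per the read.table help page, takes the number of data columns from the first five lines, a ragged file can be read without any warning. A quick diagnostic, sketched here on a made-up file written to a temporary location, is count.fields() from the utils package, which reports the number of fields on every line so irregular lines can be spotted before parsing.

```r
# Made-up input: a header plus seven data lines, one with an extra field
f <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3", "4,5,6", "7,8,9", "10,11,12", "13,14,15",
             "16,17,18,19",   # line 7 of the file has four fields
             "20,21,22"), f)

# Number of comma-separated fields on each line of the file
counts <- count.fields(f, sep = ",")

# Line numbers whose field count differs from the header line's
bad <- which(counts != counts[1])
print(bad)
```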

Re: [Rd] read.csv, worrying behaviour?

2021-02-25 Thread TAYLOR, Benjamin (BLACKPOOL TEACHING HOSPITALS NHS FOUNDATION TRUST) via R-devel
Dear all, I've been using R for around 16 years now and I've only just become aware of a behaviour of read.csv that I find worrying, which is why I'm contacting this list. A simplified example of the behaviour is as follows. I created a "test.csv" file containing the following lines:

Re: [Rd] read.csv reads more rows than indicated by wc -l

2012-12-20 Thread Matthew Dowle
Ben, Somewhere on my wish/TO DO list is for someone to rewrite read.table for better robustness *and* efficiency ... Wish granted. New in data.table 1.8.7: New function fread(), a fast and friendly file reader. * header, skip, nrows, sep and colClasses are all auto detected. *

Re: [Rd] read.csv reads more rows than indicated by wc -l

2012-12-19 Thread Ben Bolker
G See gsee000 at gmail.com writes: When I have a csv file that is more than 6 lines long, not including the header, and one of the fields is blank for the last few lines, and there is an extra comma on one of the lines with the blank field, read.csv() creates an extra line. I attached

Re: [Rd] read.csv behaviour

2011-09-28 Thread Ben Bolker
Mehmet Suzen msuzen at mango-solutions.com writes: This might be obvious, but I was wondering if anyone knows a quick and easy way of writing out a CSV file with varying row lengths, ideally for data initially read from a CSV file with the same format. See example below.

[Rd] read.csv behaviour

2011-09-27 Thread Mehmet Suzen
This might be obvious, but I was wondering if anyone knows a quick and easy way of writing out a CSV file with varying row lengths, ideally for data initially read from a CSV file with the same format. See example below. I found it quite strange that R cannot write it in one go, so one must
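One simple approach (a sketch, not necessarily what was suggested in the thread) is to paste each row together yourself and write the lines directly, since write.csv expects a rectangular data frame. The sketch assumes the ragged rows are held in a hypothetical list of character vectors and that no field contains a comma, quote, or newline, so no quoting is needed.

```r
# Hypothetical ragged data: one character vector per output row
rows <- list(c("a", "b", "c"), c("d", "e"), "f")

f <- tempfile(fileext = ".csv")
# Join the fields of each row with commas and write one line per row
writeLines(vapply(rows, paste, character(1), collapse = ","), f)

# Reading back: split each line on commas to recover the ragged rows
back <- strsplit(readLines(f), ",", fixed = TRUE)
```

Fields that may contain separators or quotes would need explicit quoting before pasting.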

Re: [Rd] read.csv and FileEncoding in Windows version of R 2.13.0

2011-06-06 Thread Alexander Peterhansl
,header=FALSE) (As you'll see, the file does have a byte order mark.) Regards, Alex -----Original Message----- From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] Sent: Wednesday, June 01, 2011 7:35 PM To: Alexander Peterhansl Cc: R-devel@r-project.org Subject: Re: [Rd] read.csv

[Rd] read.csv and FileEncoding in Windows version of R 2.13.0

2011-06-01 Thread Alexander Peterhansl
Dear R-devel List: read.csv() seems to have changed in R version 2.13.0 as compared to version 2.12.2 when reading in simple CSV files. Suppose I read in a 2-column CSV file (test.csv), say

    1, a
    2, b

If the file is encoded as UTF-8 (on Windows 7), then under R 2.13.0
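When a UTF-8 file starts with a byte order mark, the mark can end up attached to the first field. In R 3.0.0 and later, file connections accept the encoding name "UTF-8-BOM", which removes the mark if present, and read.csv passes it through via its fileEncoding argument. A self-contained sketch that writes a small two-column file with a BOM to a temporary location (contents modelled on the example in the post, without the spaces):

```r
f <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))              # UTF-8 byte order mark
writeBin(c(bom, charToRaw("1,a\n2,b\n")), f)    # BOM followed by two rows

# "UTF-8-BOM" asks the connection to drop the mark before parsing
df <- read.csv(f, header = FALSE, fileEncoding = "UTF-8-BOM")
str(df)
```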

Re: [Rd] read.csv and FileEncoding in Windows version of R 2.13.0

2011-06-01 Thread Duncan Murdoch
On 01/06/2011 6:00 PM, Alexander Peterhansl wrote: Dear R-devel List: read.csv() seems to have changed in R version 2.13.0 as compared to version 2.12.2 when reading in simple CSV files. Suppose I read in a 2-column CSV file (test.csv), say

    1, a
    2, b

If the file is encoded as UTF-8 (on Windows

Re: [Rd] read.csv trap

2011-03-03 Thread Ben Bolker
Ben Bolker bbolker at gmail.com writes: On 02/11/2011 03:37 PM, Laurent Gatto wrote: On 11 February 2011 19:39, Ben Bolker bbolker at gmail.com wrote: [snip] Bump. Is there any opinion about this from R-core?? Will I be scolded if I submit this as a bug ... ?? What is

Re: [Rd] read.csv trap

2011-02-11 Thread Ben Bolker
Bump. It's been a week since I posted this to r-devel. Any thoughts/discussion? Would R-core be irritated if I submitted a bug report? cheers Ben Original Message Subject: read.csv trap Date: Fri, 04 Feb 2011 11:16:36 -0500 From: Ben Bolker bbol...@gmail.com To:

Re: [Rd] read.csv trap

2011-02-11 Thread Ken.Williams
On 2/11/11 1:39 PM, Ben Bolker bbol...@gmail.com wrote: [snip] Original Message Subject: read.csv trap Date: Fri, 04 Feb 2011 11:16:36 -0500 From: Ben Bolker bbol...@gmail.com To: r-de...@stat.math.ethz.ch r-de...@stat.math.ethz.ch, David Earn e...@math.mcmaster.ca [snip]

Re: [Rd] read.csv trap

2011-02-11 Thread Laurent Gatto
On 11 February 2011 19:39, Ben Bolker bbol...@gmail.com wrote: [snip] What is dangerous/confusing is that R silently **wraps** longer lines if fill=TRUE (which is the default for read.csv).  I encountered this when working with a colleague on a long, messy CSV file that had some phantom

Re: [Rd] read.csv trap

2011-02-11 Thread Ben Bolker
On 02/11/2011 03:37 PM, Laurent Gatto wrote: On 11 February 2011 19:39, Ben Bolker bbol...@gmail.com wrote: [snip] What is dangerous/confusing is that R silently **wraps** longer lines if fill=TRUE (which is the default for read.csv). I

[Rd] read.csv trap

2011-02-04 Thread Ben Bolker
This is not specifically a bug, but an (implicitly/obscurely) documented behavior of read.csv (or read.table with fill=TRUE) that can be quite dangerous/confusing for users. I would love to hear some discussion from other users and/or R-core about this ... As always, I apologize if I have
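The read.table help page notes that the number of data columns is taken from the first five lines (or from col.names if that is longer) and suggests specifying col.names when fill = TRUE could make that guess wrong. A minimal sketch of that advice, using a made-up headerless file in a temporary location whose sixth line is longer than the first five: counting fields first and supplying enough column names keeps the long line in a single row instead of wrapping it onto a new one.

```r
f <- tempfile(fileext = ".csv")
writeLines(c("1,2,3", "4,5,6", "7,8,9", "10,11,12", "13,14,15",
             "16,17,18,19"), f)   # the sixth line has an extra field

# Size the column set from the widest line in the whole file
n <- max(count.fields(f, sep = ","))
df <- read.csv(f, header = FALSE, col.names = paste0("V", seq_len(n)))
print(df)
```

Shorter lines are then padded with NA by fill = TRUE, which is the behaviour the help page actually promises.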

[Rd] read.csv('/dev/stdin') fails (PR#14218)

2010-02-20 Thread egoldlust
Full_Name: Eric Goldlust Version: 2.10.1 (2009-12-14) x86_64-unknown-linux-gnu OS: Linux 2.6.9-67.0.1.ELsmp x86_64 Submission from: (NULL) (64.22.160.1) After upgrading from 2.9.1 to 2.10.1, I get unexpected results when calling read.csv('/dev/stdin'). These problems go away when I call

[Rd] read.csv confused by newline characters in header (PR#14103)

2009-12-02 Thread g . russell
Full_Name: George Russell Version: 2.10.0 OS: Microsoft Windows XP Service Pack 2 Submission from: (NULL) (217.111.3.131) The following code (typed into R --vanilla)

    testString <- 'B1\nB2\n1\n'
    con <- textConnection(testString)
    tab <- read.csv(con, stringsAsFactors = FALSE)

produces a data frame

Re: [Rd] read.csv confused by newline characters in header (PR#14103)

2009-12-02 Thread Peter Dalgaard
g.russ...@eos-solutions.com wrote: Full_Name: George Russell Version: 2.10.0 OS: Microsoft Windows XP Service Pack 2 Submission from: (NULL) (217.111.3.131) The following code (typed into R --vanilla) testString <- 'B1\nB2\n1\n' con <- textConnection(testString) tab <-

Re: [Rd] read.csv

2009-06-25 Thread Petr Savicky
On Sun, Jun 14, 2009 at 02:56:01PM -0400, Gabor Grothendieck wrote: If read.csv's colClasses= argument is NOT used then read.csv accepts double quoted numerics: 1: read.csv(stdin()) 0: A,B 1: 1,1 2: 2,2 3: A B 1 1 1 2 2 2 However, if colClasses is used then it seems that it does

Re: [Rd] read.csv

2009-06-25 Thread Petr Savicky
I am sorry for not including the attachment mentioned in my previous email. Attached now. Petr.

    --- R-devel/src/library/utils/R/readtable.R 2009-05-18 17:53:08.0 +0200
    +++ R-devel-readtable/src/library/utils/R/readtable.R 2009-06-25 10:20:06.0 +0200
    @@ -143,9 +143,6 @@

Re: [Rd] read.csv

2009-06-16 Thread Petr Savicky
On Sun, Jun 14, 2009 at 09:21:24PM +0100, Ted Harding wrote: On 14-Jun-09 18:56:01, Gabor Grothendieck wrote: If read.csv's colClasses= argument is NOT used then read.csv accepts double quoted numerics: 1: read.csv(stdin()) 0: A,B 1: 1,1 2: 2,2 3: A B 1 1 1 2 2 2

[Rd] read.csv

2009-06-14 Thread Gabor Grothendieck
If read.csv's colClasses= argument is NOT used then read.csv accepts double quoted numerics: 1: read.csv(stdin()) 0: A,B 1: 1,1 2: 2,2 3: A B 1 1 1 2 2 2 However, if colClasses is used then it seems that it does not: read.csv(stdin(), colClasses = numeric) 0: A,B 1: 1,1 2: 2,2 3: Error in
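The thread includes a proposed patch to readtable.R for this case. A workaround that does not depend on how a particular R version treats quoted fields in numeric columns is to read every column as character, which strips the quotes, and then convert explicitly with as.numeric(). A minimal sketch, with made-up double-quoted numeric input of the kind described in the post:

```r
# Made-up input with double-quoted numeric fields
csv <- 'A,B
"1","1"
"2","2"'

# Read everything as character (the quotes are removed) ...
df <- read.csv(text = csv, colClasses = "character")
# ... then convert each column explicitly
df[] <- lapply(df, as.numeric)
str(df)
```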

Re: [Rd] read.csv

2009-06-14 Thread Ted Harding
On 14-Jun-09 18:56:01, Gabor Grothendieck wrote: If read.csv's colClasses= argument is NOT used then read.csv accepts double quoted numerics: 1: read.csv(stdin()) 0: A,B 1: 1,1 2: 2,2 3: A B 1 1 1 2 2 2 However, if colClasses is used then it seems that it does not:

Re: [Rd] read.csv

2009-06-14 Thread Gabor Grothendieck
On Sun, Jun 14, 2009 at 4:21 PM, Ted Harding ted.hard...@manchester.ac.uk wrote: Or am I missing something?!! The point of this is that the current behavior is not desirable, since you can't have quoted numeric fields if you specify colClasses = numeric, yet you can if you don't. The concepts are