Hi Henrik,

Thank you very much for looking into this. And thanks for the patch!

Yes, let's hope this is a typo that gets fixed.

Regards,
Andreas

Henrik Bengtsson <henrik.bengts...@ucsf.edu> writes:

> Thanks for insisting; I was wrong and I'm happy to see that there is
> indeed code intended for named 'colClasses', which even goes back to
> 2004. But as you report, the names only work when
> length(colClasses) < cols (which also explains why I thought it was not
> supported). I'm not sure if that _strictly less than_ test is
> intentional or a mistake, but I would propose the following patch:
>
> [HB-X201]{hb}: svn diff src\library\utils\R\readtable.R
> Index: src/library/utils/R/readtable.R
> ===================================================================
> --- src/library/utils/R/readtable.R (revision 68642)
> +++ src/library/utils/R/readtable.R (working copy)
> @@ -139,7 +139,7 @@
>          if (rlabp) col.names <- c("row.names", col.names)
>
>      nmColClasses <- names(colClasses)
> -    if(length(colClasses) < cols)
> +    if(length(colClasses) <= cols)
>          if(is.null(nmColClasses)) {
>              colClasses <- rep_len(colClasses, cols)
>          } else {
>
>
> Your example works with this patch. I've made it source():able so you
> can try it out (if you cannot source() https://, then download the
> file and source it locally):
>
> source("https://gist.githubusercontent.com/HenrikBengtsson/ed1eeb41a1b4d6c43b47/raw/ebe58f76e518dd014423bea466a5c93d2efd3c99/readtable-fix.R")
>
> kkk <- c("a\tb",
>          "3.14\tx")
>
> colClasses <- c(a="numeric", b="character")
> data <- read.table(textConnection(kkk),
>                    sep="\t",
>                    header = TRUE,
>                    colClasses = colClasses)
> str(data)
> ### 'data.frame': 1 obs. of 2 variables:
> ### $ a: num 3.14
> ### $ b: chr "x"
>
> ## Does not work with utils::read.table(), but with patch
> data <- read.table(textConnection(kkk),
>                    sep="\t",
>                    header = TRUE,
>                    colClasses = rev(colClasses))
> str(data)
> ### 'data.frame': 1 obs. of 2 variables:
> ### $ a: num 3.14
> ### $ b: chr "x"
>
> Let's hope that the above is a (10-year old) typo, and changing a < to
> a <= adds support for named 'colClasses', which is a really useful
> functionality.
>
> /Henrik
>
> On Wed, Jul 8, 2015 at 6:42 PM, Andreas Leha
> <andreas.l...@med.uni-goettingen.de> wrote:
>> Hi Henrik,
>>
>> Thanks for your reply.
>>
>> I am not (yet) convinced, though. The help page for read.table
>> mentions named colClasses and if I specify colClasses for not all
>> columns, the names are taken into account:
>>
>> --8<---------------cut here---------------start------------->8---
>> kkk <- c("a\tb",
>>          "3.14\tx")
>> str(read.table(textConnection(kkk),
>>                sep="\t",
>>                header = TRUE))
>>
>> str(read.table(textConnection(kkk),
>>                sep="\t",
>>                header = TRUE,
>>                colClasses=c(b="character")))
>> --8<---------------cut here---------------end--------------->8---
>>
>> What am I missing?
>>
>> Best,
>> Andreas
>>
>>
>>
>> On 09/07/2015 02:21, Henrik Bengtsson wrote:
>>> read.table() does not make use of names(colClasses) - only its values.
>>> Because of this, ordering is critical, as you noted. It shouldn't be
>>> too hard to add support for a named `colClasses` argument of
>>> utils::read.table(), but someone needs to convince the R core team
>>> that this is a good idea.
>>>
>>> As an alternative, see R.filesets::readDataFrame() for a
>>> read.table()-like function that matches names(colClasses) to column
>>> names, if they exist.
>>>
>>> /Henrik
>>> (author of R.filesets)
>>>
>>> On Wed, Jul 8, 2015 at 5:41 PM, Andreas Leha
>>> <andreas.l...@med.uni-goettingen.de> wrote:
>>>> Hi all,
>>>>
>>>> Apparently, the colClasses argument to read.table needs to be in the
>>>> order of the columns *even when it is named*. Why is that? And where
>>>> would I find it in the documentation?
>>>>
>>>> Here is a MWE:
>>>>
>>>> --8<---------------cut here---------------start------------->8---
>>>> kkk <- c("a\tb",
>>>>          "3.14\tx")
>>>> read.table(textConnection(kkk),
>>>>            sep="\t",
>>>>            header = TRUE)
>>>>
>>>> cclasses=c(b="character",
>>>>            a="numeric")
>>>>
>>>> read.table(textConnection(kkk),
>>>>            sep="\t",
>>>>            header = TRUE,
>>>>            colClasses = cclasses) ## <--- error
>>>>
>>>> read.table(textConnection(kkk),
>>>>            sep="\t",
>>>>            header = TRUE,
>>>>            colClasses = cclasses[order(names(cclasses))])
>>>> --8<---------------cut here---------------end--------------->8---
>>>>
>>>>
>>>> Thanks,
>>>> Andreas
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
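
For anyone reading this thread on an R version without the proposed patch, a possible workaround is to read the header line first and reorder the named colClasses to match it, rather than relying on order(names(cclasses)), which only helps when the columns happen to be sorted alphabetically. This is only a sketch: the helper name read_table_named is hypothetical (it is not part of utils), and it assumes that every column in the header has an entry in colClasses.

```r
# Hypothetical helper, not part of utils: reorder a fully named colClasses
# vector to match the header, then pass the values positionally.
read_table_named <- function(lines, colClasses, sep = "\t") {
  # Read only the first line to learn the column order.
  hdr <- scan(text = lines[1], what = "character", sep = sep, quiet = TRUE)
  # Assumption: colClasses names exactly the columns found in the header.
  stopifnot(setequal(names(colClasses), hdr))
  read.table(text = lines, sep = sep, header = TRUE,
             colClasses = unname(colClasses[hdr]))
}

kkk <- c("a\tb",
         "3.14\tx")
cclasses <- c(b = "character",
              a = "numeric")

# Names are given in the "wrong" order, but are matched to the header.
data <- read_table_named(kkk, cclasses)
str(data)
```

Dropping the names with unname() means the result should not depend on whether the length(colClasses) < cols test in read.table() is ever changed.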