[Rd] stringsAsFactors = FALSE
Hi all,

I love the option to not automatically convert strings into factors, but there are three places where the current option doesn't work and I think it should:

    options(stringsAsFactors = FALSE)
    str(expand.grid(letters))
    str(type.convert(letters))
    df <- read.fwf(textConnection(paste(letters, collapse = "\n")), 1)
    str(df)

I think type.convert and read.fwf can be fixed by giving them a stringsAsFactors argument and then using as.is = !stringsAsFactors (like read.table).

The key lines in expand.grid would seem to be

    if (!is.factor(x) && is.character(x))
        x <- factor(x, levels = unique(x))

but I'm not sure why they are being converted to factors in the first place.

Regards,

Hadley

-- 
http://had.co.nz/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] stringsAsFactors = FALSE
On Mon, 17 Nov 2008, hadley wickham wrote:

> Hi all,
>
> I love the option to not automatically convert strings into factors, but there are three places that the current option doesn't work where I think it should:

Perhaps you mean 'when I would like it to'? Things *should* work as documented, surely?

>     options(stringsAsFactors = FALSE)
>     str(expand.grid(letters))
>     str(type.convert(letters))
>     df <- read.fwf(textConnection(paste(letters, collapse = "\n")), 1)
>     str(df)

I get

    str(df)
    'data.frame': 26 obs. of 1 variable:
     $ V1: chr "a" "b" "c" "d" ...

so what is wrong with that? read.fwf just calls read.table, so the default options of read.table apply.

> I think type.convert and read.fwf can be fixed by giving them a stringsAsFactors argument and then using as.is = !stringsAsFactors (like read.table).

Seems to me that there is nothing wrong with read.fwf. For type.convert() we could have the default

    as.is = !default.stringsAsFactors()

but I think a strong case needs to be made to change the documented behaviour.

> The key lines in expand.grid would seem to be
>
>     if (!is.factor(x) && is.character(x))
>         x <- factor(x, levels = unique(x))
>
> but I'm not sure why they are being converted to factors in the first place.

Nor am I, but it goes back to at least r2107, over 10 years ago. I don't see much problem with adding a 'stringsAsFactors' argument there.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
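The remark that read.fwf just calls read.table can be checked with a minimal sketch. It is an illustration, not part of the original exchange: the temporary file and the names tmp, as_chr and as_fct are assumptions, and stringsAsFactors is passed explicitly so that the result does not depend on any session option.

```r
# Write one letter per line to a temporary file (hypothetical test data).
tmp <- tempfile(fileext = ".txt")
writeLines(letters, tmp)

# read.fwf() forwards extra arguments to read.table(), so the conversion
# can be chosen per call rather than through options().
as_chr <- read.fwf(tmp, widths = 1, stringsAsFactors = FALSE)
as_fct <- read.fwf(tmp, widths = 1, stringsAsFactors = TRUE)

str(as_chr)  # V1 is a character column
str(as_fct)  # V1 is a factor column

unlink(tmp)
```

Passing the argument in the call is one way to get the same data frame regardless of how a particular session happens to be configured.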
Re: [Rd] stringsAsFactors = FALSE
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of hadley wickham
> Sent: Monday, November 17, 2008 5:10 AM
> To: r-devel@r-project.org
> Subject: [Rd] stringsAsFactors = FALSE
> ...
> The key lines in expand.grid would seem to be
>
>     if (!is.factor(x) && is.character(x))
>         x <- factor(x, levels = unique(x))
>
> but I'm not sure why they are being converted to factors in the first place.

I think expand.grid converts input strings to factors so they retain the order they have in the input. (Note that the levels argument is unique(x), not the sort(unique(x)) that data.frame uses.) People generally give expand.grid sorted input and expect it to not alter the order (the order of the levels affects tables and some plots).

    lapply(expand.grid(Grade=c("Bad","Good","Better"), Size=c("Small","Medium","Large")), levels)
    $Grade
    [1] "Bad"    "Good"   "Better"

    $Size
    [1] "Small"  "Medium" "Large"

    lapply(data.frame(Grade=c("Bad","Good","Better"), Size=c("Small","Medium","Large")), levels)
    $Grade
    [1] "Bad"    "Better" "Good"

    $Size
    [1] "Large"  "Medium" "Small"

I have nothing against adding the stringsAsFactors argument to expand.grid.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
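The difference between levels = unique(x) and the sorted default can be isolated from expand.grid and data.frame with a small sketch that uses factor() directly. The variable names are assumptions made for illustration.

```r
grade <- c("Bad", "Good", "Better")

# Levels in order of first appearance, as in the expand.grid() lines quoted above.
in_input_order <- factor(grade, levels = unique(grade))

# The default for factor() sorts the unique values to obtain the levels.
in_sorted_order <- factor(grade)

levels(in_input_order)
levels(in_sorted_order)

# The order of the levels carries through to tables.
table(in_input_order)
table(in_sorted_order)
```

Because the level order decides the order of table cells and of reference categories, the two factors hold the same values yet can lead to differently arranged output.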
Re: [Rd] stringsAsFactors = FALSE
On Mon, 17 Nov 2008, Prof Brian Ripley wrote:

> On Mon, 17 Nov 2008, hadley wickham wrote:
>
>> Hi all,
>>
>> I love the option to not automatically convert strings into factors, but there are three places that the current option doesn't work where I think it should:
>
> Perhaps you mean 'when I would like it to'? Things *should* work as documented, surely?
>
>>     options(stringsAsFactors = FALSE)
>>     str(expand.grid(letters))
>>     str(type.convert(letters))
>>     df <- read.fwf(textConnection(paste(letters, collapse = "\n")), 1)
>>     str(df)
>
> I get
>
>     str(df)
>     'data.frame': 26 obs. of 1 variable:
>      $ V1: chr "a" "b" "c" "d" ...
>
> so what is wrong with that? read.fwf just calls read.table, so the default options of read.table apply.
>
>> I think type.convert and read.fwf can be fixed by giving them a stringsAsFactors argument and then using as.is = !stringsAsFactors (like read.table).
>
> Seems to me that there is nothing wrong with read.fwf. For type.convert() we could have the default
>
>     as.is = !default.stringsAsFactors()
>
> but I think a strong case needs to be made to change the documented behaviour.

It seems only to be used in RODBC (where I have some extra control pending), simecol and BioC:beadarraySNP (both with as.is=TRUE) and reshape (author, one Hadley Wickham). Given it is documented as a help utility, it seems up to the caller to set the behaviour he wants.

>> The key lines in expand.grid would seem to be
>>
>>     if (!is.factor(x) && is.character(x))
>>         x <- factor(x, levels = unique(x))
>>
>> but I'm not sure why they are being converted to factors in the first place.
>
> Nor am I, but it goes back to at least r2107, over 10 years ago. I don't see much problem with adding a 'stringsAsFactors' argument there.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
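Leaving the choice to the caller of type.convert() amounts to passing as.is explicitly. A minimal sketch follows; the input vectors and variable names are assumptions chosen for illustration.

```r
x <- c("a", "b", "c")

# as.is = TRUE keeps non-numeric strings as a character vector.
kept <- type.convert(x, as.is = TRUE)

# as.is = FALSE converts them to a factor.
converted <- type.convert(x, as.is = FALSE)

# Strings that look like numbers are converted in either case.
numbers <- type.convert(c("1", "2", "3"), as.is = TRUE)

class(kept)
class(converted)
class(numbers)
```

With the argument spelled out in the call, the result can be read off the code without knowing anything about the session's options.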
Re: [Rd] stringsAsFactors = FALSE
On Mon, Nov 17, 2008 at 11:06 AM, William Dunlap [EMAIL PROTECTED] wrote:

>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of hadley wickham
>> Sent: Monday, November 17, 2008 5:10 AM
>> To: r-devel@r-project.org
>> Subject: [Rd] stringsAsFactors = FALSE
>> ...
>> The key lines in expand.grid would seem to be
>>
>>     if (!is.factor(x) && is.character(x))
>>         x <- factor(x, levels = unique(x))
>>
>> but I'm not sure why they are being converted to factors in the first place.
>
> I think expand.grid converts input strings to factors so they retain the order they have in the input. (Note that the levels argument is unique(x), not the sort(unique(x)) that data.frame uses.) People generally give expand.grid sorted input and expect it to not alter the order (the order of the levels affects tables and some plots).

Ah, that makes sense. (Although the conversion to factors just seems to be a convenient way to achieve the desired effect in this case - there's no reason they have to be factors in the output)

Hadley

-- 
http://had.co.nz/
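The idea that the output need not contain factors can be sketched as follows, assuming an R version whose expand.grid() accepts a stringsAsFactors argument (the addition under discussion in this thread). The example values are taken from the earlier message; the name grid is an assumption.

```r
# Assumes expand.grid() has a stringsAsFactors argument.
grid <- expand.grid(Grade = c("Bad", "Good", "Better"),
                    Size  = c("Small", "Medium", "Large"),
                    stringsAsFactors = FALSE)

str(grid)           # both columns are character vectors

# The input order is still visible in the rows, although it is no
# longer recorded as factor levels.
unique(grid$Grade)
```

The trade-off is that the order now lives only in the row order, so a later conversion with factor() would sort the values unless levels are given.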
Re: [Rd] stringsAsFactors = FALSE
On Mon, Nov 17, 2008 at 9:03 AM, Prof Brian Ripley [EMAIL PROTECTED] wrote:

> On Mon, 17 Nov 2008, hadley wickham wrote:
>
>> Hi all,
>>
>> I love the option to not automatically convert strings into factors, but there are three places that the current option doesn't work where I think it should:
>
> Perhaps you mean 'when I would like it to'? Things *should* work as documented, surely?

In an ideal world, I think things should be documented *and* consistent.

>>     options(stringsAsFactors = FALSE)
>>     str(expand.grid(letters))
>>     str(type.convert(letters))
>>     df <- read.fwf(textConnection(paste(letters, collapse = "\n")), 1)
>>     str(df)
>
> I get
>
>     str(df)
>     'data.frame': 26 obs. of 1 variable:
>      $ V1: chr "a" "b" "c" "d" ...
>
> so what is wrong with that? read.fwf just calls read.table, so the default options of read.table apply.

Ok, that's weird. I get factors.

>> I think type.convert and read.fwf can be fixed by giving them a stringsAsFactors argument and then using as.is = !stringsAsFactors (like read.table).
>
> Seems to me that there is nothing wrong with read.fwf. For type.convert() we could have the default
>
>     as.is = !default.stringsAsFactors()
>
> but I think a strong case needs to be made to change the documented behaviour.

Well, my intuition was that type.convert should mirror the behaviour of read.table, since it is what does the conversion behind the scenes. I can of course change my own code.

>> The key lines in expand.grid would seem to be
>>
>>     if (!is.factor(x) && is.character(x))
>>         x <- factor(x, levels = unique(x))
>>
>> but I'm not sure why they are being converted to factors in the first place.
>
> Nor am I, but it goes back to at least r2107, over 10 years ago. I don't see much problem with adding a 'stringsAsFactors' argument there.

Great, thanks.

Hadley

-- 
http://had.co.nz/
Re: [Rd] stringsAsFactors = FALSE
William Dunlap wrote:

>> but I'm not sure why they are being converted to factors in the first place.
>
> I think expand.grid converts input strings to factors so they retain the order they have in the input.

Yep. These things do matter. Incidentally, I recently got burned by cooking an example using expand.grid, writing the data to a file with write.table and reading it back in during lecture with read.table. Odds ratio turned upside down...

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
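A round trip of the kind described here can be sketched as below. This is a reconstruction under assumptions, not the lecture example itself: the data, the temporary file and the variable names are made up, and stringsAsFactors = TRUE is passed to read.table so that the strings come back as factors whatever the session default is.

```r
# Levels follow the input order when the grid is built.
grid <- expand.grid(Grade = c("Bad", "Good", "Better"))

# Write the grid to a (hypothetical) temporary file and read it back.
tmp <- tempfile(fileext = ".txt")
write.table(grid, tmp)
back <- read.table(tmp, stringsAsFactors = TRUE)
unlink(tmp)

levels(grid$Grade)  # order as supplied
levels(back$Grade)  # levels rebuilt from the file, in sorted order
```

A text file stores only the values, so the level order has to be reconstructed on reading. A changed reference level is the kind of thing that can flip the direction of an estimate such as an odds ratio.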
Re: [Rd] stringsAsFactors = FALSE
>>>>> "WD" == William Dunlap [EMAIL PROTECTED]
>>>>>     on Mon, 17 Nov 2008 09:06:49 -0800 writes:

    >> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of hadley wickham
    >> Sent: Monday, November 17, 2008 5:10 AM
    >> To: r-devel@r-project.org
    >> Subject: [Rd] stringsAsFactors = FALSE
    >> ...
    >> The key lines in expand.grid would seem to be
    >>
    >>     if (!is.factor(x) && is.character(x))
    >>         x <- factor(x, levels = unique(x))
    >>
    >> but I'm not sure why they are being converted to factors in the first place.

    WD> I think expand.grid converts input strings to factors so
    WD> they retain the order they have in the input. (Note
    WD> that the levels argument is unique(x), not the
    WD> sort(unique(x)) that data.frame uses.) People generally
    WD> give expand.grid sorted input and expect it to not alter
    WD> the order (the order of the levels affects tables and
    WD> some plots).

    WD> lapply(expand.grid(Grade=c("Bad","Good","Better"), Size=c("Small","Medium","Large")), levels)
    WD> $Grade
    WD> [1] "Bad"    "Good"   "Better"
    WD> $Size
    WD> [1] "Small"  "Medium" "Large"

    WD> lapply(data.frame(Grade=c("Bad","Good","Better"), Size=c("Small","Medium","Large")), levels)
    WD> $Grade
    WD> [1] "Bad"    "Better" "Good"
    WD> $Size
    WD> [1] "Large"  "Medium" "Small"

    WD> I have nothing against adding the stringsAsFactors
    WD> argument to expand.grid.

That's fine, but I am VERY MUCH against making the default of that argument depend on the ominous default.stringsAsFactors(), which is determined by getOption("stringsAsFactors").

Why would I hate such a change very much? Note that we have here an option which would change the result of a standard R (S) function, expand.grid().

Whereas I already did not like that change when it happened for read.table(), in that case one could at least say that read.table() is in some way platform dependent. (Because it typically depends on files of the local platform; but as we know, this is not true even there. Even now, if I tell my students, or a book author tells her readers, to use read.table("http://...") I can no longer be sure that my students get the same data frame, because they could have different settings of getOption("stringsAsFactors"). Horrible, really!!)

Please, R should stay as much a functional language as possible and sensible! If we start having global options more and more influence the result of standard R functions, we are going down a very slippery slope, and one that is making R even more idiosyncratic than it already needs to be. Please, no!!

Rather revert the read.table() default of stringsAsFactors to not depend on the option, and maybe provide another set of short forms of the various read.table(*, stringsAsFactors=FALSE) incantations such that all the factor-haters-string-lovers can use these short forms...

At the very first DSC, 1999, Joe Eaton, author of GNU Octave, told us how he regretted that he had started going down that bad path, because users had started asking for it. In the extreme case, we are ending up with a language that depends on a whole huge status setting, and what a given function computes can no longer be predicted by looking at the function calls, unless you simultaneously know that whole status.

Please, No!!

Martin Maechler, ETH Zurich
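The concern about global options can be made concrete with a small sketch. Everything in it is hypothetical: to_column() and the option name demo.asFactor are invented for illustration and stand in for a standard function whose default is read from options().

```r
# Hypothetical helper: its default is taken from a made-up global option.
to_column <- function(x, as_factor = getOption("demo.asFactor", TRUE)) {
  if (as_factor) factor(x) else x
}

# The same call gives different results depending on session state.
old <- options(demo.asFactor = TRUE)
with_true <- class(to_column(letters))

options(demo.asFactor = FALSE)
with_false <- class(to_column(letters))

# With the argument given explicitly, the call alone determines the result.
explicit <- class(to_column(letters, as_factor = FALSE))

options(old)  # restore the previous setting

with_true
with_false
explicit
```

The first two calls are textually identical, which is the point of the objection: the result cannot be predicted from the call without also knowing the option's current value.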