On Mon, May 11, 2009 at 05:06:38PM +0200, Martin Maechler wrote:
[...]
> The version I have committed a few hours ago is indeed a much
> re-simplified version, using as.character(.) explicitly
> and consequently no longer providing the extra optional
> arguments that we have had for a couple of days.
>
> Keeping such a basic function factor() as simple as possible
> seems a good strategy to me.

OK. I understand the argument of simplicity. So, factor(x) is just a
compressed encoding of as.character(x), where each value is stored only
once. This sounds good to me.

Let me go back to the original purpose of this thread: a suggestion for
extending ?as.factor

I think that somewhere in the help page, we could have something like

  Applying factor() to a numeric vector should be done with caution. The
  information in x is preserved only to the extent to which it is
  preserved in as.character(x). If this leads to too many different
  levels due to minor differences among the input numbers, it is
  suggested to use something like factor(signif(x, digits)) or
  factor(round(x, digits)), with the number of digits chosen to suit
  the given application.

Let me point out that the following sentence from the Warning section is
not exactly correct as it is in svn at the moment. So, I suggest adding
the word "approximately" at the place marked with square brackets, and
adding one more sentence of explanation, also marked with square brackets.

  To transform a factor \code{f} to [approximately] its original numeric
  values, \code{as.numeric(levels(f))[f]} is recommended and slightly
  more efficient than \code{as.numeric(as.character(f))}.
  [Note that the original values may be extracted only to the precision
  used in as.character(x), which is typically 15 decimal digits.]

Petr.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
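
A small R sketch of the two points made in the message above may help. It
is only an illustration: the vector x, the value z, and the choice of 3
significant digits are made-up assumptions, not taken from the thread or
from the help page.

```r
# Hypothetical measurements: two groups of values that differ only by
# small noise (assumed data, for illustration only).
x <- c(0.1234561, 0.1234562, 0.5000001, 0.5000002)

# Each distinct as.character(x) value becomes its own level.
nlevels(factor(x))

# Rounding to a precision suited to the application merges near-equal
# values into shared levels (3 significant digits is an arbitrary choice).
f <- factor(signif(x, 3))
nlevels(f)

# Two ways of converting a factor back to numbers; both give the same result.
y1 <- as.numeric(levels(f))[f]
y2 <- as.numeric(as.character(f))
identical(y1, y2)

# A value that needs more digits than as.character() keeps is recovered
# only approximately.
z <- 1/3
g <- factor(z)
back <- as.numeric(levels(g))[g]
back == z                    # exact comparison
isTRUE(all.equal(back, z))   # comparison up to a small tolerance
```

Comparing back == z with all.equal(back, z) shows the sense in which the
original values come back only "approximately": the conversion goes through
the character representation of the levels.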