On Mon, 2005-11-14 at 19:07 +0200, Brandt, T. (Tobias) wrote:
>
> >-----Original Message-----
> >From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
> >Sent: 14 November 2005 06:21 PM
> >
> >On 11/14/05, Brandt, T. (Tobias) <[EMAIL PROTECTED]> wrote:
> >> Hi
> >>
> >> Given that things like the following work
> >>
> >>  > a <- c("-.1"," 2.7 ","B")
> >>  > a
> >>  [1] "-.1"   " 2.7 " "B"
> >>  > as.numeric(a)
> >>  [1] -0.1  2.7   NA
> >>  Warning message:
> >>  NAs introduced by coercion
> >>  >
> >>
> >> I naively expected that the following would behave differently.
> >>
> >>  > b <- c('10%', '-20%', '30.0%', '.40%')
> >>  > b
> >>  [1] "10%"   "-20%"  "30.0%" ".40%"
> >>  > as.numeric(b)
> >>  [1] NA NA NA NA
> >>  Warning message:
> >>  NAs introduced by coercion
> >
> >Try this:
> >
> >as.numeric(sub("%", "e-2", b))
> >
>
> Thank you, that accomplishes what I had intended.
>
> I would have thought though that the expression "53%" would be a fairly
> standard representation of the number 0.53 and might be handled as such.  Is
> there a specific reason for avoiding this behaviour?

"53%" is a 'shorthand' character representation of a mathematical
concept. To wit, the specific representation of a fraction using 100 as
the denominator (i.e., 53 / 100). The symbol '%' can be replaced by the
word "percent", such as "53 percent", which is also a character
representation.

0.53, in context, is a numeric representation of a proportion in the
range of 0 - 1.0.

> I can imagine that it might add unnecessary overhead to routines like
> "as.numeric" which one would like to keep as fast as possible.
>
> Perhaps there are other areas though where it might be desirable?  For
> example I'm thinking of the read.table function for reading in csv files
> since I have many of these that have been saved from excel and now contain
> numbers in the "%" format.

In Excel, numbers displayed with a '%' are what you see visually.
However, the internal representation (how the value is actually stored
in the program) is still a floating point value, without the '%'.

For example:

  > a <- 53
  > a
  [1] 53
  > sprintf("%.0f%%", a)
  [1] "53%"
  > is.numeric(a)
  [1] TRUE
  > is.numeric(sprintf("%.0f%%", a))
  [1] FALSE

Unfortunately (depending upon your perspective), Excel and other similar
programs tend to export the visually displayed values and not their
internal representations.

Thus, as Gabor pointed out, you will need to do some 'editing' of the
values before using them in R. You can either do this in Excel, by
removing the "%" formatting, or post-import in R as Gabor has described.

You need to keep separate the internal representation of a value and its
printed or displayed representation for human-readable consumption.

as.numeric() does basically one thing and it does it well and properly.
It is up to the user to ensure that it is passed the proper values. When
that is not the case, it issues an appropriate warning message and
returns NA.

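As a rough sketch of the post-import route, assuming a CSV export in which one column holds percent-formatted text (the column names `region` and `growth` and the inline data below are made up for illustration), Gabor's substitution can be applied to the column after reading it in:

```r
# Hypothetical CSV content, as a spreadsheet might export a
# percent-formatted column. The names and values are illustrative only.
csv_text <- "region,growth
north,10%
south,-20%
east,30.0%
west,.40%"

# Read the data; the percent column arrives as text, not as numbers.
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)
is.numeric(dat$growth)   # FALSE

# Gabor's approach: rewrite the "%" as the exponent "e-2", so that
# "10%" becomes "10e-2", which as.numeric() can parse.
dat$growth <- as.numeric(sub("%", "e-2", dat$growth))
is.numeric(dat$growth)   # TRUE
```

With a real file, the `text =` argument would be replaced by the file name. The same substitution would need to be repeated for each percent-formatted column.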
Of course, using Gabor's hint, you can also write your own variation of
as.numeric(), creating a function that takes percent formatted values
and converts them as you require. One of the many strengths of R is that
you can extend it to meet your own specific requirements when the base
functions do not.

HTH,

Marc Schwartz
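
A minimal sketch of such a variation follows. The function name `percent_to_numeric` is hypothetical and not part of base R, and it strips the "%" and divides by 100 rather than using Gabor's "e-2" substitution. It also assumes that values without a trailing "%" should be passed through unscaled, which may or may not suit your data:

```r
# Hypothetical helper (not part of base R): convert percent-formatted
# strings such as "53%" to proportions such as 0.53.
percent_to_numeric <- function(x) {
  # Work on trimmed character values so that " 53% " is also handled.
  x <- trimws(as.character(x))

  # Remember which elements carried a trailing percent sign.
  is_pct <- grepl("%$", x)

  # Drop the trailing "%" and coerce; anything still unparseable
  # becomes NA with the usual coercion warning from as.numeric().
  value <- as.numeric(sub("%$", "", x))

  # Scale only the elements that were written as percentages.
  value[is_pct] <- value[is_pct] / 100
  value
}

b <- c("10%", "-20%", "30.0%", ".40%")
percent_to_numeric(b)
```

Because non-numeric input still yields NA with a warning, the helper keeps the same failure behaviour as as.numeric() itself.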