You could report it as a bug at https://bugs.r-project.org/bugzilla3/
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: Mathieu Basille [mailto:basille....@ase-research.org]
> Sent: Thursday, August 01, 2013 10:31 AM
> To: R help
> Cc: William Dunlap
> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
>
> Nicely spotted, Bill! You went much farther than I could have. We can
> basically summarize the problem with the following simple example:
>
> > format(9994, digits = 3)
> [1] "9994"
> > format(9995, digits = 3)
> [1] " 9995"
>
> I'm still not sure why this is happening, though: the 'digits' parameter is
> used to guess the number of characters in the output, but not to format the
> actual number (i.e. all digits are still there anyway)? Is this a bug
> or a feature? And if the latter, is it documented anywhere? I couldn't see
> any hint of it in ?format or ?options... Using 'trim = TRUE' to fix
> the problem seems to me like a workaround, not a real solution...
>
> Lastly, should I report this somewhere else?
>
> Thanks for your comment,
> Mathieu.
>
>
> On 08/01/2013 12:36 PM, William Dunlap wrote:
> > I see the problem on both Linux and Windows, R-3.0.1.
> > > vapply(as.numeric(9994:9995), function(x)format(x, scientific=FALSE, digits=3), "")
> > [1] "9994" " 9995"
> > > vapply(as.numeric(99994:99995), function(x)format(x, scientific=FALSE, digits=4), "")
> > [1] "99994" " 99995"
> > > vapply(as.numeric(999994:999995), function(x)format(x, scientific=FALSE, digits=5), "")
> > [1] "999994" " 999995"
> >
> > The ones with the initial space are the ones that would round up to the next
> > power of 10 when rounded to the requested number of significant digits:
> > > x <- as.numeric(1:5e5)
> > > z <- vapply(x, function(x)format(x, scientific=FALSE, digits=3), "")
> > > i <- grep(" ", z)
> > > z[i]
> > [1] " 9995" " 9996" " 9997" " 9998" " 9999" " 99950" " 99951" " 99952"
> > [9] " 99953" " 99954" " 99955" " 99956" " 99957" " 99958" " 99959" " 99960"
> > [17] " 99961" " 99962" " 99963" " 99964" " 99965" " 99966" " 99967" " 99968"
> > [25] " 99969" " 99970" " 99971" " 99972" " 99973" " 99974" " 99975" " 99976"
> > [33] " 99977" " 99978" " 99979" " 99980" " 99981" " 99982" " 99983" " 99984"
> > [41] " 99985" " 99986" " 99987" " 99988" " 99989" " 99990" " 99991" " 99992"
> > [49] " 99993" " 99994" " 99995" " 99996" " 99997" " 99998" " 99999"
> > > print(x[i], digits=3)
> > [1] 1e+04 1e+04 1e+04 1e+04 1e+04 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
> > [13] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
> > [25] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
> > [37] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
> > [49] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -----Original Message-----
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mathieu Basille
> >> Sent: Thursday, August 01, 2013 8:31 AM
> >> To: R help
> >> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> >>
> >> This problem does not seem to be widely popular, but it affects at least two
> >> users (both on Linux, maybe a hint here?). To me, it looks like a bug (whether
> >> it is an R bug or an OS-related bug, I don't know). Should I forward it to
> >> R-devel, or some other place where R gurus may have a chance to look at it?
> >>
> >> Mathieu.
> >>
> >>
> >> On 07/30/2013 02:34 PM, arun wrote:
> >>> Hi Mathieu,
> >>> yes, the original problem occurs on my system too. I am using R 3.0.1 on
> >>> Linux Mint 15. I guess the default would be trim=FALSE, but it still looks
> >>> very strange, especially in apply(), as it starts from " 99995" onwards.
> >>>
> >>> sessionInfo()
> >>> R version 3.0.1 (2013-05-16)
> >>> Platform: x86_64-unknown-linux-gnu (64-bit)
> >>>
> >>> locale:
> >>> [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
> >>> [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
> >>> [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
> >>> [7] LC_PAPER=C LC_NAME=C
> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C
> >>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
> >>>
> >>> attached base packages:
> >>> [1] stats graphics grDevices utils datasets methods base
> >>>
> >>> other attached packages:
> >>> [1] stringr_0.6.2 reshape2_1.2.2
> >>>
> >>> loaded via a namespace (and not attached):
> >>> [1] plyr_1.8 tools_3.0.1
> >>>
> >>>
> >>> ----- Original Message -----
> >>> From: Mathieu Basille <basille....@ase-research.org>
> >>> To: arun <smartpink...@yahoo.com>
> >>> Cc: R help <r-help@r-project.org>
> >>> Sent: Tuesday, July 30, 2013 2:29 PM
> >>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> >>>
> >>> Thanks Arun for your answer. 'trim = TRUE' does indeed solve the symptoms
> >>> of the problem, and this is the solution I'm currently using. However, it
> >>> does not help to understand what the problem is, or what causes it.
> >>>
> >>> Can you confirm that the original problem also occurs on your computer (and
> >>> what is your OS)? It would be interesting, since David is not able to
> >>> reproduce the problem on Mac OS X.
> >>> Mathieu.
> >>>
> >>>
> >>> On 07/30/2013 02:15 PM, arun wrote:
> >>>> Hi,
> >>>> Try using trim=TRUE in ?format()
> >>>> options(digits=4)
> >>>>
> >>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], trim=TRUE, scientific = FALSE))
> >>>> df2$id2[99990:100010]
> >>>> # [1] "99990" "99991" "99992" "99993" "99994" "99995" "99996" "99997"
> >>>> # [9] "99998" "99999" "100000" "100001" "100002" "100003" "100004" "100005"
> >>>> #[17] "100006" "100007" "100008" "100009" "100010"
> >>>>
> >>>>
> >>>> id2 <- format(1:110000, scientific = FALSE, trim=TRUE)
> >>>> id2[99990:100010]
> >>>> # [1] "99990" "99991" "99992" "99993" "99994" "99995" "99996" "99997"
> >>>> #[9] "99998" "99999" "100000" "100001" "100002" "100003" "100004" "100005"
> >>>> #[17] "100006" "100007" "100008" "100009" "100010"
> >>>> A.K.
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>> From: Mathieu Basille <basille....@ase-research.org>
> >>>> To: David Winsemius <dwinsem...@comcast.net>
> >>>> Cc: r-help@r-project.org
> >>>> Sent: Tuesday, July 30, 2013 2:07 PM
> >>>> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> >>>>
> >>>> Thanks David for your interest. I have to admit that your answer puzzles me
> >>>> even more than before. It seems that the underlying problem is way beyond
> >>>> my R skills...
> >>>>
> >>>> The generation of id2 is indeed quite demanding, especially compared to a
> >>>> simple 'as.character' call.
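An aside on the 'as.character' comparison: a minimal vectorised sketch, assuming (as in the original example) that 'id' is the integer column 1:110000. Working on the whole column avoids both the per-row cost of apply() and any padding that depends on options("digits"). One caveat worth knowing: on a double, as.character(1e5) gives "1e+05", which is why the double case below uses sprintf("%.0f", ...) instead. The variable names are illustrative only.

```r
# Vectorised alternatives to apply(df, 1, function(dfi) format(dfi["id"], ...)).
df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)

# 'id' is an integer column here, so as.character() never switches to
# scientific notation (as.character(100000L) is "100000").
id_chr <- as.character(df1$id)

# sprintf() with "%d" formats integers and ignores options("digits").
id_fmt <- sprintf("%d", df1$id)

# If the ids were stored as doubles, as.character(1e5) would give "1e+05";
# "%.0f" prints whole numbers in full instead.
id_dbl <- sprintf("%.0f", as.numeric(df1$id))

id_chr[99994:99996]
```

All three vectors are free of leading spaces, whatever the digits option is set to.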
> >>>> Anyway, since it seems to be system-specific,
> >>>> here is the sessionInfo() that I forgot to attach to my first message:
> >>>>
> >>>> R version 3.0.1 (2013-05-16)
> >>>> Platform: x86_64-pc-linux-gnu (64-bit)
> >>>>
> >>>> locale:
> >>>> [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
> >>>> [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
> >>>> [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
> >>>> [7] LC_PAPER=C LC_NAME=C
> >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
> >>>> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
> >>>>
> >>>> attached base packages:
> >>>> [1] stats graphics grDevices utils datasets methods base
> >>>>
> >>>> In brief: the latest stable R available in Debian Testing... Hopefully this
> >>>> can help track down the problem.
> >>>> Mathieu.
> >>>>
> >>>>
> >>>> On 07/30/2013 01:58 PM, David Winsemius wrote:
> >>>>>
> >>>>> On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
> >>>>>
> >>>>>> Dear list,
> >>>>>>
> >>>>>> Here is a simple example in which the behaviour of 'format' does not make
> >>>>>> sense to me. I have read the documentation and searched the archives, but
> >>>>>> nothing pointed me in the right direction to understand this behaviour.
> >>>>>> Let's start with a simple data frame:
> >>>>>>
> >>>>>> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >>>>>>
> >>>>>> Let's now create a new variable 'id2' which is the character representation
> >>>>>> of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers
> >>>>>> such as 100,000 are not formatted using their scientific representation
> >>>>>> (in this case 1e+05):
> >>>>>>
> >>>>>> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> >>>>>>
> >>>>>> Let's have a look at part of the result:
> >>>>>>
> >>>>>> df1$id2[99990:100010]
> >>>>>> [1] "99990" "99991" "99992" "99993" "99994" "99995" "99996"
> >>>>>> [8] "99997" "99998" "99999" "100000" "100001" "100002" "100003"
> >>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>>>
> >>>>> Some formatting processes are carried out by system functions. In this case
> >>>>> I am unable to reproduce with the same code on Mac OS 10.7.5 / R 3.0.1 Patched:
> >>>>>
> >>>>>> df1$id2[99990:100010]
> >>>>> [1] "99990" "99991" "99992" "99993" "99994" "99995" "99996" "99997"
> >>>>> [9] "99998" "99999" "100000" "100001" "100002" "100003" "100004" "100005"
> >>>>> [17] "100006" "100007" "100008" "100009" "100010"
> >>>>>
> >>>>> (I did notice that generation of the id2 variable seemed to take an
> >>>>> inordinately long time.)
> >>>>>
> >>>>> -- David.
> >>>>>>
> >>>>>> So far, so good. Let's now play with the 'digits' option:
> >>>>>>
> >>>>>> options(digits = 4)
> >>>>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >>>>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> >>>>>> df2$id2[99990:100010]
> >>>>>> [1] "99990" "99991" "99992" "99993" "99994" " 99995" " 99996"
> >>>>>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> >>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>>>>
> >>>>>> Notice the extra leading space from 99995 to 99999? To make sure it only
> >>>>>> happened there:
> >>>>>>
> >>>>>> df2$id2[which(df1$id2 != df2$id2)]
> >>>>>> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
> >>>>>>
> >>>>>> And just to make sure it only occurs in an 'apply' call, here is the same
> >>>>>> directly on a numeric vector:
> >>>>>>
> >>>>>> id2 <- format(1:110000, scientific = FALSE)
> >>>>>> id2[99990:100010]
> >>>>>> [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
> >>>>>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> >>>>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>>>>
> >>>>>> Here every number gets a leading space, which makes sense to me. Is there
> >>>>>> anything I'm misinterpreting in the behaviour of 'format'?
> >>>>>> Thanks in advance for any hint,
> >>>>>> Mathieu.
> >>>>>>
> >>>>>>
> >>>>>> PS: Some background for this question. It all comes from an Rmd document
> >>>>>> that knitr consistently failed to process, while the R code ran fine in
> >>>>>> batch or interactive R. knitr uses 'options(digits = 4)', as opposed to R's
> >>>>>> default 'options(digits = 7)', which made one of my functions throw an error
> >>>>>> with knitr, but not with batch or interactive R. I managed to solve the
> >>>>>> problem using 'trim = TRUE' in 'format', but I still do not understand
> >>>>>> what's going on...
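A minimal sketch of the symptom and two ways to make the output independent of options("digits"), assuming a three-row stand-in for df2. Two facts drive the behaviour: apply() first converts the data frame to a numeric matrix, so each dfi["id"] is a length-1 double formatted on its own, and format() sizes the field from the value rounded to 'digits' significant digits. Whether the padded " 99995" actually appears depends on the R version, so the sketch only asserts on the two workarounds: 'trim = TRUE' strips the padding, and an explicit 'digits' large enough that no rounding occurs sidesteps it.

```r
old <- options(digits = 4)  # knitr's setting, per the PS above

# Three rows are enough: 99995 is the case that rounds up to 1e+05 at 4 digits.
df2 <- data.frame(x = rnorm(3), y = rnorm(3), id = c(99994L, 99995L, 100000L))

# apply() coerces df2 to a numeric matrix; each dfi["id"] is a lone double.
row_ids <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
print(row_ids)  # may show " 99995" with a leading space, depending on the R version

# Workaround 1: remove the padding.
trimmed <- apply(df2, 1, function(dfi)
  format(dfi["id"], scientific = FALSE, trim = TRUE))

# Workaround 2: request enough significant digits that nothing is rounded.
wide <- apply(df2, 1, function(dfi)
  format(dfi["id"], scientific = FALSE, digits = 15))

options(old)  # restore the previous setting
```

Passing 'digits' explicitly also documents the intent at the call site, rather than relying on whatever a tool such as knitr has set globally.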
> >>>>>> If you're interested, see here for more details on the original problem:
> >>>>>> http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> ~$ whoami
> >>>>>> Mathieu Basille, PhD
> >>>>>>
> >>>>>> ~$ locate --details
> >>>>>> University of Florida \\
> >>>>>> Fort Lauderdale Research and Education Center
> >>>>>> (+1) 954-577-6314
> >>>>>> http://ase-research.org/basille
> >>>>>>
> >>>>>> ~$ fortune
> >>>>>> "The whole point is to say everything, and I lack the words
> >>>>>> And I lack the time, and I lack the daring."
> >>>>>> -- Paul Éluard
> >>>>>>
> >>>>>> ______________________________________________
> >>>>>> R-help@r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>
> >>>>> David Winsemius
> >>>>> Alameda, CA, USA
> >>>>>
> >>>>
> >>>
> >>
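Bill's observation (the padded values are exactly those that round up to the next power of 10 at the requested number of significant digits) can be written as a rough predicate. This is a minimal sketch, not R's internal logic: the helper name rounds_to_next_pow10 is hypothetical, and signif() stands in for the rounding that format() performs internally. The two may disagree on exact ties such as 9995 at 3 digits, so the example deliberately avoids ties.

```r
# Hypothetical helper: TRUE where rounding a positive number to `digits`
# significant digits carries it into the next power of ten
# (e.g. 9996 -> 10000 at digits = 3), so the rounded value needs one more
# character than the number actually printed in fixed notation.
rounds_to_next_pow10 <- function(x, digits) {
  stopifnot(all(x > 0))
  floor(log10(signif(x, digits))) > floor(log10(x))
}

x <- c(9994, 9996, 99949, 99960, 123456)
rounds_to_next_pow10(x, digits = 3)
```

Going by Bill's 1:5e5 listing on R 3.0.1, values for which this returns TRUE are the ones that came back with a leading space when formatted one at a time with scientific = FALSE; values that round within the same power of ten (9994, 99949, 123456 above) are unaffected.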