Re: [Rd] Bug 16719: kruskal.test documentation for formula
Thanks: this is now fixed in the trunk with c74945.

Best
-k

> Thomas Levine writes:
> I submit a couple options for addressing bug 16719: kruskal.test
> documentation for formula.
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16719
>
> disallow-character.diff changes the documentation and error message
> to indicate that factors are accepted.
>
> allow-character.diff changes the kruskal.test functions to convert
> character vectors to factors; documentation is updated accordingly.
>
> I tested the updated functions with the examples in example.R. It is
> based on the examples in the bug report.
>
> If there is interest in applying either patch, especially the latter,
> I want first to test the change on lots of existing programs that call
> kruskal.test, to see if it causes any regressions. Is there a good place
> to look for programs that use particular R functions?
>
> I am having trouble building R, so I have so far tested these changes
> only by patching revision 74631 (SVN head) and sourcing the resulting
> kruskal.test.R in R 3.4.1 on OpenBSD 6.2. I thus have not tested the
> R documentation files.
>
> x[DELETED ATTACHMENT disallow-character.diff, plain text]
> x[DELETED ATTACHMENT allow-character.diff, plain text]
> x[DELETED ATTACHMENT example.R, plain text]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
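For anyone on an R build that predates the fix, a minimal sketch of the workaround that the disallow-character error message points to: convert the group to a factor explicitly before calling kruskal.test. The data frame and its column names are made up for illustration and do not come from the bug report.

```r
# Made-up illustration data; not taken from the bug report or example.R.
d <- data.frame(
    y = c(2.9, 3.0, 2.5, 2.6, 3.2, 3.8, 2.7, 4.0, 2.4),
    g = rep(c("a", "b", "c"), each = 3),
    stringsAsFactors = FALSE          # keep g as a character vector
)

# Converting explicitly avoids depending on how a given R revision
# treats character groups.
res_default <- kruskal.test(d$y, factor(d$g))        # default interface
res_formula <- kruskal.test(y ~ factor(g), data = d) # formula interface

print(res_default)
```

Both calls should describe the same test, so their statistics can be compared as a quick sanity check.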
Re: [Rd] Bug 16719: kruskal.test documentation for formula
Thomas Levine writes:
> I have yet to find any example of my proposed changes causing a
> regression. I believe that the most reasonable thing that it might
> break is something that depends on either kruskal.test raising an
> error or that depends on the specific text in the error message.
>
> If the limited testing is a concern, I could find a way to install
> all of the packages and run all of their examples.

In case my April message is hard to find, I have attached the packages
redundantly to this email.

Index: src/library/stats/R/kruskal.test.R
===
--- src/library/stats/R/kruskal.test.R (revision 74631)
+++ src/library/stats/R/kruskal.test.R (working copy)
@@ -46,7 +46,10 @@
         x <- x[OK]
         g <- g[OK]
         if (!all(is.finite(g)))
-            stop("all group levels must be finite")
+            if (is.character(g))
+                stop("all group levels must be finite; convert group to a factor")
+            else
+                stop("all group levels must be finite")
         g <- factor(g)
         k <- nlevels(g)
         if (k < 2L)
Index: src/library/stats/man/kruskal.test.Rd
===
--- src/library/stats/man/kruskal.test.Rd (revision 74631)
+++ src/library/stats/man/kruskal.test.Rd (working copy)
@@ -22,11 +22,12 @@
   \item{x}{a numeric vector of data values, or a list of numeric data
     vectors. Non-numeric elements of a list will be coerced, with a
     warning.}
-  \item{g}{a vector or factor object giving the group for the
+  \item{g}{a numeric vector or factor object giving the group for the
     corresponding elements of \code{x}. Ignored with a warning if
     \code{x} is a list.}
   \item{formula}{a formula of the form \code{response ~ group} where
-    \code{response} gives the data values and \code{group} a vector or
+    \code{response} gives the data values and \code{group}
+    a numeric vector or
     factor of the corresponding groups.}
   \item{data}{an optional matrix or data frame (or similar: see
     \code{\link{model.frame}}) containing the variables in the
@@ -52,7 +53,8 @@
   list, use \code{kruskal.test(list(x, ...))}.
 
   Otherwise, \code{x} must be a numeric data vector, and \code{g} must
-  be a vector or factor object of the same length as \code{x} giving
+  be a numeric vector or factor object of the same length as \code{x}
+  giving
   the group for the corresponding elements of \code{x}.
 }
 \value{

Index: src/library/stats/R/kruskal.test.R
===
--- src/library/stats/R/kruskal.test.R (revision 74631)
+++ src/library/stats/R/kruskal.test.R (working copy)
@@ -45,7 +45,7 @@
         OK <- complete.cases(x, g)
         x <- x[OK]
         g <- g[OK]
-        if (!all(is.finite(g)))
+        if (!is.character(g) & !all(is.finite(g)))
             stop("all group levels must be finite")
         g <- factor(g)
         k <- nlevels(g)
Index: src/library/stats/man/kruskal.test.Rd
===
--- src/library/stats/man/kruskal.test.Rd (revision 74631)
+++ src/library/stats/man/kruskal.test.Rd (working copy)
@@ -22,11 +22,13 @@
   \item{x}{a numeric vector of data values, or a list of numeric data
     vectors. Non-numeric elements of a list will be coerced, with a
     warning.}
-  \item{g}{a vector or factor object giving the group for the
+  \item{g}{a character vector, numeric vector, or factor
+    giving the group for the
     corresponding elements of \code{x}. Ignored with a warning if
     \code{x} is a list.}
   \item{formula}{a formula of the form \code{response ~ group} where
-    \code{response} gives the data values and \code{group} a vector or
+    \code{response} gives the data values and \code{group} a
+    character vector, numeric vector, or
     factor of the corresponding groups.}
   \item{data}{an optional matrix or data frame (or similar: see
     \code{\link{model.frame}}) containing the variables in the
@@ -52,7 +54,8 @@
   list, use \code{kruskal.test(list(x, ...))}.
 
   Otherwise, \code{x} must be a numeric data vector, and \code{g} must
-  be a vector or factor object of the same length as \code{x} giving
+  be a numeric vector, character vector, or factor of the same length
+  as \code{x} giving
   the group for the corresponding elements of \code{x}.
 }
 \value{
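To make the difference between the two patches easier to see, here is a rough sketch that isolates the group check from each one into a standalone function. The function names check_group_disallow and check_group_allow are hypothetical and exist only for this illustration; they are not part of the stats package, and the sketch uses && where the patch as posted uses &.

```r
# Hypothetical helpers mirroring the group check in each patch.

# disallow-character: keep rejecting character groups, but say how to fix it.
check_group_disallow <- function(g) {
    if (!all(is.finite(g))) {
        if (is.character(g))
            stop("all group levels must be finite; convert group to a factor")
        else
            stop("all group levels must be finite")
    }
    factor(g)
}

# allow-character: skip the finiteness check for character groups.
check_group_allow <- function(g) {
    if (!is.character(g) && !all(is.finite(g)))
        stop("all group levels must be finite")
    factor(g)
}

g_chr <- c("a", "a", "b", "b")

# is.finite() is FALSE for every element of a character vector, which is
# why the unpatched check rejects character groups.
print(is.finite(g_chr))

# Capture the error message instead of stopping the script.
msg_disallow <- tryCatch(check_group_disallow(g_chr),
                         error = function(e) conditionMessage(e))
g_allow <- check_group_allow(g_chr)
```

Under these assumptions the first helper reports the longer error message for g_chr, while the second returns a two-level factor; both still reject a numeric group containing Inf.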
Re: [Rd] Bug 16719: kruskal.test documentation for formula
Thomas Levine writes:
> I submit a couple options for addressing bug 16719: kruskal.test
> documentation for formula.
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16719
>
> disallow-character.diff changes the documentation and error message
> to indicate that factors are accepted.
>
> allow-character.diff changes the kruskal.test functions to convert
> character vectors to factors; documentation is updated accordingly.
>
> I tested the updated functions with the examples in example.R. It is
> based on the examples in the bug report.
>
> If there is interest in applying either patch, especially the latter,
> I want first to test the change on lots of existing programs that call
> kruskal.test, to see if it causes any regressions. Is there a good place
> to look for programs that use particular R functions?
>
> I am having trouble building R, so I have so far tested these changes
> only by patching revision 74631 (SVN head) and sourcing the resulting
> kruskal.test.R in R 3.4.1 on OpenBSD 6.2. I thus have not tested the
> R documentation files.

I thought it was important to test the changes on lots of existing
programs that call kruskal.test, to see if it causes any regressions.

CRAN testing
------------
I downloaded all CRAN packages and checked whether they contained the
fixed expression "kruskal.test". (See "Makefile" and "all-kruskal.r".)
I subsequently tested on all packages in CRAN that mentioned
"kruskal.test".

I patched the development version of R and built it like this.

    ./configure --without-recommended-packages
    gmake
    cd src/library
    gmake all docs Rdfiles

This command was helpful for cleaning the repository tree.

    svn status | sed -n 's/^\? *//p' | xargs rm -r

I tested three versions of kruskal.test

* SVN checkout 74844 with no modifications
* SVN checkout 74844 with disallow-character patch
* SVN checkout 74844 with allow-character patch

The test is to run all of the examples from all of the packages that
mention kruskal.test; with each example I ran, I recorded whether an
error was raised. I ran all examples, regardless of whether the example
mentioned kruskal.test. I compared the raising of an error among the
three builds of R/kruskal.test.

I ran these commands for each R version to build R, install the packages
referencing kruskal.test, and run the tests in parallel. The procedure
is available here; see the Makefile for more detail.
https://thomaslevine.com/scm/r-patches/dir?ci=6ea0db4fde=kruskal.test-numeric/testing

Run it like this if you are so inclined.

    make -j 3 install
    make -j 3 test

I found 100 packages that referenced kruskal.test. (This was based on a
very crude string matching; some of these packages mentioned
kruskal.test only in the documentation.) Of these 100 packages, I was
able to install 39. I ran all of the examples in all of these packages,
a total of 2361 examples. The successes and failures matched exactly
among the three builds. 341 examples succeeded, and 2020 failed.
https://thomaslevine.com/scm/r-patches/artifact/5df57add4414970a

This is of course a lot of failures and a small proportion of the
packages. I only installed the packages whose dependencies were easy for
me to install (on OpenBSD 6.2), and some of those implicitly depended on
other things that were not available; this explains all of the examples
that raised errors.

Review of r-help
----------------
I also began to collect all kruskal.test calls that I could find in the
r-help archives. Formatting them to be appropriate for evaluation is
quite tedious, so I doubt I will follow through with this, but all of
the calls appear to use ordinary character, numeric, or factor types,
and none performed error catching, so no obvious problems with my
proposed changes stand out.

Furthermore, in looking through the r-help archives, I noted these
messages on r-help where people were having trouble using kruskal.test
and where I think either of my proposed changes would have helped them
perform their desired Kruskal-Wallis rank sum tests.

<1280836078385-2311712.p...@n4.nabble.com>
<1280849183252-2312063.p...@n4.nabble.com>

Conclusions
-----------
I have yet to find any example of my proposed changes causing a
regression. I believe that the most reasonable thing that it might break
is something that depends on either kruskal.test raising an error or
that depends on the specific text in the error message.

If the limited testing is a concern, I could find a way to install all
of the packages and run all of their examples.
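The record-and-compare step described under CRAN testing can be sketched roughly as follows. This is not the Makefile procedure linked above: the stand-in examples, the helper names raises_error and differing, and the idea of holding each build's results in a named list are all assumptions made for illustration.

```r
# Hypothetical stand-ins for package examples; the real run evaluated the
# examples shipped with each installed package.
examples <- list(
    ok_factor  = quote(kruskal.test(c(1, 2, 3, 4, 5, 6),
                                    factor(c("a", "a", "b", "b", "c", "c")))),
    ok_numeric = quote(kruskal.test(c(1, 2, 3, 4), c(1, 1, 2, 2))),
    bad_length = quote(kruskal.test(1:3, 1:2))   # x and g differ in length
)

# TRUE when evaluating the expression raises an error, FALSE otherwise.
raises_error <- function(expr) {
    tryCatch({
        eval(expr, envir = new.env())
        FALSE
    }, error = function(e) TRUE)
}

# One logical per example, named after the example.
status <- vapply(examples, raises_error, logical(1))

# Names of examples whose error status is not the same in every build.
# 'results' is a named list with one logical vector per build.
differing <- function(results) {
    m <- do.call(cbind, results)   # examples in rows, builds in columns
    rownames(m)[apply(m, 1, function(r) length(unique(r)) > 1)]
}

print(status)
```

In a real comparison each build would write out its own status vector, and differing() would be applied to the collected vectors; an empty result corresponds to the successes and failures matching exactly among the builds.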