Re: [Rd] Issue with data() function
Duncan and others:

I was not being careful with my description. This concerned tests of version 3.2-8, not yet on CRAN, in which I was trying some size-limiting measures. My apologies for not making this clear.

- I feel mild pressure to make the survival package smaller, per CRAN guidelines, and shrinking the data appears to be one way to approach that. So my attempts to do so are really what prompted the query. (I am much more resistant to shrinking the extensive test suite or the vignettes.)

- The survival package has a lot of small data sets, and bundling them up into a single .rda file does save space, but it causes some issues with data(). The overall tarball goes from 7480 to 6100 in size (as reported by ls -s).

Terry

On 10/24/20 4:28 AM, Duncan Murdoch wrote:
> On 23/10/2020 9:25 p.m., Therneau, Terry M., Ph.D. via R-devel wrote:
>> I found an issue with the data() command this evening when working on the survival package.
>> [...]
>
> You don't describe how this dataset is being included in your package. Have you moved it from data/lung.rda to data/cancer.rda? Currently (in survival 3.2-7) each of these works for me:
> [...]
> I think if the lung dataset is now being included in cancer.rda, you'd need
>
>   data(cancer, package="survival")
>
> or equivalent to load it (and the rest of the datasets there).
> [...]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
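Terry's point that bundling many small data sets into one .rda file saves space can be illustrated with a small experiment. The sketch below is only an illustration under stated assumptions: the 19 data frames are made-up stand-ins (hypothetical names set01 to set19), not the survival data, and the actual savings for a real package will depend on its data. It saves them once as separate xz-compressed files and once as a single bundle, then compares total sizes.

```r
# Hypothetical stand-ins for many small package data sets (NOT the survival data).
set.seed(1)
small_sets <- lapply(1:19, function(i) {
  data.frame(id = 1:50,
             time = round(rexp(50, rate = 1 / 365)),
             status = rbinom(50, 1, 0.7))
})
names(small_sets) <- sprintf("set%02d", 1:19)
env <- list2env(small_sets)  # save() takes object names plus an environment

out_dir <- tempfile("rda_demo_")
dir.create(out_dir)

# Option 1: one xz-compressed .rda file per data set
for (nm in names(small_sets)) {
  save(list = nm, envir = env, compress = "xz",
       file = file.path(out_dir, paste0(nm, ".rda")))
}
separate_bytes <- sum(file.size(file.path(out_dir, paste0(names(small_sets), ".rda"))))

# Option 2: all data sets bundled into a single xz-compressed .rda file
bundle_file <- file.path(out_dir, "bundle.rda")
save(list = names(small_sets), envir = env, compress = "xz", file = bundle_file)
bundled_bytes <- file.size(bundle_file)

print(c(separate = separate_bytes, bundled = bundled_bytes))

# Loading the bundle restores all 19 objects at once; the individual
# object names are not separate files.
loaded <- new.env()
load(bundle_file, envir = loaded)
print(ls(loaded))
```

The bundle avoids repeating the per-file compression and serialization overhead 19 times, which is where most of the saving on very small data sets comes from; the trade-off is that the file, not each object, becomes the unit that gets loaded.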
Re: [Rd] Issue with data() function
On 24/10/2020 2:00 p.m., Dirk Eddelbuettel wrote:
>
> On 24 October 2020 at 05:28, Duncan Murdoch wrote:
> | they are and ignore the warnings about package size. I think that's a
> | negotiation you should have with R Core.
>
> s/R Core/CRAN/ ?

Yes, for that part. The other suggestions need R Core agreement.

Duncan Murdoch
Re: [Rd] Issue with data() function
On 24 October 2020 at 05:28, Duncan Murdoch wrote:
| they are and ignore the warnings about package size. I think that's a
| negotiation you should have with R Core.

s/R Core/CRAN/ ?

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
Re: [Rd] Issue with data() function
On 23/10/2020 9:25 p.m., Therneau, Terry M., Ph.D. via R-devel wrote:
> I found an issue with the data() command this evening when working on the survival package.
>
> 1. I have a lot of data sets in the package, almost all used in at least one vignette, help file, or test. As a space-saving measure, I have bundled many of them together, i.e., the file data/cancer.rda contains 19 data sets, many of them small. The resulting file (using xz compression) is quite a bit smaller than the individual ones. (I still get a warning note about size from R CMD check, but I'm no longer 2x the limit.)
>
> 2. Consider the lung data set. All of these fail:
>      data(lung)
>      data("lung")
>      data(lung, package="survival")
>
>    a. The lung.Rd file had \usage{data(lung)}; that error was not caught by R CMD check. (Several other .Rd files had the same problem.)
>
>    b. In broader examples for teaching, I sometimes load data from other packages, e.g., data(aidssi, package="mstate"). But this does not work for survival. (The larger survival data sets that are in separate .rda files can be found.)
>
>    c. What does work is survival::lung. Might it be useful to add a comment to data.Rd to this effect?

You don't describe how this dataset is being included in your package. Have you moved it from data/lung.rda to data/cancer.rda? Currently (in survival 3.2-7) each of these works for me:

  library(survival); data(lung)

  library(survival); data("lung")

  # Without library(survival):
  data(lung, package="survival")

I think if the lung dataset is now being included in cancer.rda, you'd need

  data(cancer, package="survival")

or equivalent to load it (and the rest of the datasets there).

> 3. Creating a separate package 'survivaldata' is of course one route, and is suggested in the "Writing R Extensions" guide. But this is not possible since survival is a recommended package: it can't load any non-recommended package for its tests or vignettes. Longer term, perhaps there is a way around this constraint?
Maybe the solution is to put your datasets into the "datasets" package, or make "survivaldata" a recommended package, or just leave things as they are and ignore the warnings about package size. I think that's a negotiation you should have with R Core.

Duncan Murdoch
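A minimal sketch of the two access routes discussed in this thread, assuming survival 3.2-8 or later, where lung is stored in data/cancer.rda rather than in its own file: loading the bundle by its file name, as Duncan suggests, and the survival::lung form Terry mentions, which goes through the package's lazy-loaded data. The variable name lung_lazy is purely illustrative.

```r
# Assumes survival >= 3.2-8, where lung is bundled in data/cancer.rda.
# Request the bundle by its file name; this loads every data set
# stored in cancer.rda (lung among them) into the global environment.
data(cancer, package = "survival")
print(exists("lung", envir = globalenv(), inherits = FALSE))

# Namespace access uses the package's lazy-loaded data and needs
# neither a data() call nor library(survival).
lung_lazy <- survival::lung
print(identical(dim(lung), dim(lung_lazy)))
```

The data() route copies all 19 bundled objects into the workspace, while survival::lung fetches only the one object on demand.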
[Rd] Issue with data() function
I found an issue with the data() command this evening when working on the survival package.

1. I have a lot of data sets in the package, almost all used in at least one vignette, help file, or test. As a space-saving measure, I have bundled many of them together, i.e., the file data/cancer.rda contains 19 data sets, many of them small. The resulting file (using xz compression) is quite a bit smaller than the individual ones. (I still get a warning note about size from R CMD check, but I'm no longer 2x the limit.)

2. Consider the lung data set. All of these fail:

     data(lung)
     data("lung")
     data(lung, package="survival")

   a. The lung.Rd file had \usage{data(lung)}; that error was not caught by R CMD check. (Several other .Rd files had the same problem.)

   b. In broader examples for teaching, I sometimes load data from other packages, e.g., data(aidssi, package="mstate"). But this does not work for survival. (The larger survival data sets that are in separate .rda files can be found.)

   c. What does work is survival::lung. Might it be useful to add a comment to data.Rd to this effect?

3. Creating a separate package 'survivaldata' is of course one route, and is suggested in the "Writing R Extensions" guide. But this is not possible since survival is a recommended package: it can't load any non-recommended package for its tests or vignettes. Longer term, perhaps there is a way around this constraint?

Terry T.

--
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"
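To see why a call such as data(lung) cannot find a bundled object, one can list what data() knows about for the package. This is a sketch assuming survival 3.2-8 or later with lung stored in data/cancer.rda. In that case the listing should show an entry of the form "lung (cancer)", i.e. object name followed by the containing file, and data() appears to match requests against the file name. The variable names items, lung_entry and res are illustrative only.

```r
# List the data sets registered for survival. An object stored in a file
# with a different name is listed as "object (file)".
items <- data(package = "survival")$results[, "Item"]
lung_entry <- grep("^lung", items, value = TRUE)
print(lung_entry)

# If data() cannot find a requested data set it signals a warning rather
# than an error, so the lookup can be checked without stopping a script.
# A fresh environment keeps anything that is found out of the workspace.
res <- tryCatch(data(lung, package = "survival", envir = new.env()),
                warning = function(w) conditionMessage(w))
print(res)
```

Either way res is a character string: the warning text if the lookup failed, or the loaded name if it succeeded, which makes it easy to compare behaviour across survival versions.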