Re: [Rd] Issue with data() function

2020-10-25 Thread Therneau, Terry M., Ph.D. via R-devel
Duncan and others:  I was not being careful with my description.  This
concerned tests of version 3.2-8, not yet on CRAN, in which I was trying
some size-limiting measures.  My apologies for not making this clear.

   - I feel mild pressure to make the survival package smaller, per CRAN
     guidelines, and shrinking the data appears to be one way to approach
     that.  So the real point of the query is my attempt to do so.  (I am
     much more resistant to shrinking the extensive test suite or the
     vignettes.)
   - The survival package has a lot of small data sets, and bundling them
     up into a single .rda file does save space, but it causes some issues
     with data().  The overall tarball shrinks from 7480 to 6100 in size
     (as reported by ls -s).
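
A rough way to measure that kind of saving is sketched below in R.  It is
only an illustration, not how survival is actually built: the 19 data
frames are random stand-ins, and the object and file names (demo01 to
demo19, bundle.rda) are made up.  Each set is written to its own
xz-compressed .rda, then all of them to a single file, and the byte counts
are compared.  file.size() reports bytes while ls -s reports allocated
blocks, so the numbers will not line up with the tarball figures.

```r
# Hypothetical stand-in data, not the real survival data sets
set.seed(1)
small <- lapply(1:19, function(i)
  data.frame(time = rexp(50), status = rbinom(50, 1, 0.7)))
names(small) <- sprintf("demo%02d", 1:19)

dir <- file.path(tempdir(), "rda-demo")
dir.create(dir, showWarnings = FALSE)

# One xz-compressed .rda file per data set; collect each file's size
separate <- vapply(names(small), function(nm) {
  f <- file.path(dir, paste0(nm, ".rda"))
  e <- list2env(small[nm])
  save(list = nm, envir = e, file = f, compress = "xz")
  file.size(f)
}, numeric(1))

# All 19 data sets in a single xz-compressed .rda file
bundle <- file.path(dir, "bundle.rda")
e <- list2env(small)
save(list = names(small), envir = e, file = bundle, compress = "xz")

# Total bytes: separate files versus the bundle
c(separate = sum(separate), bundled = file.size(bundle))
```

For files already sitting in a package's data/ directory,
tools::resaveRdaFiles() can recompress them in place, and
tools::checkRdaFiles() reports the size and compression of each.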

   Terry

On 10/24/20 4:28 AM, Duncan Murdoch wrote:
> On 23/10/2020 9:25 p.m., Therneau, Terry M., Ph.D. via R-devel wrote:
>> I found an issue with the data() command this evening when working on
>> the survival package.
>>
>> 1. I have a lot of data sets in the package, almost all used in at
>> least one vignette, help file, or test.  As a space-saving measure, I
>> have bundled many of them together, i.e., the file data/cancer.rda
>> contains 19 data sets, many of them small.  The resulting file (using
>> xz compression) is quite a bit smaller than the individual ones.  (I
>> still get a warning note about size from R CMD check, but I'm no
>> longer 2x the limit.)
>>
>> 2. Consider the lung data set.  All of these fail:
>>      data(lung)
>>      data("lung")
>>      data(lung, package="survival")
>>
>>    a. The lung.Rd file had \usage{data(lung)}; that error was not
>>    caught by R CMD check.  (Several other .Rd files had the same
>>    problem.)
>>
>>    b. In broader examples for teaching, I sometimes load data from
>>    other packages, e.g., data(aidssi, package="mstate").  But this
>>    does not work for survival.  (The larger survival data sets that
>>    are in separate .rda files can be found.)
>>
>>    c. What does work is survival::lung.  Might it be useful to add a
>>    comment to data.Rd to this effect?
>
> You don't describe how this dataset is being included in your package.
> Have you moved it from data/lung.rda to data/cancer.rda?  Currently (in
> survival 3.2-7) each of these works for me:
>
>  library(survival); data(lung)
>
>  library(survival); data("lung")
>
>  # Without library(survival):
>  data(lung, package="survival")
>
> I think if the lung dataset is now being included in cancer.rda, you'd need
>
>   data(cancer, package="survival")
>
> or equivalent to load it (and the rest of the datasets there).
>
>>
>>
>> 3. Creating a separate package 'survivaldata' is of course one route,
>> and is suggested in the "Writing R Extensions" guide.  But this is not
>> possible since survival is a recommended package: it can't load any
>> non-recommended package for its tests or vignettes.  Longer term,
>> perhaps there is a way around this constraint?
>
> Maybe the solution is to put your datasets into the "datasets" package,
> or make "survivaldata" a recommended package, or just leave things as
> they are and ignore the warnings about package size.  I think that's a
> negotiation you should have with R Core.
>
> Duncan Murdoch



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Issue with data() function

2020-10-24 Thread Duncan Murdoch

On 24/10/2020 2:00 p.m., Dirk Eddelbuettel wrote:
>
> On 24 October 2020 at 05:28, Duncan Murdoch wrote:
> | they are and ignore the warnings about package size.  I think that's a
> | negotiation you should have with R Core.
>
> s/R Core/CRAN/  ?


Yes, for that part.  The other suggestions need R Core agreement.

Duncan Murdoch



Re: [Rd] Issue with data() function

2020-10-24 Thread Dirk Eddelbuettel


On 24 October 2020 at 05:28, Duncan Murdoch wrote:
| they are and ignore the warnings about package size.  I think that's a 
| negotiation you should have with R Core.

s/R Core/CRAN/  ?

Dirk

-- 
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [Rd] Issue with data() function

2020-10-24 Thread Duncan Murdoch

On 23/10/2020 9:25 p.m., Therneau, Terry M., Ph.D. via R-devel wrote:

> I found an issue with the data() command this evening when working on
> the survival package.
>
> 1. I have a lot of data sets in the package, almost all used in at
> least one vignette, help file, or test.  As a space-saving measure, I
> have bundled many of them together, i.e., the file data/cancer.rda
> contains 19 data sets, many of them small.  The resulting file (using
> xz compression) is quite a bit smaller than the individual ones.  (I
> still get a warning note about size from R CMD check, but I'm no
> longer 2x the limit.)
>
> 2. Consider the lung data set.  All of these fail:
>      data(lung)
>      data("lung")
>      data(lung, package="survival")
>
>    a. The lung.Rd file had \usage{data(lung)}; that error was not
>    caught by R CMD check.  (Several other .Rd files had the same
>    problem.)
>
>    b. In broader examples for teaching, I sometimes load data from
>    other packages, e.g., data(aidssi, package="mstate").  But this
>    does not work for survival.  (The larger survival data sets that
>    are in separate .rda files can be found.)
>
>    c. What does work is survival::lung.  Might it be useful to add a
>    comment to data.Rd to this effect?


You don't describe how this dataset is being included in your package. 
Have you moved it from data/lung.rda to data/cancer.rda?  Currently (in 
survival 3.2-7) each of these works for me:


 library(survival); data(lung)

 library(survival); data("lung")

 # Without library(survival):
 data(lung, package="survival")

I think if the lung dataset is now being included in cancer.rda, you'd need

  data(cancer, package="survival")

or equivalent to load it (and the rest of the datasets there).
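
A minimal sketch of that behaviour in R, assuming a throwaway directory
rather than an installed package.  With package = NULL, data() also
searches the data/ subdirectory of the working directory, and it matches a
topic against file names, not against the names of the objects stored in
each file.  The file name cancer_demo.rda and the two small data frames
are hypothetical stand-ins, not the real survival data.

```r
# Sketch: data() resolves a topic by *file* name, not by the names of the
# objects saved inside that file.  Throwaway directory; stand-in data.
proj <- file.path(tempdir(), "data-demo")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)

lung_demo    <- data.frame(time = c(306, 455, 1010), status = c(2, 2, 1))
ovarian_demo <- data.frame(futime = c(59, 115, 156), fustat = c(1, 1, 1))
save(lung_demo, ovarian_demo,
     file = file.path(proj, "data", "cancer_demo.rda"), compress = "xz")
rm(lung_demo, ovarian_demo)

old_wd <- setwd(proj)  # with package = NULL, data() also looks in ./data
target <- new.env()

# Asking for an object name finds nothing; data() only warns
suppressWarnings(data(lung_demo, envir = target))
after_object <- ls(target)

# Asking for the file name loads every object stored in that file
data(cancer_demo, envir = target)
after_file <- ls(target)

setwd(old_wd)
```

The same file-name matching is why, once lung lives inside cancer.rda,
the topic to ask for is "cancer".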




> 3. Creating a separate package 'survivaldata' is of course one route,
> and is suggested in the "Writing R Extensions" guide.  But this is not
> possible since survival is a recommended package: it can't load any
> non-recommended package for its tests or vignettes.  Longer term,
> perhaps there is a way around this constraint?


Maybe the solution is to put your datasets into the "datasets" package, 
or make "survivaldata" a recommended package, or just leave things as 
they are and ignore the warnings about package size.  I think that's a 
negotiation you should have with R Core.


Duncan Murdoch



[Rd] Issue with data() function

2020-10-23 Thread Therneau, Terry M., Ph.D. via R-devel
I found an issue with the data() command this evening when working on the 
survival package.

1. I have a lot of data sets in the package, almost all used in at least
one vignette, help file, or test.  As a space-saving measure, I have
bundled many of them together, i.e., the file data/cancer.rda contains 19
data sets, many of them small.  The resulting file (using xz compression)
is quite a bit smaller than the individual ones.  (I still get a warning
note about size from R CMD check, but I'm no longer 2x the limit.)

2. Consider the lung data set.  All of these fail:
    data(lung)
    data("lung")
    data(lung, package="survival")

  a. The lung.Rd file had \usage{data(lung)}; that error was not caught by
  R CMD check.  (Several other .Rd files had the same problem.)

  b. In broader examples for teaching, I sometimes load data from other
  packages, e.g., data(aidssi, package="mstate").  But this does not work
  for survival.  (The larger survival data sets that are in separate .rda
  files can be found.)

  c. What does work is survival::lung.  Might it be useful to add a
  comment to data.Rd to this effect?


3. Creating a separate package 'survivaldata' is of course one route, and
is suggested in the "Writing R Extensions" guide.  But this is not
possible since survival is a recommended package: it can't load any
non-recommended package for its tests or vignettes.  Longer term, perhaps
there is a way around this constraint?
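
Point 2b comes down to data() taking a topic (a file name) rather than an
object name.  A hypothetical helper, dataset_topic(), which is not part of
utils or survival, could look the topic up in the index that
data(package = ...) returns.  The sketch assumes the Item column shows
entries such as "beaver1 (beavers)" when one file holds several objects,
and it is demonstrated on the base datasets package so that it runs
without survival installed.

```r
# Hypothetical helper (not part of utils or survival): map an object
# name to the data() topic, i.e. the file that stores it.
# Assumes the "object (file)" format of the Item column shown by data().
dataset_topic <- function(object, package) {
  items <- data(package = package)$results[, "Item"]
  # Object name: strip a trailing " (file)" when present
  obj <- sub(" \\(.*\\)$", "", items)
  # Topic: the text inside the parentheses, or the item itself
  topic <- ifelse(grepl("\\)$", items),
                  sub("^.* \\((.*)\\)$", "\\1", items),
                  items)
  hit <- topic[obj == object]
  if (length(hit) == 0L) NA_character_ else hit[[1L]]
}

dataset_topic("beaver1", "datasets")
dataset_topic("mtcars", "datasets")
```

The topic could then be passed on as
data(list = dataset_topic("beaver1", "datasets"), package = "datasets"),
which should load beaver1 and beaver2 together.  For lazy-loaded data,
datasets::beaver1 also works directly, much as survival::lung does in
point 2c.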

Terry T.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"


