Re: [R] Issue with dataset inclusion in CRAN packages

2011-06-26 Thread Frank Harrell
I was wrong about this.  The dataset is small.  Most of the space is taken up
by a nice tutorial on rpart.plot.  Still, I would favor linking to datasets
rather than duplicating part of them.
Thanks
Frank

Frank Harrell wrote:
 
 I was glad to see the new rpart.plot package by Stephen Milborrow.  I was
 however a bit concerned that Stephen distributed a dataset I created, and
 renamed the dataset (from titanic3 to ptitanic) in the process [with some
 justification, as some variables were omitted].  Fortunately Stephen
 included the script he used to download the dataset from our web site, and
 gave full credit to us.  What concerns me is that the rpart.plot package
 does not contain many functions but the package is as large as packages
 containing hundreds of functions.  This is due to the inclusion of the
 dataset.  I would prefer that authors provide the URL so that users can
 easily install the binary R data frame directly from our web site
 (we even have an automated way to do this: require(Hmisc);
 getHdata(titanic3)).  This will allow users to profit from possible future
 data corrections as well as making the package much more compact.  Thanks
 for listening.  I'm writing to r-help because this may apply to other R
 packages as well.
 
 Frank
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Issue-with-dataset-inclusion-in-CRAN-packages-tp3626536p3626568.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Issue with dataset inclusion in CRAN packages

2011-06-26 Thread csrabak

On 26/6/2011 17:43, Frank Harrell wrote:

I was glad to see the new rpart.plot package by Stephen Milborrow.  I was
however a bit concerned that Stephen distributed a dataset I created, and
renamed the dataset (from titanic3 to ptitanic) in the process [with some
justification, as some variables were omitted].  Fortunately Stephen
included the script he used to download the dataset from our web site, and
gave full credit to us.  What concerns me is that the rpart.plot package
does not contain many functions but the package is as large as packages
containing hundreds of functions.  This is due to the inclusion of the
dataset.  I would prefer that authors provide the URL so that users can
easily install the binary R data frame directly from our web site (we
even have an automated way to do this: require(Hmisc); getHdata(titanic3)).
This will allow users to profit from possible future data corrections as
well as making the package much more compact.  Thanks for listening.  I'm
writing to r-help because this may apply to other R packages as well.


Frank,

I can understand your concern and at first thought would even second it.

On the other hand, I think there are reasonable explanations why 
authors may prefer to include the datasets, especially if the data will 
be used in examples:


1) Documentation written around a dataset stays in sync with the data 
frame shipped with the package;


2) In several environments access to the web may be restricted, so 
calls such as getHdata or read.table(url) may not be allowed.
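
One way to reconcile the two positions might be to try the online copy 
first and fall back to the bundled one when the download fails. The 
following is only a minimal sketch: load_with_fallback is a hypothetical 
helper name, not part of Hmisc or rpart.plot, and the demonstration 
simulates a network failure with made-up rows instead of contacting any 
server.

```r
# Hypothetical helper: try an online source first and fall back to a
# local copy if the fetch raises an error.
load_with_fallback <- function(fetch, fallback) {
  tryCatch(
    fetch(),
    error = function(e) {
      message("Online fetch failed: ", conditionMessage(e),
              "; using the bundled copy instead.")
      fallback()
    }
  )
}

# Self-contained demonstration with a simulated network failure.
# The data frame below is illustrative only, not real Titanic data.
d <- load_with_fallback(
  fetch    = function() stop("no network access"),
  fallback = function() data.frame(survived = c(1, 0), age = c(29, 2))
)
```

In practice fetch could wrap require(Hmisc); getHdata(titanic3) and 
fallback could load the ptitanic data shipped with rpart.plot. That 
wiring is an untested assumption; in particular, it only works if a 
failed download is signalled as an R error rather than a warning.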


my 0.01...

Regards,

--
Cesar Rabak
