Re: [R-pkg-devel] How to store large data to be used in an R package?
On 25 March 2024 at 11:12, Jairo Hidalgo Migueles wrote:
| I'm reaching out to seek some guidance regarding the storage of relatively
| large data, ranging from 10-40 MB, intended for use within an R package.
| Specifically, this data consists of regression and random forest models
| crucial for making predictions within our R package.
|
| Initially, I attempted to save these models as internal data within the
| package. While this approach maintains functionality, it has led to a
| package size exceeding 20 MB. I'm concerned that this would complicate
| submitting the package to CRAN in the future.
|
| I would greatly appreciate any suggestions or insights you may have on
| alternative methods or best practices for efficiently storing and accessing
| this data within our R package.

Brooke and I wrote a paper on one way of addressing this: a 'data' package
made accessible via an Additional_repositories: entry and hosted in a drat
repo. See

   https://journal.r-project.org/archive/2017/RJ-2017-026/index.html

for the paper, which contains a nice slow walkthrough of all the details.

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
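As a rough illustration of the data-package approach, the main package would list the data package under Suggests: and the drat repo under Additional_repositories: in its DESCRIPTION, and then check for the data package at run time. The sketch below is not taken from the paper; the package name 'mypkgdata', the repository URL, and all helper function names are hypothetical placeholders.

```r
# Hypothetical names: 'mypkgdata' is the data-only package, and the URL is a
# placeholder for the drat repository listed under Additional_repositories.
data_pkg  <- "mypkgdata"
data_repo <- "https://example.github.io/drat"

# TRUE if the optional data package is installed and loadable.
has_data_pkg <- function(pkg = data_pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# The command a user would run to install the data package.
install_hint <- function(pkg = data_pkg, repo = data_repo) {
  sprintf("install.packages(\"%s\", repos = \"%s\")", pkg, repo)
}

# Fetch a model object from the data package, or stop with instructions.
get_model <- function(name, pkg = data_pkg) {
  if (!has_data_pkg(pkg)) {
    stop("Package '", pkg, "' is needed for this function. Install it with:\n  ",
         install_hint(pkg), call. = FALSE)
  }
  # Same lookup as pkg::name; this also finds lazy-loaded datasets.
  getExportedValue(pkg, name)
}
```

Functions that need a model would call something like get_model("rf_model"), while the rest of the package keeps working when the data package is absent.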
Re: [R-pkg-devel] How to store large data to be used in an R package?
On Mon, 25 Mar 2024 11:12:57 +0100, Jairo Hidalgo Migueles wrote:

> Specifically, this data consists of regression and random forest
> models crucial for making predictions within our R package.

Apologies for asking a silly question, but is there a chance that these
models are large by accident (e.g. because an object references a large
environment containing multiple copies of the training dataset)? Or are
there really more than a million weights required to make predictions?

> Initially, I attempted to save these models as internal data within
> the package. While this approach maintains functionality, it has led
> to a package size exceeding 20 MB. I'm concerned that this would
> complicate submitting the package to CRAN in the future.

The CRAN policy mentions the possibility of having a separate large
data-only package. Since CRAN strives to archive all package versions,
this data-only package will have to be updated as rarely as possible.
You will need to ask CRAN for approval.

If there is a significant amount of core functionality inside your
package that does *not* require the large data (so that it can still be
installed and used without the data), you can publish the data-only
package yourself (e.g. using the 'drat' package), put it in Suggests and
link to it in the Additional_repositories field of your DESCRIPTION.

Alternatively, you can publish the data on Zenodo and offer to download
it on first use. Make sure to (1) use tools::R_user_dir to determine
where to put the files, (2) only download the files after the user
explicitly agrees to it and (3) test as much of your package
functionality as possible without requiring the data to be downloaded.

-- 
Best regards,
Ivan
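The download-on-first-use option could look roughly like the sketch below. It is only an illustration under stated assumptions: the package name 'mypkg', the URL (a placeholder standing in for a real Zenodo file link), and all helper function names are hypothetical. It relies on tools::R_user_dir (R >= 4.0.0) and utils::askYesNo, and it never downloads in a non-interactive session.

```r
# Hypothetical package name and URL; the URL stands in for a Zenodo file link.
pkg_name  <- "mypkg"
model_url <- "https://example.org/models.rds"

# (1) Per-user data directory for this package (requires R >= 4.0.0).
model_dir <- function() tools::R_user_dir(pkg_name, which = "data")

model_path <- function(dir = model_dir()) file.path(dir, basename(model_url))

# Return the local path of the model file, downloading it only after consent.
fetch_models <- function(dir = model_dir(), ask = interactive()) {
  dest <- model_path(dir)
  if (file.exists(dest)) return(dest)
  if (!ask) {
    stop("Model file not found; call fetch_models() in an interactive session.",
         call. = FALSE)
  }
  # (2) Only download after the user explicitly agrees.
  ok <- utils::askYesNo(sprintf("Download model data to '%s'?", dir))
  if (!isTRUE(ok)) stop("Download declined.", call. = FALSE)
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  # Download to a temporary file first so a failed transfer leaves no
  # partial file in the user directory.
  tmp <- tempfile(fileext = ".rds")
  status <- utils::download.file(model_url, tmp, mode = "wb")
  if (status != 0 || !file.copy(tmp, dest)) {
    stop("Download failed.", call. = FALSE)
  }
  dest
}

# Read the models, fetching them first if necessary.
load_models <- function(...) readRDS(fetch_models(...))
```

Because the directory is an argument, tests can point fetch_models() at a temporary directory holding a small stand-in file, which helps with point (3): most functionality can be exercised without the real download.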
[R-pkg-devel] How to store large data to be used in an R package?
Dear all,

I'm reaching out to seek some guidance regarding the storage of relatively
large data, ranging from 10-40 MB, intended for use within an R package.
Specifically, this data consists of regression and random forest models
crucial for making predictions within our R package.

Initially, I attempted to save these models as internal data within the
package. While this approach maintains functionality, it has led to a
package size exceeding 20 MB. I'm concerned that this would complicate
submitting the package to CRAN in the future.

I would greatly appreciate any suggestions or insights you may have on
alternative methods or best practices for efficiently storing and accessing
this data within our R package.

Jairo