Re: [R-pkg-devel] Fast Matrix Serialization in R?

2024-05-08 Thread Simon Urbanek
Sameh,

if it's a matrix, that's easy: you can write the payload directly, which is
the fastest possible way without compression. A quick proof of concept:

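n <- 2^2    # toy size; a 20K x 20K matrix is n <- 20000^2
A <- matrix(runif(n), ncol = sqrt(n))

## write (dim + payload)
con <- file(description = "matrix_file", open = "wb")
system.time({
  writeBin(d <- dim(A), con)   # header: nrow and ncol as two integers
  dim(A) <- NULL               # writeBin() only writes plain vectors
  writeBin(A, con)             # payload: the doubles in column-major order
  dim(A) <- d                  # restore the matrix shape
})
close(con)

## read
con <- file(description = "matrix_file", open = "rb")
system.time({
  d <- readBin(con, 1L, 2)                # what = 1L: read two integers
  A1 <- readBin(con, 1, d[1] * d[2])      # what = 1: read nrow * ncol doubles
  dim(A1) <- d
})
close(con)
identical(A, A1)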

   user  system elapsed 
  0.931   2.713   3.644 
   user  system elapsed 
  0.089   1.360   1.451 
[1] TRUE

So it's really just limited by the speed of your disk; parallelization won't
help here.

Note that in general you get faster read times by using compression, as most
data is reasonably compressible, so that's where parallelization can be useful.
There are plenty of packages with more tricks, like memory-mapping the files,
but the above is just base R.
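To illustrate the compression point, here is a minimal sketch that keeps the same dim + payload layout but writes through a gzip connection (base R's gzfile()). The helper names write_matrix_gz() and read_matrix_gz() are hypothetical, and gzip is only one possible choice; whether it pays off depends on how compressible the data is (runif() output, as in the example above, compresses poorly).

```r
## Hypothetical helpers (not from this thread): the same dim + payload
## layout as above, written through a gzip-compressing connection.
write_matrix_gz <- function(A, path, level = 6L) {
  stopifnot(is.matrix(A), is.double(A))
  con <- gzfile(path, open = "wb", compression = level)
  on.exit(close(con))
  writeBin(dim(A), con)          # header: nrow, ncol as two integers
  writeBin(as.vector(A), con)    # payload: doubles, column-major, dim dropped
  invisible(path)
}

read_matrix_gz <- function(path) {
  con <- gzfile(path, open = "rb")
  on.exit(close(con))
  d <- readBin(con, integer(), n = 2L)        # the two-integer header
  x <- readBin(con, double(), n = prod(d))    # exactly nrow * ncol doubles
  dim(x) <- d
  x
}

A <- matrix(runif(16), ncol = 4)
path <- tempfile(fileext = ".bin.gz")
write_matrix_gz(A, path)
identical(A, read_matrix_gz(path))
```

Note that as.vector(A) likely makes a full copy of the payload (about 3.2 GB for a 20K x 20K double matrix), which the dim(A) <- NULL trick above should avoid at top level. gzip in base R is also single-threaded, so this sketch does not give the parallel speed-up mentioned above; that is where packages such as qs come in.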

Cheers,
Simon




__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] is Fortran write still strictly forbidden?

2024-05-08 Thread Berwin A Turlach
Hi Jisca,

On Wed, 8 May 2024 10:37:28 +0200
Jisca Huisman  wrote:

> I like to use write() in Fortran code [...] But from 'writing R
> extensions' it seems that there have been quite a few changes with
> respect to support for Fortran code, and it currently reads:
> 
> 6.5.1 Printing from Fortran
> 
> On many systems Fortran write and print statements can be used, but
> the output may not interleave well with that of C, and may be
> invisible on GUI interfaces. They are not portable and best avoided.

I am not aware of any recent changes regarding printing from Fortran,
or that it was ever strictly forbidden (perhaps it is for packages that
are submitted to CRAN?). In fact, R-exts.texi for R 1.0.0 states pretty
much the same as what you quoted from the current WRE manual:

  @subsection Printing from Fortran
  @cindex Printing from C

  In theory Fortran @code{write} and @code{print} statements can be
  used, but the output may not interleave well with that of C, and will
  be invisible on GUI interfaces.  They are best avoided.

  Three subroutines are provided to ease the output of information from
  Fortran code.

  @example
  subroutine dblepr(@var{label}, @var{nchar}, @var{data}, @var{ndata})
  subroutine realpr(@var{label}, @var{nchar}, @var{data}, @var{ndata})
  subroutine intpr (@var{label}, @var{nchar}, @var{data}, @var{ndata})
  @end example

Cheers,

Berwin



Re: [R-pkg-devel] [EXTERNAL] Re: Fast Matrix Serialization in R?

2024-05-08 Thread Sameh Abdulah
Thanks!


I want to save data from a matrix to a file and then retrieve it later while 
running R code, all within R.

As long as the compression doesn't result in data loss, it should be suitable 
for my needs.


Best,
--Sameh




-- 

This message and its contents, including attachments are intended solely 
for the original recipient. If you are not the intended recipient or have 
received this message in error, please notify me immediately and delete 
this message from your computer system. Any unauthorized use or 
distribution is prohibited. Please consider the environment before printing 
this email.



Re: [R-pkg-devel] Fast Matrix Serialization in R?

2024-05-08 Thread Dirk Eddelbuettel


On 9 May 2024 at 03:20, Sameh Abdulah wrote:
| I need to serialize and save a 20K x 20K matrix as a binary file.

Hm, that is an incomplete specification: _what_ do you want to do with it?
Read it back in R?  Share it with other languages (like Python)? I.e. what
really is your use case?  Also, you only seem to use readBin / writeBin. Why
not readRDS / saveRDS, which at least give you compression?

If it is to read/write from / to R, look into the qs package. It is good. The
README.md at its repo (https://github.com/traversc/qs) has benchmarks. If you
want to index into the stored data, look into fst. Else also look at databases.
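The readRDS / saveRDS suggestion can be tried quickly with a small base-R harness. In this sketch, roundtrip_rds() is a hypothetical helper and the matrix size is arbitrary: it writes the same matrix uncompressed and with gzip, bzip2 and xz, then records the file size, elapsed write/read time, and whether identical() confirms a lossless round trip. All of saveRDS()'s compression options are lossless; only speed and size differ.

```r
## Hypothetical benchmarking helper: round-trip a matrix through
## saveRDS()/readRDS() with one compression setting.
roundtrip_rds <- function(A, compress) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))
  t_write <- system.time(saveRDS(A, path, compress = compress))[["elapsed"]]
  t_read  <- system.time(B <- readRDS(path))[["elapsed"]]
  data.frame(compress = as.character(compress),
             bytes    = file.size(path),
             write_s  = t_write,
             read_s   = t_read,
             lossless = identical(A, B))   # TRUE if nothing was lost
}

A <- matrix(runif(1e5), ncol = 100)
do.call(rbind, lapply(list(FALSE, "gzip", "bzip2", "xz"), roundtrip_rds, A = A))
```

Results depend on the machine and on the data; random uniforms, as here, compress poorly, so real data may look quite different.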

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



[R-pkg-devel] Fast Matrix Serialization in R?

2024-05-08 Thread Sameh Abdulah
Hi,

I need to serialize and save a 20K x 20K matrix as a binary file. This process
is significantly slower in R than in Python (4x slower).

I'm not sure about the best approach to optimize the code below. Is it possible
to parallelize the serialization function to enhance performance?


  n <- 2^2   # toy size; a 20K x 20K matrix corresponds to n <- 20000^2
  cat("Generating matrices ... ")
  INI.TIME <- proc.time()
  A <- matrix(runif(n), ncol = sqrt(n))   # square sqrt(n) x sqrt(n) matrix
  END_GEN.TIME <- proc.time()
  arg_ser <- serialize(object = A, connection = NULL)   # returns a raw vector

  END_SER.TIME <- proc.time()
  con <- file(description = "matrix_file", open = "wb")
  writeBin(object = arg_ser, con = con)
  close(con)
  END_WRITE.TIME <- proc.time()
  con <- file(description = "matrix_file", open = "rb")
  par_raw <- readBin(con, what = raw(), n = file.info("matrix_file")$size)
  close(con)
  END_READ.TIME <- proc.time()
  B <- unserialize(connection = par_raw)
  END_DES.TIME <- proc.time()

  TIME <- END_GEN.TIME - INI.TIME
  cat("Generation time", TIME[3], "seconds.\n")

  TIME <- END_SER.TIME - END_GEN.TIME
  cat("Serialization time", TIME[3], "seconds.\n")

  TIME <- END_WRITE.TIME - END_SER.TIME
  cat("Writing time", TIME[3], "seconds.\n")

  TIME <- END_READ.TIME - END_WRITE.TIME
  cat("Read time", TIME[3], "seconds.\n")

  TIME <- END_DES.TIME - END_READ.TIME
  cat("Deserialize time", TIME[3], "seconds.\n")




Best,
--Sameh



Re: [R-pkg-devel] is Fortran write still strictly forbidden?

2024-05-08 Thread Erin Hodgess
Hi Jisca:

I have used write successfully. I’m not sure if this matters or not,
but I am using WSL 2 with Ubuntu 22.04 installed. It works fine with
R >= 4.0.

Hope this helps.

Sincerely,
Erin

Erin Hodgess, PhD
mailto: erinm.hodg...@gmail.com




[R-pkg-devel] flang doesn't support derived types

2024-05-08 Thread Othman El Hammouchi
Hello everyone, this is my first post on the mailing list (as well as the first
package I'm attempting to publish), so please forgive any obvious errors.

I'm using a lot of Fortran code which relies on derived types to manage data
structures. The package compiles fine on Windows, Linux and Mac and passes all
checks, including the CI pipeline on GitHub provided by the RStudio folks.
However, shortly after submission I received an automatic reply saying the
build had failed on CRAN's Debian servers. The log gives the following error:

flang/lib/Lower/CallInterface.cpp:949: not yet implemented: support for
polymorphic types

I tried searching this mailing list as well as the LLVM docs for a precedent or
explanation, without much success. The baffling thing is that my code doesn't
use polymorphism at all as far as I can tell; they're just derived types with
type-bound procedures (I don't think the Fortran standard considers this
polymorphic?). My question is: what can I do about it? This is a compiler
issue, but it doesn't seem from "Writing R Extensions" that CRAN allows you to
check the installed compiler and abort the install if an unsupported one is
detected. However, it's not really a package developer's fault that the
compiler doesn't support the standard.

Thanks in advance,

Othman El Hammouchi


[R-pkg-devel] is Fortran write still strictly forbidden?

2024-05-08 Thread Jisca Huisman
Hello,

I like to use write() in Fortran code to combine text with some integers
& doubles, to pass runtime information to R in a way that is prettier
and more legible than with intpr() & dblepr(). In the past, any calls to
write() were strictly forbidden in Fortran code, as apparently they messed
something up internally (I cannot recall the details). But from 'Writing
R Extensions' it seems that there have been quite a few changes with
respect to support for Fortran code, and it currently reads:


6.5.1 Printing from Fortran

On many systems Fortran write and print statements can be used, but the
output may not interleave well with that of C, and may be invisible
on GUI interfaces. They are not portable and best avoided.


To be more specific, would the subroutine below be allowed? Do I need
to declare R >= 4.0 (?) in the package DESCRIPTION (& then use labelpr()
instead of intpr()?) Is there an alternative without write() to get the
same result?


subroutine Rprint_pretty(iter, x)
  integer, intent(IN) :: iter
  double precision, intent(IN) :: x
  integer :: date_time_values(8), nchar, IntDummy(0)
  character(len=8) :: time_now
  character(len=200) :: msg_to_R

  ! hh:mm:ss from the current wall-clock time
  call date_and_time(VALUES=date_time_values)
  write(time_now, '(i2.2,":",i2.2,":",i2.2)') date_time_values(5:7)
  ! internal write into a character buffer; '&' continues the statement
  write(msg_to_R, '(a8, " i: ", i5, "  value: ", f8.2)') time_now, &
       iter, x

  ! print only the label via intpr(), with an empty (zero-length) data array
  nchar = len_trim(msg_to_R)
  call intpr(trim(msg_to_R), nchar, IntDummy, 0)

end subroutine Rprint_pretty


Thanks!




Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Vladimir Dergachev




On Wed, 8 May 2024, Josiah Parry wrote:

> Yes, prqlr is a great Rust-based package! My other Rust-based packages that
> are on CRAN are based, in part, on prqlr.

If there are many packages based on Rust that require common code, would
it make sense to make a single "rust" compatibility package that they can
depend on?


best

Vladimir Dergachev



Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Ben Bolker
   Zenodo and OSF (see e.g. 
) are both non-profit 
organizations that support archiving -- that is, they both make stronger 
guarantees of permanent availability than GitHub does. Possibly Software 
Heritage https://www.softwareheritage.org/features/ ?


  Zenodo has convenient GitHub integration.


--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
E-mail is sent at my convenience; I don't expect replies outside of working hours.




Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Josiah Parry
That's a good point! My apologies for not making that abundantly clear.

Regardless, I think it is a fair ask not to submit massive tarballs of
dependencies. Clarifying how we might be able to store the dependencies
*outside* of CRAN would be good to figure out. This would help packages
like polars exist on CRAN.

On Wed, May 8, 2024 at 4:06 PM Uwe Ligges wrote:

> On 08.05.2024 17:56, Josiah Parry wrote:
> > Thank you, Dirk. This was a direct email from a CRAN member and not part
> > of the automatic checks. The whole email is below. I think the intent of
> > the message is "please resubmit."
>
> Well, the CRAN maintainer did not spot that this is about Rust code. This
> was not indicated in your mail, hence you got a direct rejection.
>
> Best,
> Uwe Ligges


Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Uwe Ligges



On 08.05.2024 17:56, Josiah Parry wrote:
> Thank you, Dirk. This was a direct email from a CRAN member and not part of
> the automatic checks. The whole email is below. I think the intent of the
> message is "please resubmit."

Well, the CRAN maintainer did not spot that this is about Rust code. This
was not indicated in your mail, hence you got a direct rejection.

Best,
Uwe Ligges


Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Josiah Parry
Thank you, Ivan, for your really thoughtful feedback! I really appreciate
it!

   - I'll see if there are any base R packages that support SHA-2 or SHA-3.
   - I'll see if I can get the configure.ac to make the appropriate Rscript
     call for configure.win.
      - I think the idea of having a single `configure.ac` file to generate
        both configure and configure.win is nice. Guidance with GitHub
        Actions and ChatGPT is essentially a must for me since my bash is
        remedial at best.

Regarding the permanent storage requirement, I find it very strange.
I had never heard of Zenodo until just now! Does the CRAN team have
recommendations for what is considered "sufficiently reliable"? I have
repos that have persisted for almost 10 years. I think that is
sufficiently reliable!

The requirement to avoid GitHub feels surprisingly anachronistic given how
central it is to the vast majority of software development. The alternative
I can think of is to create a CDN on Cloudflare or something similar to store
the file independently.

Are there any avenues to have CRAN clarify their positions outside of
one-off processes? It would be quite unfortunate to go through all the work
of creating a way to build, store, and retrieve the dependencies only for
CRAN to decide they don't support it.


Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Ivan Krylov via R-package-devel
On Wed, 8 May 2024 14:08:36 -0400
Josiah Parry  wrote:

> With ChatGPT's ability to write autoconf, I *think *I have something
> that can work.

You don't have to write autoconf if your configure.ac is mostly a plain
shell script. You can write the configure script itself. Set the PATH
and then exec "${R_HOME}/bin/Rscript" tools/configure.R (in the
regular, non-multiarch configure for Unix-like systems) or exec
"${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" tools/configure.R (in
configure.win, which you'll also need). You've already written the rest
of the code in a language you know well: R.

Autoconf would be useful if you had system-specific dependencies with
the need to perform lots of compile tests. Those would have been a pain
to set up in R. Here you mostly need Sys.which() instead of
AC_CHECK_PROGS and command -v.
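As a minimal sketch of this point, a hypothetical find_program() helper for tools/configure.R can stand in for AC_CHECK_PROGS / command -v using base R's Sys.which(), which returns an empty string for programs that are not on the PATH. Checking for cargo is an assumption about what a Rust-based package would need.

```r
## Hypothetical helper: full path of the first program in `candidates`
## found on the PATH, or NA if none is found.
find_program <- function(candidates) {
  paths <- Sys.which(candidates)      # named character vector, "" if missing
  found <- paths[nzchar(paths)]
  if (length(found) > 0L) unname(found[[1L]]) else NA_character_
}

cargo <- find_program("cargo")
if (is.na(cargo)) {
  message("cargo was not found on the PATH; a Rust toolchain is required.")
} else {
  message("Using cargo at ", cargo)
}
```

In an actual configure step you would probably also exit with a non-zero status (e.g. quit(status = 1)) after the message, so the installation fails with a clear explanation.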

> The configure file runs tools/get-deps.R which will download the
> dependencies from the repo if available and verify the checksums.

One of the pain points is the need for a strong, cryptographically
secure hash. MD5 is, unfortunately, no longer such a hash. In a CMake
build, you would be able to use CMake's built-in strong hashes (such as
SHA-2 or SHA-3). The CRAN policy doesn't explicitly forbid MD5; it only
requires a "checksum". If you figure out a way to use a strong hash
from tools/configure.R for the downloaded tarball, please let us know.

> If the checksums don't match, an error is thrown, otherwise it can
> continue. I believe this meets the requirements of CRAN?

The other important CRAN requirement is to store the vendor tarball
somewhere as permanent as CRAN itself (see the caveats at the bottom of
https://cran.r-project.org/web/packages/using_rust.html), that is, not
GitHub. I think that Zenodo counts as a sufficiently reliable store.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Josiah Parry
Thank you, Neal!

I took some inspiration from the arrow-r GitHub repo and section 1.2 of the
manual:

> If your configure script needs auxiliary files, it is recommended that you
> ship them in a tools directory (as R itself does).

With ChatGPT's ability to write autoconf, I *think* I have something that
can work.

The configure file runs tools/get-deps.R which will download the
dependencies from the repo if available and verify the checksums.
If the checksums don't match, an error is thrown, otherwise it can
continue. I believe this meets the requirements of CRAN?
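A minimal sketch of what such a tools/get-deps.R step might look like, assuming a fixed-version tarball URL and a known checksum (both placeholders, not values from arcgisutils). fetch_vendor() is a hypothetical helper: it downloads with utils::download.file(), returns FALSE with an informative message instead of erroring when the download fails (in the spirit of the CRAN policy Neal quotes below), and compares the file's hash with the pinned value. The default hash is MD5 via tools::md5sum() only because it ships with base R; since MD5 is not cryptographically strong, the hash function is a parameter.

```r
## Hypothetical helper for tools/get-deps.R: download a pinned tarball,
## verify its checksum, and return TRUE/FALSE instead of throwing.
fetch_vendor <- function(url, destfile, expected,
                         hash_fun = function(p) unname(tools::md5sum(p))) {
  # Turn any download error or warning into a failure status.
  status <- tryCatch(
    utils::download.file(url, destfile, mode = "wb", quiet = TRUE),
    error = function(e) -1L,
    warning = function(w) -1L
  )
  if (!identical(as.integer(status), 0L) || !file.exists(destfile)) {
    message("Could not download the vendored sources from ", url)
    unlink(destfile)
    return(FALSE)
  }
  # Compare the downloaded file's hash with the expected (pinned) value.
  actual <- hash_fun(destfile)
  if (!identical(tolower(actual), tolower(expected))) {
    message("Checksum mismatch for ", destfile,
            ": expected ", expected, ", got ", actual)
    unlink(destfile)
    return(FALSE)
  }
  TRUE
}
```

What to do on FALSE (stop the install with a clear message, or fall back to something else) is left to the caller, e.g. tools/configure.R.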

Repo: https://github.com/R-ArcGIS/arcgisutils/tree/main/tools

On Wed, May 8, 2024 at 11:13 AM Neal Richardson 
wrote:

> CRAN policy [1] says: "If the sources are too large, it is acceptable
> to download them as part of installation, but do ensure that the
> download is of a fixed version rather than the latest." So you could
> try downloading the source in your configure script. Though be careful
> not to be bitten by this other line from the policy: "Packages which
> use Internet resources should fail gracefully with an informative
> message if the resource is not available or has changed (and not give
> a check warning nor error)."
>
> Neal
>
> [1]: https://cran.r-project.org/web/packages/policies.html
>
>
> On Wed, May 8, 2024 at 11:03 AM Josiah Parry 
> wrote:
> >
> > I am sorry for blowing up this thread lately.
> >
> > I've submitted a package to CRAN that uses Rust which thus requires
> > dependencies to be vendored. https://github.com/R-ArcGIS/arcgisutils/
> >
> > The vendored dependencies are 18mb when zipped and 16.4mb when zipped
> > with XZ -9e. The *installed package size is 1.2mb* on my Mac.
> >
> > CRAN has rejected this package with:
> >
> > *   Size of tarball: 18099770 bytes*
> >
> > *Please reduce to less than 5 MB for a CRAN package.*
> >
> >
> > Due to the requirement to vendor my dependencies, I do not see any
> > possible way to compress 250mb of source code to <= 5mb.
> >
> > I suspect there are alternatives which have been handled in one-off
> > situations given that other packages require fairly large system
> > dependencies e.g. Arrow, DuckDB, torch etc.
> >
> > How do others handle this?



Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Josiah Parry
Thank you, Dirk. This was a direct email from a CRAN member and not part of
the automatic checks. The whole email is below. I think the intent of the
message is "please resubmit."

Thanks, we see:

> Size of tarball: 18099770 bytes

Please reduce to less than 5 MB for a CRAN package.

Best,

Yes, prqlr is a great Rust-based package! My other Rust-based packages that
are on CRAN are based, in part, on prqlr.


On Wed, May 8, 2024 at 11:51 AM Dirk Eddelbuettel  wrote:

>
> On 8 May 2024 at 11:02, Josiah Parry wrote:
> | CRAN has rejected this package with:
> |
> | *   Size of tarball: 18099770 bytes*
> |
> | *Please reduce to less than 5 MB for a CRAN package.*
>
> Are you by chance confusing a NOTE (issued, but can be overruled) with a
> WARNING (more severe, likely a must-be-addressed) or ERROR?
>
> There are lots and lots of packages larger than 5mb -- see eg
>
>    https://cran.r-project.org/src/contrib/?C=S;O=D
>
> which has a top-5 of
>
>    rcdklibs      19mb
>    fastrmodels   15mb
>    prqlr         15mb
>    RFlocalfdr    14mb
>    acss.data     14mb
>
> and at least one of those is also Rust-using and hence a possible template.
>
> Dirk
>
> --
> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>



Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Dirk Eddelbuettel


On 8 May 2024 at 11:02, Josiah Parry wrote:
| CRAN has rejected this package with:
| 
| *   Size of tarball: 18099770 bytes*
| 
*Please reduce to less than 5 MB for a CRAN package.*

Are you by chance confusing a NOTE (issued, but can be overruled) with a
WARNING (more severe, likely a must-be-addressed) or ERROR?

There are lots and lots of packages larger than 5mb -- see eg

   https://cran.r-project.org/src/contrib/?C=S;O=D

which has a top-5 of

   rcdklibs      19mb
   fastrmodels   15mb
   prqlr         15mb
   RFlocalfdr    14mb
   acss.data     14mb

and at least one of those is also Rust-using and hence a possible template.

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [R-pkg-devel] package removed from CRAN

2024-05-08 Thread Ivan Krylov via R-package-devel
On Wed, 8 May 2024 17:30:46 +0200
"Jose V. Die Ramon" wrote:

> Could anyone please help me understand the reasons behind this, or
> suggest any steps I should take to resolve it?

Here's what I could find in
https://cran.r-project.org/src/contrib/PACKAGES.in:

>> X-CRAN-Comment: Archived on 2024-04-30 for policy violation.
>>  .
>>  On Internet access.  Also other errors.

So Avi is right, this is about the tests and/or examples failing
(possibly due to problems on the remote server).
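
The PACKAGES.in lookup can also be scripted in base R. This is a minimal sketch that assumes you have already downloaded PACKAGES.in to a local file; `cran_comment()` is a hypothetical helper (not part of any package) that reads only the `Package` and `X-CRAN-Comment` fields with `read.dcf()`.

```r
# Hypothetical helper: return the X-CRAN-Comment recorded for `pkg` in a
# PACKAGES.in-style DCF file, or NA if the package or the field is absent.
cran_comment <- function(pkg, dcf_file) {
  db <- read.dcf(dcf_file, fields = c("Package", "X-CRAN-Comment"))
  hit <- db[db[, "Package"] == pkg, "X-CRAN-Comment"]
  if (length(hit) == 0L || is.na(hit[1L])) NA_character_ else unname(hit[1L])
}

# Usage (needs network access):
# f <- tempfile()
# download.file("https://cran.r-project.org/src/contrib/PACKAGES.in", f)
# cran_comment("refseqR", f)
```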

If possible, try to emit errors with a special class set for
Internet-related errors. This will make it possible for your examples
and tests to catch them, as in:

tests/*.R:

tryCatch(
 ,
 refseqR_internet_error = function(e)
  message("Caught Internet-related error")
)
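
A minimal sketch of the signalling side, assuming R >= 3.6.0 for `errorCondition()`. The wrapper name `fetch_or_signal()` is hypothetical; the class name matches the `refseqR_internet_error` handler in the `tryCatch()` call. Because the condition also inherits from "error", code that does not handle the special class still sees an ordinary error. In a real package you would wrap only the network request itself, so unrelated bugs are not relabelled as Internet failures.

```r
# Hypothetical wrapper: run a network request and re-signal any failure as a
# condition of class "refseqR_internet_error" (which also inherits "error").
fetch_or_signal <- function(fetch) {
  tryCatch(
    fetch(),
    error = function(e) {
      stop(errorCondition(
        paste("Internet resource unavailable:", conditionMessage(e)),
        class = "refseqR_internet_error",
        call = NULL
      ))
    }
  )
}

# In tests/examples: catch only the Internet-related class and skip gracefully.
result <- tryCatch(
  fetch_or_signal(function() stop("cannot open URL")),
  refseqR_internet_error = function(e) {
    message("Caught Internet-related error")
    NULL
  }
)
```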

-- 
Best regards,
Ivan



Re: [R-pkg-devel] package removed from CRAN

2024-05-08 Thread Avraham Adler
According to the CRAN links, your package had an error on r-devel-windows-x86_64 and
r-patched-linux-x86_64 which was not addressed. Specifically, some
examples failed. See

for more specific details. Usually, fixing the problem and
incrementing the version is enough to resubmit it to CRAN.

Thanks,

Avi

On Wed, May 8, 2024 at 11:33 AM Jose V. Die Ramon  wrote:
>
> Hello,
>
> I just discovered that my package 'refseqR' was removed from the CRAN 
> repository on April 30th.
> https://cran.r-project.org/web/packages/refseqR/index.html
>
> This news is extremely upsetting, especially because I did not receive any 
> communication or warning regarding the issue. Could anyone please help me 
> understand the reasons behind this, or suggest any steps I should take to 
> resolve it?
>
> Thanks,
> Jose


[R-pkg-devel] package removed from CRAN

2024-05-08 Thread Jose V. Die Ramon
Hello, 

I just discovered that my package 'refseqR' was removed from the CRAN 
repository on April 30th. 
https://cran.r-project.org/web/packages/refseqR/index.html

This news is extremely upsetting, especially because I did not receive any 
communication or warning regarding the issue. Could anyone please help me 
understand the reasons behind this, or suggest any steps I should take to 
resolve it?

Thanks, 
Jose 


Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Neal Richardson
CRAN policy [1] says: "If the sources are too large, it is acceptable
to download them as part of installation, but do ensure that the
download is of a fixed version rather than the latest." So you could
try downloading the source in your configure script. Though be careful
not to be bitten by this other line from the policy: "Packages which
use Internet resources should fail gracefully with an informative
message if the resource is not available or has changed (and not give
a check warning nor error)."
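
One way to act on both policy lines, sketched in base R under several assumptions: the configure script runs an R script (for example via "${R_HOME}/bin/Rscript"), the vendored sources sit at a fixed, versioned URL, and their MD5 checksum is recorded in advance. `fetch_pinned()` is a hypothetical helper; it returns FALSE with an informative message instead of throwing, so the caller decides how to stop.

```r
# Hypothetical helper: download a pinned (fixed-version) resource and verify
# its MD5 checksum. Returns TRUE on success, or FALSE after an informative
# message if the resource is unavailable or has changed.
fetch_pinned <- function(url, dest, md5) {
  ok <- tryCatch(
    utils::download.file(url, dest, mode = "wb", quiet = TRUE) == 0,
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
  if (!ok || !file.exists(dest)) {
    message("Could not download ", url, ": the resource may be unavailable.")
    return(FALSE)
  }
  if (!identical(unname(tools::md5sum(dest)), md5)) {
    message("Checksum mismatch for ", url, ": the resource may have changed.")
    return(FALSE)
  }
  TRUE
}

# Usage (hypothetical URL, path and checksum):
# if (!fetch_pinned("https://example.org/vendor-0.3.0.tar.xz",
#                   "src/rust/vendor.tar.xz", "<expected md5>")) quit(status = 1)
```

Pinning both the URL and the checksum covers the "fixed version rather than the latest" requirement; the explicit FALSE path is where the informative message comes from.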

Neal

[1]: https://cran.r-project.org/web/packages/policies.html


On Wed, May 8, 2024 at 11:03 AM Josiah Parry  wrote:
>
> I am sorry for blowing up this thread lately.
>
> I've submitted a package to CRAN that uses Rust which thus requires
> dependencies to be vendored. https://github.com/R-ArcGIS/arcgisutils/
>
> The vendored dependencies are 18mb when zipped and 16.4mb when zipped with
> XZ -9e. The *installed package size is 1.2mb* on my Mac.
>
> CRAN has rejected this package with:
>
> *   Size of tarball: 18099770 bytes*
>
> *Please reduce to less than 5 MB for a CRAN package.*
>
>
> Due to the requirement to vendor my dependencies, I do not see any possible
> way to compress 250mb of source code to <= 5mb.
>
> I suspect there are alternatives which have been handled in one-off
> situations given that other packages require fairly large system
> dependencies e.g. Arrow, DuckDB, torch etc.
>
> How do others handle this?
>


[R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Josiah Parry
I am sorry for blowing up this thread lately.

I've submitted a package to CRAN that uses Rust which thus requires
dependencies to be vendored. https://github.com/R-ArcGIS/arcgisutils/

The vendored dependencies are 18mb when zipped and 16.4mb when zipped with
XZ -9e. The *installed package size is 1.2mb* on my Mac.

CRAN has rejected this package with:

*   Size of tarball: 18099770 bytes*

*Please reduce to less than 5 MB for a CRAN package.*


Due to the requirement to vendor my dependencies, I do not see any possible
way to compress 250mb of source code to <= 5mb.

I suspect there are alternatives which have been handled in one-off
situations given that other packages require fairly large system
dependencies e.g. Arrow, DuckDB, torch etc.

How do others handle this?



Re: [R-pkg-devel] Cannot repro failing CRAN autochecks

2024-05-08 Thread Josiah Parry
Thanks to @yutannihilation for pointing out that the issue repros on
r-universe:
https://github.com/r-universe/r-arcgis/actions/runs/8990426306/job/24695887245

Do folks know if there are any templates for the Linux CRAN check? It
appears that the r-lib/actions Linux checks don't cover all of the same
bases. I tried cribbing the r-universe one, but it is a bit like unwinding a
cat's cradle.

Waiting an hour for each r-universe check is not really an effective
strategy! 

On Tue, May 7, 2024 at 2:58 PM Ivan Krylov  wrote:

> On Tue, 7 May 2024 21:40:31 +0300
> Ivan Krylov via R-package-devel wrote:
>
> > It's too late for Makevars to exclude files from the source package
> > tarball. Use .Rbuildignore instead:
>
> Sorry, that was mostly misguided. .Rbuildignore won't help with the
> contents of the Rust vendor tarball.
>
> 1. Can you omit the .cff file from src/rust/vendor.tar.xz when building
> it?
>
> 2. I think that there is --exclude in both GNU tar and BSD tar. How
> about tar --exclude="*.cff" -x -f rust/vendor.tar.xz ?
>
> 3. From
> <https://win-builder.r-project.org/incoming_pretest/arcgisutils_0.3.0_20240507_194020/Debian/00install.out>,
> it can be seen that the "clean" target does not get called. Can you
> remove the *.cff file in the same Make target?
>
> --
> Best regards,
> Ivan
>
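
If the package already runs an R script during its build (an assumption; with Makevars alone, the `tar --exclude` route in point 2 avoids R entirely), the cleanup in point 3 could also be done from R. `remove_cff_files()` is a hypothetical helper that deletes every `*.cff` file under an extracted vendor directory; mirroring point 3, it is assumed to run after the Rust build step has finished.

```r
# Hypothetical helper: delete every *.cff file below `vendor_dir` and return
# (invisibly) the paths that were actually removed.
remove_cff_files <- function(vendor_dir) {
  cff <- list.files(vendor_dir, pattern = "\\.cff$", recursive = TRUE,
                    full.names = TRUE, all.files = TRUE)
  removed <- file.remove(cff)
  invisible(cff[removed])
}

# Usage (assumed path): remove_cff_files("src/rust/vendor")
```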
