Re: [Bioc-devel] Package size issue

2024-03-29 Thread Kevin R. Coombes
If it is the .git pack that is the problem, you can keep it out of the
tarball by creating a file in the main package directory (where
DESCRIPTION lives) called ".Rbuildignore". Put a line in that file that
reads

  \.git/(.*)

That will keep anything in the .git folder from being included.
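To check which paths such a line would catch, the sketch below applies the pattern roughly the way Writing R Extensions describes `.Rbuildignore` entries being used: as Perl-like regular expressions matched case-insensitively against paths relative to the package root. The file paths are made-up examples, not taken from the actual package.

```r
# Hypothetical paths, relative to the package root.
files <- c(".git/objects/pack/pack-1234.pack",
           ".git/HEAD",
           "R/ribostan.R",
           "DESCRIPTION")

# The .Rbuildignore line \.git/(.*), written as an R string (backslash doubled).
pattern <- "\\.git/(.*)"

# Assumption: R CMD build matches each pattern with perl = TRUE and
# ignore.case = TRUE, which grepl() reproduces here.
excluded <- grepl(pattern, files, perl = TRUE, ignore.case = TRUE)
print(data.frame(file = files, excluded = excluded))
```

Because the pattern is not anchored, it would also match a path containing `.git/` deeper in the tree; an anchored variant such as `^\.git/` is stricter if that matters.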

On 3/28/2024 7:16 PM, gabriel.villa...@mdc-berlin.de wrote:

Hi!

This is my first message to the mailing list, so apologies if I am breaking any
guidelines.

I am submitting a package called Ribostan and the build report gives an error 
that the package tarball exceeds the size requirement. I am also being warned 
that individual package files exceed the 5MB size limit.

The total size of a fresh clone of my package is 5.9MB, but 3.9MB of that is 
from the single packfile in .git/objects/pack. As I understand, the packfile 
contains the history of all files that have ever been in the repository, 
including files that have been deleted.

I am wondering if this packfile is what is inflating my package size in the 
build and whether there is a solution to reduce its size. In case it is 
relevant: at some point a 1.38MB bam file was committed to the repo as toy data 
for running examples, but has since been deleted. I am also wondering why an 
individual file is reported to be exceeding 5MB when I don’t see any such file 
in my repo.
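To locate a file over the limit, including hidden files that a normal directory listing skips, a small R helper along these lines may help. `find_large_files()` is a hypothetical name, and treating "5MB" as 5 * 1024^2 bytes is an assumption; the Bioconductor checks may count differently.

```r
# List files under a directory that exceed a size limit, including hidden
# ones such as .git/objects/pack/*.pack. Sizes are reported in MiB.
find_large_files <- function(pkg_dir = ".", limit_mb = 5) {
  files <- list.files(pkg_dir, recursive = TRUE, all.files = TRUE,
                      full.names = TRUE, no.. = TRUE)
  sizes <- file.size(files)
  big <- !is.na(sizes) & sizes > limit_mb * 1024^2
  out <- data.frame(file = files[big], size_mb = round(sizes[big] / 1024^2, 2))
  # Largest files first.
  out[order(out$size_mb, decreasing = TRUE), , drop = FALSE]
}
```

Running it on the package directory, and again on a directory where the built tarball has been extracted with `untar()`, would show whether a large file lives in the working tree, in `.git`, or only in the build output.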

Any advice would be very welcome!

Best regards,
Gabriel
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




Re: [Rd] Question regarding .make_numeric_version with non-character input

2024-03-29 Thread Dirk Eddelbuettel


On 29 March 2024 at 17:56, Andrea Gilardi via R-devel wrote:
| Dear all,
| 
| I have a question regarding the R-devel version of the .make_numeric_version()
| function. As far as I can understand, the current code
| (https://github.com/wch/r-source/blob/66b91578dfc85140968f07dd4e72d8cb8a54f4c6/src/library/base/R/version.R#L50-L56)
| runs the following steps in case of non-character input:
| 
| 1. It creates a message named msg using gettextf.
| 2. This object is then passed to stop(msg) or warning(msg) according to the
| following condition:
| 
| tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false")
| 
| However, I don't understand the previous code, since the output of
| Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false" is
| just a boolean value, and tolower() will just return "true" or "false". Maybe
| the intended code is
| 
| tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_")) != "false"
| 
| Or am I missing something?

Yes, agreed -- good catch.  In full, the code is (removing leading
whitespace, and putting it back onto single lines)

  msg <- gettextf("invalid non-character version specification 'x' (type: %s)", typeof(x))
  if(tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false"))
  stop(msg, domain = NA)
  else
  warning(msg, domain = NA, immediate. = TRUE)

where msg is constant (but reflecting language settings via standard i18n)
and, as you note, the parentheses appear wrong.  What was intended is likely

  msg <- gettextf("invalid non-character version specification 'x' (type: %s)", typeof(x))
  if(tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_")) != "false")
  stop(msg, domain = NA)
  else
  warning(msg, domain = NA, immediate. = TRUE)
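In practice the two conditions only differ when the variable is set with different capitalisation. As written, tolower() receives a logical, and if() then coerces the resulting "true"/"false" string back to logical, so the test is effectively a case-sensitive != "false": a setting of FALSE still selects stop(). The sketch below compares both conditions for a few values without touching the environment; `stop_on_invalid()` is a made-up helper, and as.logical() stands in for the coercion that if() performs.

```r
# `value` stands in for
# Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_").
# TRUE means stop() would be called, FALSE means warning().
stop_on_invalid <- function(value) {
  # As written: tolower() of a logical gives "true"/"false", which if()
  # coerces back to logical (mimicked here with as.logical()).
  as_written <- as.logical(tolower(value != "false"))
  # As intended: lower-case the value first, then compare.
  intended <- tolower(value) != "false"
  c(as_written = as_written, intended = intended)
}

values <- c("", "false", "FALSE", "true")
res <- t(vapply(values, stop_on_invalid, logical(2)))
print(res)
```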

If you have used Bugzilla before and have a handle, maybe file a bug report
with this as a patch at https://bugs.r-project.org/

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Question regarding .make_numeric_version with non-character input

2024-03-29 Thread Andrea Gilardi via R-devel
Dear all,

I have a question regarding the R-devel version of the .make_numeric_version()
function. As far as I can understand, the current code
(https://github.com/wch/r-source/blob/66b91578dfc85140968f07dd4e72d8cb8a54f4c6/src/library/base/R/version.R#L50-L56)
runs the following steps in case of non-character input:

1. It creates a message named msg using gettextf.
2. This object is then passed to stop(msg) or warning(msg) according to the
following condition:

tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false")

However, I don't understand the previous code, since the output of
Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_") != "false" is
just a boolean value, and tolower() will just return "true" or "false". Maybe
the intended code is

tolower(Sys.getenv("_R_CHECK_STOP_ON_INVALID_NUMERIC_VERSION_INPUTS_")) != "false"

Or am I missing something?

Thank you very much for your help
Kind regards

Andrea





Re: [Rd] declare and validate options

2024-03-29 Thread Duncan Murdoch

On 29/03/2024 11:59 a.m., Antoine Fabri wrote:

>> I think there are too many packages that would need changes under this
>> scheme.
>
> There would be zero if the registration of options is not required for
> packages first uploaded on CRAN before the feature is implemented.
> If an option is not registered, no validation is triggered and nothing
> breaks, even if we opt in to the behavior.


Sorry, I missed that.  Then the objection is that this would require 
CRAN to apply two different sets of rules on submissions. When a 
resubmission arrived, they'd need to look in the archive to find out 
which set of rules applied to it.  They do a bit of that now 
(determining if a submission is a resubmission, for example), but this 
would be a bigger change.  I don't think date of first submission is 
ever currently used.



>> If those functions could be made simple enough and bulletproof and were
>> widely adopted, maybe they'd be copied into one of the base packages,
>
> Sure, but realistically few maintainers will opt in to more restrictions.


If this is something that you want CRAN to force on package authors, 
then you need to give some hard evidence that it will fix things that 
cause trouble.  But if you only apply the rule to new packages, not 
updates to old ones, it's hard to believe that it will really make much 
difference, though it will still be extra work for CRAN and R Core.


> If Posit did something along those lines, maybe it would have a chance, but
> otherwise I don't see an optional feature like this spreading very far.
> Or this package would need to make working with options really, really
> much easier for maintainers themselves as developers, not just beneficial
> for users in the long run.


That should be a goal regardless of who does it.

Think about the development of the pipe operator:  it was in magrittr 
(and I think another package, but I forget the name) first, was widely 
adopted, then a simpler version was brought into base R.


Duncan Murdoch





Re: [Rd] declare and validate options

2024-03-29 Thread Antoine Fabri
>
> I think there are too many packages that would need changes under this
> scheme.


There would be zero if the registration of options is not required for
packages first uploaded on CRAN before the feature is implemented.
If an option is not registered, no validation is triggered and nothing
breaks, even if we opt in to the behavior.


> If those functions could be made simple enough and bulletproof and were
> widely adopted, maybe they'd be copied into one of the base packages,
>

Sure, but realistically few maintainers will opt in to more restrictions.
If Posit did something along those lines, maybe it would have a chance, but
otherwise I don't see an optional feature like this spreading very far.
Or this package would need to make working with options really, really
much easier for maintainers themselves as developers, not just beneficial
for users in the long run.



Re: [Rd] declare and validate options

2024-03-29 Thread Duncan Murdoch


I think there are too many packages that would need changes under this 
scheme.


A more easily achievable improvement would be to provide functions to 
support registration, validation and documentation, and leave it up to 
the package author to call those.  This wouldn't give you validation at 
the time a user set an option, but could make it easier to validate when 
the package retrieved the value:  specify rules in one place, then 
retrieve from multiple places, without needing to duplicate the rules.
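The alternative described here (rules declared in one place and checked whenever the package reads an option) might look roughly like the sketch below. `register_option()`, `get_valid_option()` and the `.option_rules` environment are hypothetical names, not an existing API, and validators are assumed to be predicates returning TRUE or FALSE.

```r
# Hypothetical helpers: each rule is declared once and checked whenever
# the package reads the option (validation on retrieval, not on setting).
.option_rules <- new.env(parent = emptyenv())

# `validator` is assumed to be a predicate returning TRUE for valid values.
register_option <- function(name, validator = function(x) TRUE, default = NULL) {
  assign(name, list(validator = validator, default = default),
         envir = .option_rules)
  invisible(name)
}

# Read an option, falling back to the registered default, then validate it.
get_valid_option <- function(name) {
  rule <- get0(name, envir = .option_rules, inherits = FALSE)
  if (is.null(rule)) {
    stop(sprintf("option '%s' has not been registered", name), call. = FALSE)
  }
  value <- getOption(name, default = rule$default)
  if (!isTRUE(rule$validator(value))) {
    stop(sprintf("invalid value for option '%s'", name), call. = FALSE)
  }
  value
}

# Usage: declare the rule once, then call get_valid_option() wherever needed.
register_option("mypkg.my_option2", validator = is.numeric, default = 1)
get_valid_option("mypkg.my_option2")        # returns the default, 1
old <- options(mypkg.my_option2 = "a")
try(get_valid_option("mypkg.my_option2"))   # error: invalid value
options(old)                                # restore the previous setting
```

As noted above, the user can still set a bad value; it is only caught the next time the package reads it.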


If those functions could be made simple enough and bulletproof and were 
widely adopted, maybe they'd be copied into one of the base packages, 
but really the only need for that would be to support validation on 
setting, rather than validation on retrieval.


Duncan Murdoch



[Rd] declare and validate options

2024-03-29 Thread Antoine Fabri
Dear r-devel,

options() are basically global variables, and they come with several issues:
* they're not truly owned by a package, aside from loose naming
conventions
* they're not validated
* their documentation is not standard, and they're often not documented at
all, so it's hard to know what options exist
* in practice they're sometimes used for internal purposes, which is at
odds with their global nature and contributes to the mess; I think they can
almost always be replaced by objects in a `globals` environment in the
namespace, it's just a bit more work
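The last point, keeping internal state in an environment inside the namespace rather than in options(), might look like this in a package's R/ code. The `globals` name follows the message; `verbose`, `set_verbose()` and `is_verbose()` are made-up examples.

```r
# Package-internal state kept in an environment instead of options().
# `globals` follows the message; the field and helper names are illustrative.
globals <- new.env(parent = emptyenv())
globals$verbose <- FALSE

# Setter with its own validation; returns the previous value invisibly.
set_verbose <- function(value) {
  stopifnot(is.logical(value), length(value) == 1L, !is.na(value))
  old <- globals$verbose
  globals$verbose <- value
  invisible(old)
}

# Getter used by the rest of the package.
is_verbose <- function() globals$verbose
```

Once a namespace is loaded its bindings are locked, so `globals` itself cannot be reassigned, but the contents of the environment can still be modified, which is what makes the pattern work. Unlike an option, the value is not exposed through options().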

I tried to do as much as possible with static analysis using my package opt,
but it can only go so far: https://github.com/moodymudskipper/opt

I think we can do a bit better, and that it's not necessarily so complex.
Here's a draft of a possible design:

We could have something like this in a package to register options along
with an optional validator, triggered on `options(..)` (or a new function).

# similar to registerS3method():
registerOption("mypkg.my_option1")
registerOption("mypkg.my_option2", function(x) stopifnot(is.numeric(x)))
# maybe a `default` arg too to avoid the .onLoad() gymnastics and invisible
# NULL options
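The ".onLoad() gymnastics" mentioned in the comment usually refer to an idiom like the one below, which sets package defaults only for options the user has not already defined. The option names and default values are placeholders.

```r
# Common idiom for package option defaults; names and values are placeholders.
.onLoad <- function(libname, pkgname) {
  defaults <- list(mypkg.my_option1 = TRUE, mypkg.my_option2 = 1)
  # Only set options the user has not already defined.
  to_set <- !(names(defaults) %in% names(options()))
  if (any(to_set)) options(defaults[to_set])
  invisible()
}
```

Until the namespace is loaded, getOption() returns NULL for these options, which is the "invisible NULL options" problem a `default` argument to registerOption() would address.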

* validation is a breaking change, so we'd have an environment variable to
opt in
* validation occurs when an option is set AND the namespace is already
loaded (so we can still set options without loading a namespace), OR it
occurs later when an applicable namespace is loaded
* if we register an option that has already been registered by another
package, we get a message and the validator of the last loaded namespace is
used; in practice, due to naming conventions, this doesn't really happen,
and CRAN could also enforce naming conventions for new packages
* New packages must use registerOption() if they define options, and there
must be a standard documentation page for those, separately or together
(with aliases), accessible with `?mypkg.my_option1` etc.
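The draft could be prototyped in plain R to get a feel for it. In the sketch below, registerOption() is the hypothetical function from the message (it does not exist in base R), while `set_options()`, `validate_registered()` and the `.option_registry` environment are made-up names. Because base options() offers no hook for validators, validation on set happens in a wrapper rather than in options() itself, and the opt-in environment variable is not modelled.

```r
# Hypothetical prototype of the proposed registry; not a base R API.
.option_registry <- new.env(parent = emptyenv())

registerOption <- function(name, validator = NULL, default = NULL) {
  if (exists(name, envir = .option_registry, inherits = FALSE)) {
    message(sprintf("option '%s' is already registered; its validator is replaced", name))
  }
  assign(name, validator, envir = .option_registry)
  # The suggested `default` argument: only set a value if none exists yet.
  if (!is.null(default) && is.null(getOption(name))) {
    opt <- list(default)
    names(opt) <- name
    options(opt)
  }
  invisible(name)
}

# Validation on set: run any registered validator before calling options().
# Assumes options are passed as name = value pairs.
set_options <- function(...) {
  new <- list(...)
  for (name in names(new)) {
    validator <- get0(name, envir = .option_registry, inherits = FALSE)
    if (is.function(validator)) validator(new[[name]])  # errors if invalid
  }
  options(...)
}

# Deferred validation, e.g. called from .onLoad(), for options that were
# set before the namespace was loaded.
validate_registered <- function() {
  for (name in ls(.option_registry, all.names = TRUE)) {
    validator <- get(name, envir = .option_registry)
    value <- getOption(name)
    if (is.function(validator) && !is.null(value)) validator(value)
  }
  invisible(TRUE)
}

registerOption("mypkg.my_option1")
registerOption("mypkg.my_option2", function(x) stopifnot(is.numeric(x)), default = 1)
set_options(mypkg.my_option2 = 3)          # accepted
try(set_options(mypkg.my_option2 = "a"))   # rejected; the value stays 3
```

Because the validator runs before options() is called, a rejected value is never stored.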

This could certainly be done in different ways and I'd love to hear about
other ideas or obstacles to improvements in this area.

Thanks,

Antoine



Re: [Bioc-devel] Package size issue

2024-03-29 Thread Lluís Revilla
Hi Gabriel,

You can find how to use BFG, along with other related recommendations, in
the Bioconductor package developer book:
https://contributions.bioconductor.org/git-version-control.html?q=BFG#removing-large-files-from-history-with-bfg-repo-cleaner

I hope this helps with this and other questions,

Lluís

On Fri, 29 Mar 2024 at 01:05, Ali Sajid Imami wrote:

> Hi,
> I am not part of the core Bioconductor team, but I have been in this
> situation before. I would suggest you use the BFG Repo-Cleaner (
> https://rtyley.github.io/bfg-repo-cleaner/) to remove the errant bam file
> from your repo's history. Because the bam file was added and later removed,
> it is very likely still part of the history. Removing it should help
> you cut down on the size.
>
>
> Regards,
> Dr. Ali Sajid Imami
>
>


[Bioc-devel] Final Reminder: BioC2024 Abstract & Sticker Contest Deadline April 1st

2024-03-29 Thread Maria . Doyle
Hi everyone,

This is the last call to submit your abstracts for the BioC2024 conference and 
entries for the sticker design contest. The submission window closes this 
Monday, April 1st.

BioC2024 Abstract Submission
Ensure your research is showcased at BioC2024. Submit your abstract 
here.

BioC2024 Sticker Design Contest
Don't miss out on having your sticker design featured and winning free
registration for BioC2024. Enter the contest
here.

We're excited to see your contributions and thank everyone who has submitted so
far!

Best regards,
Maria

Maria Doyle, PhD
Bioconductor Community Manager

School of Medicine,
University of Limerick, Limerick, V94 T9PX
Ireland
Email: maria.do...@ul.ie
Phone (office): +353 61 234 768
[I work flexible hours across several time zones. I don't expect you to read or 
respond to my emails outside of your normal working hours]


