Re: [Rd] R-devel Digest, Vol 133, Issue 23

2014-03-26 Thread Hervé Pagès

Hi,

On 03/26/2014 10:24 AM, Radford Neal wrote:

From: Richard Cotton 

The rep function is very versatile, but that versatility comes at a
cost: it takes a bit of effort to learn (and remember) its syntax.
This is a problem, since rep is one of the first functions many
beginners will come across.  Of the three main uses of rep, two have
simpler alternatives.

rep(x, times = ) has rep.int
rep(x, length.out  = ) has rep_len

I think that a rep_each function would be a worthy addition for the
third use case

rep(x, each = )

(It might also be worth having rep_times as a synonym for rep.int.)


I think this is exactly the wrong approach.  Indeed, the aim should be
to get rid of functions like rep.int (or at least discourage their
use, even if they have to be kept for compatibility).

Why is rep_each(x,n) better than rep(x,each=n)?


According to the NEWS file, it seems that R core felt that having
rep_len() was a good idea.

  There is a new function rep_len() analogous to rep.int() for when
  speed is required (and names are not).

Now one might wonder (and your students might wonder too) why having
rep_each() "for when speed is required (and names are not)" is not a
good idea.

By having rep_len(), rep_each(), and rep_times(), the 3 extra arguments
in rep(x, ...) would be covered. Plus, when I use tab completion after
typing rep_, I would get a nice summary and would be able to quickly
choose. Right now, when I do this, one function is missing, and one has
a misleading name. So I'd rather have no specialized function at all,
or have the 3. Would be cleaner and less confusing than the current
situation.
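
For concreteness, here is a minimal sketch of the spellings under discussion. rep(), rep.int() and rep_len() are in base R; rep_each() and rep_times() are hypothetical wrappers written only to illustrate the proposal and are not part of any R release. Note that this naive rep_each() goes through rep() and so keeps names, i.e. it is not a "speed" version.

```r
# The three existing spellings discussed in this thread
x <- c(a = 1, b = 2)

r_times <- rep(x, times = 2)      # names are kept: a b a b
r_int   <- rep.int(x, 2)          # same values, attributes (names) dropped
r_len   <- rep_len(x, 3)          # recycled to length 3, attributes dropped
r_each  <- rep(x, each = 2)       # each element repeated: a a b b

# Hypothetical wrappers, NOT part of base R: the names proposed in the thread
rep_each  <- function(x, each)  rep(x, each = each)
rep_times <- function(x, times) rep.int(x, times)

print(r_times)
print(rep_each(1:3, 2))
```

With these definitions, tab completion on rep_ would list rep_each, rep_len and rep_times side by side, which is the point being made above.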

Cheers,
H.



There is no saving in
typing (which would be trivial anyway).  There *ought* to be no
significant difference in speed (though that seems to have been the
motive for rep.int).  Are you trying to let students learn R without
ever learning about specifying arguments by name?

And where would you stop?  How about seq_by(a,b,s) rather than having
to arduously type seq(a,b,by=s)?  Maybe we should have glm_binomial,
glm_poisson, etc. so we don't have to remember the "family" argument?
This way lies madness...

Radford Neal

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



Re: [Rd] R-devel Digest, Vol 133, Issue 23

2014-03-26 Thread Prof Brian Ripley

On 26/03/2014 22:00, peter dalgaard wrote:


On 26 Mar 2014, at 18:24 , Radford Neal  wrote:


From: Richard Cotton 

The rep function is very versatile, but that versatility comes at a
cost: it takes a bit of effort to learn (and remember) its syntax.
This is a problem, since rep is one of the first functions many
beginners will come across.  Of the three main uses of rep, two have
simpler alternatives.

rep(x, times = ) has rep.int
rep(x, length.out  = ) has rep_len

I think that a rep_each function would be a worthy addition for the
third use case

rep(x, each = )

(It might also be worth having rep_times as a synonym for rep.int.)


I think this is exactly the wrong approach.  Indeed, the aim should be
to get rid of functions like rep.int (or at least discourage their
use, even if they have to be kept for compatibility).

Why is rep_each(x,n) better than rep(x,each=n)?  There is no saving in
typing (which would be trivial anyway).  There *ought* to be no
significant difference in speed (though that seems to have been the
motive for rep.int).  Are you trying to let students learn R without
ever learning about specifying arguments by name?

And where would you stop?  How about seq_by(a,b,s) rather than having
to arduously type seq(a,b,by=s)?  Maybe we should have glm_binomial,
glm_poisson, etc. so we don't have to remember the "family" argument?
This way lies madness...


Spot on.

Well, maybe a slight disagreement: In a weakly typed language like R, you will 
always have performance losses due to type testing and dispatching, and no 
compiler/interpreter is intelligent enough to predict the types so that this 
can be avoided. Some amount of hinting is needed for reliable speedups, either 
by having special functions for simple cases (allowed to make assumptions on 
their inputs), or some sort of #pragma-like construction.

Actually, rep.int seems to be a poor example of this since the speedup is 
pretty negligible unless you do huge amounts of short replicates. I expect that 
the S-PLUS compatibility was the main reason to have it. Case in point:


As the help says:

 Function ‘rep.int’ is a simple case handled by internal code, and
 provided as a separate function partly for S compatibility and
 partly for speed (especially when names can be dropped).

E.g.

> a <- letters[1:10]; names(a) <- a
> system.time(for(i in 1:100) rep.int(a,10))
   user  system elapsed
  1.568   0.001   1.574
> system.time(for(i in 1:100) rep(a,10))
   user  system elapsed
  2.804   0.002   2.816

There are also rare occasions where it is useful to use rep.int to 
circumvent method dispatch.


Note that rep() was an interpreted function when that comment was first 
written, and the gap was much larger then.  (Nor was it byte-compiled, 
nor generic.)  For the version of rep in R 0.65.1:


> system.time(for(i in 1:100) rep("a",10))
   user  system elapsed
  1.612   0.000   1.616

vs the current

> system.time(for(i in 1:100) rep("a",10))
   user  system elapsed
  0.518   0.000   0.519
> system.time(for(i in 1:100) rep.int("a",10))
   user  system elapsed
  0.471   0.000   0.473


> system.time(for(i in 1:1000) rep("a",10))
   user  system elapsed
 16.721   0.125  19.037
> system.time(for(i in 1:1000) rep.int("a",10))
   user  system elapsed
 14.356   0.050  14.611
> system.time(for(i in 1:100) rep("a",1000))
   user  system elapsed
 11.655   2.157  14.263
> system.time(for(i in 1:100) rep.int("a",1000))
   user  system elapsed
 10.957   1.708  12.917


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [Rd] R-devel Digest, Vol 133, Issue 23

2014-03-26 Thread peter dalgaard

On 26 Mar 2014, at 18:24 , Radford Neal  wrote:

>> From: Richard Cotton 
>> 
>> The rep function is very versatile, but that versatility comes at a
>> cost: it takes a bit of effort to learn (and remember) its syntax.
>> This is a problem, since rep is one of the first functions many
>> beginners will come across.  Of the three main uses of rep, two have
>> simpler alternatives.
>> 
>> rep(x, times = ) has rep.int
>> rep(x, length.out  = ) has rep_len
>> 
>> I think that a rep_each function would be a worthy addition for the
>> third use case
>> 
>> rep(x, each = )
>> 
>> (It might also be worth having rep_times as a synonym for rep.int.)
> 
> I think this is exactly the wrong approach.  Indeed, the aim should be
> to get rid of functions like rep.int (or at least discourage their
> use, even if they have to be kept for compatibility).
> 
> Why is rep_each(x,n) better than rep(x,each=n)?  There is no saving in
> typing (which would be trivial anyway).  There *ought* to be no
> significant difference in speed (though that seems to have been the
> motive for rep.int).  Are you trying to let students learn R without
> ever learning about specifying arguments by name?
> 
> And where would you stop?  How about seq_by(a,b,s) rather than having
> to arduously type seq(a,b,by=s)?  Maybe we should have glm_binomial,
> glm_poisson, etc. so we don't have to remember the "family" argument?
> This way lies madness...

Spot on. 

Well, maybe a slight disagreement: In a weakly typed language like R, you will 
always have performance losses due to type testing and dispatching, and no 
compiler/interpreter is intelligent enough to predict the types so that this 
can be avoided. Some amount of hinting is needed for reliable speedups, either 
by having special functions for simple cases (allowed to make assumptions on 
their inputs), or some sort of #pragma-like construction.

Actually, rep.int seems to be a poor example of this since the speedup is 
pretty negligible unless you do huge amounts of short replicates. I expect that 
the S-PLUS compatibility was the main reason to have it. Case in point:

> system.time(for(i in 1:1000) rep("a",10))
   user  system elapsed 
 16.721   0.125  19.037 
> system.time(for(i in 1:1000) rep.int("a",10))
   user  system elapsed 
 14.356   0.050  14.611 
> system.time(for(i in 1:100) rep("a",1000))
   user  system elapsed 
 11.655   2.157  14.263 
> system.time(for(i in 1:100) rep.int("a",1000))
   user  system elapsed 
 10.957   1.708  12.917 

For more spectacular speedups compare seq(1,10) to seq_len(10) or even just to 
1:10. Then again, the slowdown in seq() is so large that it is hard to believe 
it to be completely unavoidable.
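
A rough way to try this comparison yourself is sketched below. The repetition count is an arbitrary choice, and the absolute timings depend on the machine and the R version, so no numbers are claimed here.

```r
n <- 10

# All three produce the integers 1..10
s1 <- seq(1, n)
s2 <- seq_len(n)
s3 <- 1:n

# Rough timing loop; 'reps' is an arbitrary choice for illustration
reps <- 1e5
t_seq     <- system.time(for (i in seq_len(reps)) seq(1, n))
t_seq_len <- system.time(for (i in seq_len(reps)) seq_len(n))
t_colon   <- system.time(for (i in seq_len(reps)) 1:n)

# Elapsed seconds for each variant
print(c(seq     = t_seq[["elapsed"]],
        seq_len = t_seq_len[["elapsed"]],
        colon   = t_colon[["elapsed"]]))
```

One practical difference besides speed: seq_len(0) returns integer(0), whereas 1:0 counts down and returns c(1, 0), so seq_len() is the safer choice in loops over possibly empty input.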
  

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] internal string comparison (Scollate)

2014-03-26 Thread Romain François

Le 26 mars 2014 à 18:03, Gabriel Becker  a écrit :

> On Wed, Mar 26, 2014 at 9:50 AM, Dirk Eddelbuettel  wrote:
> 
> On 26 March 2014 at 17:22, Romain François wrote:
> | I’d like to compare two strings internally the way R would, so I need 
> Scollate which is not part of the authorized R api.
> |
> | So:
> |  - Can Scollate (and perhaps Seql) be promoted to api ?
> |  - If not what are the alternatives ? Using strcmp or strcoll does not seem 
> as general as Scollate.
> 
> I'd add a third option:
> 
>- Put this in a new package and register the functions you want.
> 
> That would not achieve what Romain wants. Or rather, it would when he did it, 
> but would not be guaranteed to do so at any point after the next release of R.

That’s one part of the problem. Indeed I’d rather use something rather than 
copy and paste it and run the risk of being outdated. The answer to that is 
testing though. I can develop a test suite that can let me know I’m out of date 
and I need to copy and paste some new code, etc … Done that before, this is 
tedious, but so what. 

The other part of the problem (the real part of the problem actually) is that, 
at least when R is built with ICU support, Scollate will depend on the 
collator pointer in util.c
https://github.com/wch/r-source/blob/trunk/src/main/util.c#L1777

And this can be controlled by the base::icuSetCollate function. Of course the 
collator pointer is not public.

So in short that does not help. 

> My understanding of his request is that he wants something that will "behave 
> as R will", not "behave as R does now".
> 
> All that having been said, not every single thing R does internally can be 
> public and of course I'm not privy or a party to discussions on what should 
> and shouldn't be. But freezing and duplicating small pieces of R's 
> non-exported internal code seems like a dangerous move to me. 
> 
> ~G
> 
> -- 
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis




Re: [Rd] [Bioc-devel] Conflicting definitions for function redefined as S4 generics

2014-03-26 Thread Hervé Pagès

Hi,

I agree. I can't think of an easy way to avoid this kind of clash
between BioC and non-BioC S4 generics, other than by having things
like sort() already defined as an S4 generic in base R.

Note that just having setMethod("sort", ...) in your package, Ulrich,
and not putting a setGeneric() statement, is usually the right thing
to do. Because then there will be no clash as long as everybody else
does the same thing. That's because if several packages with a
setMethod("sort", ...) statement are loaded, an implicit S4 generic
is created when the 1st package is loaded, and then "sort" methods
are attached to it when subsequent packages are loaded.
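
A minimal sketch of that pattern, run at top level rather than inside a package: the class "ToySet" is a hypothetical stand-in for a package's own class, and no setGeneric() call appears anywhere.

```r
library(methods)

# Hypothetical toy class standing in for a package's own class
setClass("ToySet", representation(values = "numeric"))

# No setGeneric() call: setMethod() creates the implicit S4 generic for
# sort() from base::sort if one does not exist yet
setMethod("sort", "ToySet", function(x, decreasing = FALSE, ...) {
  x@values <- sort(x@values, decreasing = decreasing, ...)
  x
})

obj <- new("ToySet", values = c(3, 1, 2))
sorted <- sort(obj)

print(sorted@values)
print(isGeneric("sort"))
```

Ordinary vectors still go through the default method, i.e. base::sort(), so sort(c(2, 1)) behaves as before.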

Unfortunately, the implicit S4 generic one gets when doing this
doesn't always have an optimum signature. For example, for sort()
we get dispatch on 'x' *and* 'decreasing':

  > sort
  standardGeneric for "sort" defined from package "base"

  function (x, decreasing = FALSE, ...)
  standardGeneric("sort")
  
  Methods may be defined for arguments: x, decreasing
  Use  showMethods("sort")  for currently available ones.

This is why in BiocGenerics we have:

  setGeneric("sort", signature="x")

The downside of this is that now if you load BiocGenerics after your
package, a new S4 generic is created for sort(), which overrides the
implicit S4 generic that was created when your package was loaded.
Of course we wouldn't need to do this in BiocGenerics if the implicit
S4 generic for sort() had the correct signature, or if this setGeneric()
statement we have in BiocGenerics was somewhere in base R.

Another reason for explicitly promoting some base R functions into
S4 generics in BiocGenerics is to have a man page for the generic.
That gives us a place to document some aspects of the S4 generic that
are not covered by the base man page. That's why BiocGenerics has
things like:

  setGeneric("nrow")

  setGeneric("relist")

The signature of each of these generics is the same as the signature of
the implicit generic! But these explicit generics can be exported
and documented.

Back to the original issue: In the particular case of sort() though,
since base::sort() is an S3 generic, one possible workaround for you
is to define an S3 method for your objects.
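
A minimal sketch of this S3 workaround, assuming a hypothetical S4 class "ToyVec" in place of the real one. In a package the method would also need to be registered with S3method(sort, ToyVec) in the NAMESPACE file; at top level the definition alone is enough.

```r
library(methods)

# Hypothetical S4 class standing in for the package's own class
setClass("ToyVec", representation(values = "numeric"))

# Plain S3 method for the S3 generic base::sort(); S3 dispatch also works
# on instances of S4 classes
sort.ToyVec <- function(x, decreasing = FALSE, ...) {
  x@values <- sort(x@values, decreasing = decreasing, ...)
  x
}

v <- new("ToyVec", values = c(3, 1, 2))
print(sort(v)@values)
print(sort(v, decreasing = TRUE)@values)
```

Because an S4 generic for sort() that has no method for the class falls back to base::sort(), which calls UseMethod("sort"), the S3 method should be reached whichever S4 generic for sort() happens to be in front.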

Cheers,
H.


On 03/26/2014 06:44 AM, Michael Lawrence wrote:

That might be worth thinking about generally, but it would still be nice to
have the base generics pre-defined, so that people are not copy and pasting
the definitions everywhere, hoping that they stay consistent.


On Wed, Mar 26, 2014 at 6:13 AM, Gabriel Becker wrote:


Perhaps a patch to R such that generics don't clobber each-other's method
tables if the signatures agree? I haven't dug deeply, but simply merging
the method tables seems like it would be safe when there are no conflicts.

That way this type of multiplicity would not be a problem, though it
wouldn't help (as it shouldn't) if the two generics didn't agree on
signature or both carried methods for the same class signature.

~G


On Wed, Mar 26, 2014 at 4:38 AM, Michael Lawrence <lawrence.mich...@gene.com> wrote:


The BiocGenerics package was designed to solve this issue within
Bioconductor. It wouldn't be the worst thing in the world to depend on the
simple BiocGenerics package for now, but ideally the base generics would be
defined higher up, perhaps in the methods package itself. Maybe someone
else has a more creative solution, but any sort of conditional/dynamic
approach would probably be too problematic in comparison.

Michael



On Wed, Mar 26, 2014 at 4:26 AM, Ulrich Bodenhofer <bodenho...@bioinf.jku.at> wrote:



[cross-posted to R-devel and bioc-devel]

Hi,

I am trying to implement a 'sort' method in one of the CRAN packages I am
maintaining ('apcluster'). I started with using setMethod("sort", ...) in
my package, which worked fine. Since many users of my package are from the
bioinformatics field, I want to ensure that my package works smoothly with
Bioconductor. The problem is that the BiocGenerics package also redefines
'sort' as an S4 generic. If I load BiocGenerics before my package,
everything is fine. If I load BiocGenerics after I have loaded my package,
my setMethod("sort", ...) is overridden by BiocGenerics and does not work
anymore. A simple solution would be to import BiocGenerics in my package,
but I do not want this, since many of this package's users are outside the
bioinformatics domain. Moreover, I am reluctant to include a dependency on
a Bioconductor package in a CRAN package. I thought that maybe I could
protect my setMethod("sort", ...) from being overridden by BiocGenerics by
sealed=TRUE, but that did not work either. Any ideas are gratefully
appreciated!

Thanks a lot,
Ulrich



*Dr. Ulrich Bodenhofer*
Associate Professor
Institute of Bioinformatics

*Johannes Kepler University*
Altenberger Str. 69
4040 Linz, Austria

Tel. +43 732 2468 4526
Fax +43 732 2468 4539
bodenho.

Re: [Rd] NOTE when detecting mismatch in output, and codes for NOTEs, WARNINGs and ERRORs

2014-03-26 Thread Paul Gilbert



On 03/26/2014 04:58 AM, Kirill Müller wrote:

Dear list


It is possible to store expected output for tests and examples. From the
manual: "If tests has a subdirectory Examples containing a file
pkg-Ex.Rout.save, this is compared to the output file for running the
examples when the latter are checked." And, earlier (written in the
context of test output, but apparently applies here as well): "...,
these two are compared, with differences being reported but not causing
an error."

I think a NOTE would be appropriate here, in order to be able to detect
this by only looking at the summary. Is there a reason for not flagging
differences here?


The problem is that differences occur too often because this is a 
comparison of characters in the output files (a diff). Any output that 
is affected by locale, node name or Internet downloads, time, host, or 
OS, is likely to cause a difference. Also, if you print results to a 
high precision you will get differences on different systems, depending 
on OS, 32 vs 64 bit, numerical libraries, etc.  A better test strategy 
when it is numerical results that you want to compare is to do a 
numerical comparison and throw an error if the result is not good, 
something like


  r <- result from your function
  rGood <- known good value
  fuzz <- 1e-12  #tolerance

  if (fuzz < max(abs(r - rGood))) stop('Test xxx failed.')

It is more work to set up, but the maintenance will be less, especially 
when you consider that your tests need to run on different OSes on CRAN.


You can also use try() and catch error codes if you want to check those.
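
A self-contained sketch of both ideas follows. The function my_mean() and the test labels are hypothetical and exist only to make the example runnable.

```r
# Hypothetical function under test
my_mean <- function(x) sum(x) / length(x)

# Numerical comparison with a tolerance instead of a character diff
r     <- my_mean(c(0.1, 0.2, 0.3))
rGood <- 0.2      # known good value
fuzz  <- 1e-12    # tolerance

if (fuzz < max(abs(r - rGood))) stop("Test 'my_mean value' failed.")

# Checking that an error is raised, using try(): sum() of a character
# vector fails, so the result should be a "try-error" object
res <- try(my_mean("a"), silent = TRUE)
if (!inherits(res, "try-error")) stop("Test 'my_mean error' failed.")

cat("all tests passed\n")
```

A file like this placed in the package's tests directory makes R CMD check fail only when a stop() is reached, independent of how the numbers happen to print on a given platform. isTRUE(all.equal(r, rGood, tolerance = fuzz)) is a base R alternative for the comparison, with the caveat that all.equal() uses a relative difference by default.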

Paul



The following is slightly related: Some compilers and static code
analysis tools assign a numeric code to each type of error or warning
they check for, and print it. Would that be possible to do for the
anomalies detected by R CMD check? The most significant digit could
denote the "severity" of the NOTE, WARNING or ERROR. This would further
simplify (semi-)automated analysis of the output of R CMD check, e.g. in
the context of automated tests.


Best regards

Kirill



Re: [Rd] R-devel Digest, Vol 133, Issue 23

2014-03-26 Thread Radford Neal
> From: Richard Cotton 
> 
> The rep function is very versatile, but that versatility comes at a
> cost: it takes a bit of effort to learn (and remember) its syntax.
> This is a problem, since rep is one of the first functions many
> beginners will come across.  Of the three main uses of rep, two have
> simpler alternatives.
> 
> rep(x, times = ) has rep.int
> rep(x, length.out  = ) has rep_len
> 
> I think that a rep_each function would be a worthy addition for the
> third use case
> 
> rep(x, each = )
> 
> (It might also be worth having rep_times as a synonym for rep.int.)

I think this is exactly the wrong approach.  Indeed, the aim should be
to get rid of functions like rep.int (or at least discourage their
use, even if they have to be kept for compatibility).

Why is rep_each(x,n) better than rep(x,each=n)?  There is no saving in
typing (which would be trivial anyway).  There *ought* to be no
significant difference in speed (though that seems to have been the
motive for rep.int).  Are you trying to let students learn R without
ever learning about specifying arguments by name?

And where would you stop?  How about seq_by(a,b,s) rather than having
to arduously type seq(a,b,by=s)?  Maybe we should have glm_binomial,
glm_poisson, etc. so we don't have to remember the "family" argument?
This way lies madness...

   Radford Neal



Re: [Rd] internal string comparison (Scollate)

2014-03-26 Thread Gabriel Becker
On Wed, Mar 26, 2014 at 9:50 AM, Dirk Eddelbuettel  wrote:

>
> On 26 March 2014 at 17:22, Romain François wrote:
> | I'd like to compare two strings internally the way R would, so I need
> Scollate which is not part of the authorized R api.
> |
> | So:
> |  - Can Scollate (and perhaps Seql) be promoted to api ?
> |  - If not what are the alternatives ? Using strcmp or strcoll does not
> seem as general as Scollate.
>
> I'd add a third option:
>
>- Put this in a new package and register the functions you want.
>

That would not achieve what Romain wants. Or rather, it would when he did
it, but would not be guaranteed to do so at any point after the next
release of R.

My understanding of his request is that he wants something that will
"behave as R will", not "behave as R does now".

All that having been said, not every single thing R does internally can be
public and of course I'm not privy or a party to discussions on what should
and shouldn't be. But freezing and duplicating small pieces of R's
non-exported internal code seems like a dangerous move to me.

~G




-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis



Re: [Rd] internal string comparison (Scollate)

2014-03-26 Thread Dirk Eddelbuettel

On 26 March 2014 at 17:22, Romain François wrote:
| I’d like to compare two strings internally the way R would, so I need 
Scollate which is not part of the authorized R api. 
| 
| So: 
|  - Can Scollate (and perhaps Seql) be promoted to api ?
|  - If not what are the alternatives ? Using strcmp or strcoll does not seem as 
general as Scollate. 

I'd add a third option:

   - Put this in a new package and register the functions you want.

as I don't see an R Core change on the public vs non-public APIs anytime soon.

But we have CRAN, and we can cheaply and reliably import functions from other
packages.  Yes, it is code duplication, but it offers a layer of indirection
that permits possible changes to such an API.

I have been meaning to put such a package up for serialization code. It will
contain a few lines of C code from base R. Junji and Ei-ji have already
placed this in their Rhpc package. I am using them (in yet another copy) in
my unfinished Redis package in order to get to serialization from C++.

Dirk

-- 
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com



Re: [Rd] Conflicting definitions for function redefined as S4 generics

2014-03-26 Thread John Chambers
I haven't looked at this in detail, but my guess is the following is the 
distinction:


A simple call setGeneric("sort") makes a generic of the existing 
function _with the existing package_:


> setGeneric("sort")
[1] "sort"
> sort
standardGeneric for "sort" defined from package "base"

function (x, decreasing = FALSE, ...)
standardGeneric("sort")

Methods may be defined for arguments: x, decreasing
Use  showMethods("sort")  for currently available ones.

The same thing will, I believe, happen automatically if one calls 
setMethod() without a prior call to setGeneric().


What BiocGenerics does is different:  it excludes the two trailing 
arguments and so creates a new generic in its own namespace.


Similarly (from the global environment in this case):

> setGeneric("sort", signature="x")
Creating a new generic function for 'sort' in the global environment
[1] "sort"
> sort
standardGeneric for "sort" defined from package ".GlobalEnv"

function (x, decreasing = FALSE, ...)
standardGeneric("sort")

Methods may be defined for arguments: x
Use  showMethods("sort")  for currently available ones.


When packages are loaded, the methods in the new package are installed 
in the generic function (in memory) that corresponds to the information 
in the methods as to generic name and package slot.


As Duncan points out, it's essential to keep functions of the same name 
but different packages distinct.  Like all R objects, generic functions 
are referred to by the combination of a name and an environment, here a 
package namespace.


Just how this sorts out into the symptoms reported I can't say, but I 
suspect this is the underlying issue.


John





On 3/26/14, 7:11 AM, Ulrich Bodenhofer wrote:

First of all, thanks for the very interesting and encouraging replies
that have been posted so far!

Let me quickly add what I have tried up to now:

- setMethod("sort", signature("ExClust"), function(x, decreasing=FALSE,
%...%) %...% , sealed=TRUE) without any call to setGeneric(), i.e.
assuming that setMethod() would implicitly create an S4 generic out of
the S3 method sort(). Note that '%...%' in the code snippet stands for
some details that I left out.

- setGeneric("sort", def=function(x, decreasing=FALSE, ...)
standardGeneric("sort")), i.e. consistency with the S3 generic of sort()
in 'base', plus the call to setMethod() as shown above.

- setGeneric("sort", signature="x"), i.e. consistency with the generic's
definition in BiocGenerics, as suggested by Martin Morgan, plus the call
to setMethod() as shown above.

For all three trials, the result was exactly the same: (1) everything
works nicely if I load BiocGenerics before apcluster; (2) if I load
BiocGenerics after apcluster, apcluster's sort() function is broken and
gives the following error:

Error in rank(x, ties.method = "min", na.last = "keep") :
   unimplemented type 'list' in 'greater'
In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'

Obviously, sort() is dispatched to the definition made by BiocGenerics:

 > showMethods("sort", includeDefs=TRUE)
Function: sort (package BiocGenerics)
x="ANY"
function (x, decreasing = FALSE, ...)
{
 if (!is.logical(decreasing) || length(decreasing) != 1L)
 stop("'decreasing' must be a length-1 logical vector.\nDid you
intend to set 'partial'?")
 UseMethod("sort")
}

So the method registered for class 'ExClust' is  lost if BiocGenerics is
attached. Just for your information: all these tests have been done with
R 3.0.2 and Bioconductor 2.13 (BiocGenerics version 0.8.0).

Thanks and best regards,
Ulrich



On 03/26/2014 02:48 PM, Duncan Murdoch wrote:

On 26/03/2014, 9:13 AM, Gabriel Becker wrote:

Perhaps a patch to R such that generics don't clobber each-other's
method
tables if the signatures agree? I haven't dug deeply, but simply merging
the method tables seems like it would be safe when there are no
conflicts.

That way this type of multiplicity would not be a problem, though it
wouldn't help (as it shouldn't) if the two generics didn't agree on
signature or both carried methods for the same class signature.


I don't think R should base the decision on the signature.

There are two very different situations where this might come up. In
one, package A and package B might both define a generic named foo()
that happens to have the same signature, but with nothing in common.
That should be allowed, and should behave the same as when they both
create functions with the same name:  it should be up to the user to
specify which generic is being called.  If R merged the two generics
into one, there would be chaos.

The other situation is more likely to apply to this case.  It sounds
as though both apcluster and BiocGenerics are creating a sort()
generic by promoting the base package S3 generic into an S4 generic.
Clearly they should not be creating separate generics, there's just one.

I don't know if there's something wrong with the way apcluster or
BiocGenerics are

[Rd] internal string comparison (Scollate)

2014-03-26 Thread Romain François
Hello, 

I’d like to compare two strings internally the way R would, so I need Scollate 
which is not part of the authorized R api. 

So: 
 - Can Scollate (and perhaps Seql) be promoted to api ?
 - If not what are the alternatives ? Using strcmp or strcoll does not seem as 
general as Scollate. 

Romain

PS: Here is some context: https://github.com/hadley/dplyr/issues/325


Re: [Rd] The case for freezing CRAN

2014-03-26 Thread Geoff Jentry

On Thu, 20 Mar 2014, Dirk Eddelbuettel wrote:

o Roger correctly notes that R scripts and packages are just one issue.
  Compilers, libraries and the OS matter.  To me, the natural approach these
  days would be to think of something based on Docker or Vagrant or (if you
  must, VirtualBox).  The newer alternatives make snapshotting very cheap
  (eg by using Linux LXC).  That approach reproduces a full environment as
  best as we can while still ignoring the hardware layer (and some readers
  may recall the infamous Pentium bug of two decades ago).


At one of my previous jobs we did effectively this (albeit in a lower tech 
fashion). Every project had its own environment, complete with the exact 
snapshot of R & packages used, etc. All scripts/code was kept in that 
environment in a versioned fashion such that at any point one could go to 
any stage of development of that paper/project's analysis and reproduce it 
exactly.


It was hugely inefficient in terms of storage, but it solved the problem 
we're discussing here. As you note, with the tools available today it'd be 
trivial to distribute that environment for people to reproduce results.




Re: [Rd] Conflicting definitions for function redefined as S4 generics

2014-03-26 Thread Ulrich Bodenhofer
First of all, thanks for the very interesting and encouraging replies 
that have been posted so far!


Let me quickly add what I have tried up to now:

- setMethod("sort", signature("ExClust"), function(x, decreasing=FALSE, 
%...%) %...% , sealed=TRUE) without any call to setGeneric(), i.e. 
assuming that setMethod() would implicitly create an S4 generic out of 
the S3 method sort(). Note that '%...%' in the code snippet stands for 
some details that I left out.


- setGeneric("sort", def=function(x, decreasing=FALSE, ...) 
standardGeneric("sort")), i.e. consistency with the S3 generic of sort() 
in 'base', plus the call to setMethod() as shown above.


- setGeneric("sort", signature="x"), i.e. consistency with the generic's 
definition in BiocGenerics, as suggested by Martin Morgan, plus the call 
to setMethod() as shown above.


For all three trials, the result was exactly the same: (1) everything 
works nicely if I load BiocGenerics before apcluster; (2) if I load 
BiocGenerics after apcluster, apcluster's sort() function is broken and 
gives the following error:


Error in rank(x, ties.method = "min", na.last = "keep") :
  unimplemented type 'list' in 'greater'
In addition: Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'

Obviously, sort() is dispatched to the definition made by BiocGenerics:

> showMethods("sort", includeDefs=TRUE)
Function: sort (package BiocGenerics)
x="ANY"
function (x, decreasing = FALSE, ...)
{
    if (!is.logical(decreasing) || length(decreasing) != 1L)
        stop("'decreasing' must be a length-1 logical vector.\nDid you intend to set 'partial'?")
    UseMethod("sort")
}

So the method registered for class 'ExClust' is lost if BiocGenerics is
attached. Just for your information: all these tests have been done with
R 3.0.2 and Bioconductor 2.13 (BiocGenerics version 0.8.0).
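
For readers who want to try the registration pattern in isolation, here
is a minimal sketch of the third variant. It assumes a hypothetical toy
class 'ExClustDemo' with a single numeric slot 'sizes' (neither is part
of apcluster), and it does not reproduce the load-order problem itself,
since that needs BiocGenerics to be attached afterwards.

```r
library(methods)

# Hypothetical stand-in for the real 'ExClust' class (assumption).
setClass("ExClustDemo", representation(sizes = "numeric"))

# Promote base::sort() to an S4 generic that dispatches on 'x' only.
setGeneric("sort", signature = "x")

# Register a method for the toy class; it reorders the 'sizes' slot.
setMethod("sort", signature("ExClustDemo"),
          function(x, decreasing = FALSE, ...) {
            x@sizes <- x@sizes[order(x@sizes, decreasing = decreasing)]
            x
          })

obj <- new("ExClustDemo", sizes = c(3, 1, 2))
sorted <- sort(obj)

# Plain vectors still reach the default method, i.e. base::sort().
plain <- sort(c(3, 1, 2))
```

In a fresh session, sort(obj) dispatches to the method above, and
showMethods("sort") lists it next to the default for "ANY".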


Thanks and best regards,
Ulrich




Re: [Rd] Conflicting definitions for function redefined as S4 generics

2014-03-26 Thread Duncan Murdoch

On 26/03/2014, 9:13 AM, Gabriel Becker wrote:

> Perhaps a patch to R such that generics don't clobber each other's method
> tables if the signatures agree? I haven't dug deeply, but simply merging
> the method tables seems like it would be safe when there are no conflicts.
>
> That way this type of multiplicity would not be a problem, though it
> wouldn't help (as it shouldn't) if the two generics didn't agree on
> signature or both carried methods for the same class signature.


I don't think R should base the decision on the signature.

There are two very different situations where this might come up.  In 
one, package A and package B might both define a generic named foo() 
that happens to have the same signature, but with nothing in common. 
That should be allowed, and should behave the same as when they both 
create functions with the same name:  it should be up to the user to 
specify which generic is being called.  If R merged the two generics 
into one, there would be chaos.
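
A minimal sketch of what "up to the user to specify" means for ordinary
functions, using only base R. The local sort() below is a hypothetical
masking definition standing in for a second package's function; it is
an illustration, not what either package does.

```r
demo <- local({
  # Hypothetical user-level definition that masks base::sort().
  sort <- function(x, ...) "user-defined sort() called"

  list(
    unqualified = sort(c(3, 1, 2)),       # finds the masking definition
    qualified   = base::sort(c(3, 1, 2))  # explicitly asks for base's sort()
  )
})
```

The unqualified call picks up whichever definition is found first, while
the pkg::name form states the choice explicitly.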


The other situation is more likely to apply to this case.  It sounds as
though both apcluster and BiocGenerics are creating a sort() generic by
promoting the base package S3 generic into an S4 generic.  Clearly they
should not be creating separate generics; there's just one.


I don't know if there's something wrong with the way apcluster or 
BiocGenerics are doing things, or something wrong with the way the 
methods package is creating the generic, but it sure looks like a bug 
somewhere.


Duncan Murdoch





Re: [Rd] Conflicting definitions for function redefined as S4 generics

2014-03-26 Thread Michael Lawrence
That might be worth thinking about generally, but it would still be nice to
have the base generics pre-defined, so that people are not copying and
pasting the definitions everywhere, hoping that they stay consistent.


On Wed, Mar 26, 2014 at 6:13 AM, Gabriel Becker wrote:

> Perhaps a patch to R such that generics don't clobber each other's method
> tables if the signatures agree? I haven't dug deeply, but simply merging
> the method tables seems like it would be safe when there are no conflicts.
>
> That way this type of multiplicity would not be a problem, though it
> wouldn't help (as it shouldn't) if the two generics didn't agree on
> signature or both carried methods for the same class signature.
>
> ~G


Re: [Rd] Conflicting definitions for function redefined as S4 generics

2014-03-26 Thread Gabriel Becker
Perhaps a patch to R such that generics don't clobber each other's method
tables if the signatures agree? I haven't dug deeply, but simply merging
the method tables seems like it would be safe when there are no conflicts.

That way this type of multiplicity would not be a problem, though it
wouldn't help (as it shouldn't) if the two generics didn't agree on
signature or both carried methods for the same class signature.
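
As a rough illustration of the idea only: the sketch below models a
method table as a plain named list keyed by class name, which is an
assumption made for the example and not how the methods package stores
methods. The function merge_method_tables() and the class names ClassA
and ClassB are hypothetical.

```r
# Merge two toy method tables; refuse if both define a method for a class.
merge_method_tables <- function(a, b) {
  clash <- intersect(names(a), names(b))
  if (length(clash) > 0L) {
    stop("both generics define methods for: ",
         paste(clash, collapse = ", "))
  }
  c(a, b)
}

tbl_pkgA <- list(ClassA = function(x, ...) "pkgA method for ClassA")
tbl_pkgB <- list(ClassB = function(x, ...) "pkgB method for ClassB")

# No overlap: the merged table carries both methods.
merged <- merge_method_tables(tbl_pkgA, tbl_pkgB)

# Overlap: merging is refused and the error message is captured.
conflict <- tryCatch(merge_method_tables(tbl_pkgA, tbl_pkgA),
                     error = function(e) conditionMessage(e))
```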

~G


On Wed, Mar 26, 2014 at 4:38 AM, Michael Lawrence wrote:

> The BiocGenerics package was designed to solve this issue within
> Bioconductor. It wouldn't be the worst thing in the world to depend on the
> simple BiocGenerics package for now, but ideally the base generics would be
> defined higher up, perhaps in the methods package itself. Maybe someone
> else has a more creative solution, but any sort of conditional/dynamic
> approach would probably be too problematic in comparison.
>
> Michael



-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis



Re: [Rd] Conflicting definitions for function redefined as S4 generics

2014-03-26 Thread Michael Lawrence
The BiocGenerics package was designed to solve this issue within
Bioconductor. It wouldn't be the worst thing in the world to depend on the
simple BiocGenerics package for now, but ideally the base generics would be
defined higher up, perhaps in the methods package itself. Maybe someone
else has a more creative solution, but any sort of conditional/dynamic
approach would probably be too problematic in comparison.

Michael



On Wed, Mar 26, 2014 at 4:26 AM, Ulrich Bodenhofer wrote:

> [cross-posted to R-devel and bioc-devel]
>
> Hi,
>
> I am trying to implement a 'sort' method in one of the CRAN packages I am
> maintaining ('apcluster'). I started with using setMethod("sort", ...) in
> my package, which worked fine. Since many users of my package are from the
> bioinformatics field, I want to ensure that my package works smoothly with
> Bioconductor. The problem is that the BiocGenerics package also redefines
> 'sort' as an S4 generic. If I load BiocGenerics before my package,
> everything is fine. If I load BiocGenerics after I have loaded my package,
> my setMethod("sort", ...) is overridden by BiocGenerics and does not work
> anymore. A simple solution would be to import BiocGenerics in my package,
> but I do not want this, since many of this package's users are outside the
> bioinformatics domain. Moreover, I am reluctant to include a dependency on
> a Bioconductor package in a CRAN package. I thought that maybe I could
> protect my setMethod("sort", ...) from being overridden by BiocGenerics by
> sealed=TRUE, but that did not work either. Any ideas are greatly
> appreciated!
>
> Thanks a lot,
> Ulrich


[Rd] Conflicting definitions for function redefined as S4 generics

2014-03-26 Thread Ulrich Bodenhofer

[cross-posted to R-devel and bioc-devel]

Hi,

I am trying to implement a 'sort' method in one of the CRAN packages I
am maintaining ('apcluster'). I started with using setMethod("sort",
...) in my package, which worked fine. Since many users of my package
are from the bioinformatics field, I want to ensure that my package
works smoothly with Bioconductor. The problem is that the BiocGenerics
package also redefines 'sort' as an S4 generic. If I load BiocGenerics
before my package, everything is fine. If I load BiocGenerics after I
have loaded my package, my setMethod("sort", ...) is overridden by
BiocGenerics and does not work anymore. A simple solution would be to
import BiocGenerics in my package, but I do not want this, since many of
this package's users are outside the bioinformatics domain. Moreover, I
am reluctant to include a dependency on a Bioconductor package in a CRAN
package. I thought that maybe I could protect my setMethod("sort", ...)
from being overridden by BiocGenerics by sealed=TRUE, but that did not
work either. Any ideas are greatly appreciated!


Thanks a lot,
Ulrich



*Dr. Ulrich Bodenhofer*
Associate Professor
Institute of Bioinformatics

*Johannes Kepler University*
Altenberger Str. 69
4040 Linz, Austria

Tel. +43 732 2468 4526
Fax +43 732 2468 4539
bodenho...@bioinf.jku.at 
http://www.bioinf.jku.at/ 



[Rd] NOTE when detecting mismatch in output, and codes for NOTEs, WARNINGs and ERRORs

2014-03-26 Thread Kirill Müller

Dear list


It is possible to store expected output for tests and examples. From the
manual: "If tests has a subdirectory Examples containing a file
pkg-Ex.Rout.save, this is compared to the output file for running the
examples when the latter are checked." And, earlier (written in the
context of test output, but apparently applicable here as well): "...,
these two are compared, with differences being reported but not causing
an error."


I think a NOTE would be appropriate here, in order to be able to detect 
this by only looking at the summary. Is there a reason for not flagging 
differences here?


The following is loosely related: Some compilers and static code
analysis tools assign a numeric code to each type of error or warning
they check for, and print it. Would it be possible to do the same for
the anomalies detected by R CMD check? The most significant digit could
denote the "severity" of the NOTE, WARNING or ERROR. This would further
simplify (semi-)automated analysis of the output of R CMD check, e.g. in
the context of automated tests.
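
To illustrate the kind of semi-automated analysis I have in mind, here
is a minimal sketch that tallies the status word at the end of each
check line. The log lines are made-up examples in the style of R CMD
check output (an assumption, not a real log), and the regular expression
is exactly the sort of text matching that numeric codes would make
unnecessary.

```r
# Illustrative log lines only; a real log would be read with readLines().
log_lines <- c(
  "* checking for file 'demo/DESCRIPTION' ... OK",
  "* checking R code for possible problems ... NOTE",
  "* checking examples ... OK",
  "* checking tests ... WARNING"
)

# Extract the trailing status word from each line that has one.
status <- regmatches(log_lines,
                     regexpr("(OK|NOTE|WARNING|ERROR)$", log_lines))

# Tabulate, keeping zero counts for statuses that did not occur.
counts <- table(factor(status,
                       levels = c("OK", "NOTE", "WARNING", "ERROR")))
print(counts)
```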



Best regards

Kirill
