[Rd] Subject: Milestone: 10000 packages on CRAN

2017-01-27 Thread Henrik Bengtsson
Continuing the tradition to post millennia milestones on CRAN:

So, it happened. Today (January 27, 2017 PCT) CRAN reached 10,000 packages [1].

Needless to say, the rate with which new packages are added to CRAN
keeps increasing and so does the number of contributors (maintainers).
Somewhere out there, there are ~3 persons who are about to submit
their first packages to CRAN today and ~3 persons who will submit
another package of theirs. And by the amazing work of the CRAN team,
these packages are inspected and quality controlled before going live
- which often happens within a day or so.

As usual and it can't be said too many times: A big thank you to the
CRAN team, to the R core, to all package developers, to our friendly
community, to everyone out there helping others, and to various online
services that simplify package development. We can all give back by
carefully reporting bugs to the maintainers, properly citing packages
we use in publications (see citation("pkg")), and helping newcomers
use R.


Milestones:

2017-01-27 10000 pkgs (+6.3/day over 158 days) 5845 mnts (+3.5/day)
2016-08-22 9000 pkgs (+5.7/day over 175 days) 5289 mnts (+5.8/day)
2016-02-29 8000 pkgs (+5.0/day over 201 days) 4279 mnts (+0.7/day)
2015-08-12 7000 pkgs (+3.5/day over 287 days) 4130 mnts (+2.4/day)
2014-10-29 6000 pkgs (+3.0/day over 335 days) 3444 mnts (+1.6/day)
2013-11-08 5000 pkgs (+2.3/day over 442 days) 2900 mnts (+1.2/day)
2012-08-23 4000 pkgs (+2.1/day over 469 days) 2350 mnts
2011-05-12 3000 pkgs (+1.7/day over 585 days)
2009-10-04 2000 pkgs (+1.1/day over 906 days)
2007-04-12 1000 pkgs
2004-10-01 500 pkgs
2003-04-01 250 pkgs
2002-09-17 68 pkgs
1997-04-23 12 pkgs

These data are for CRAN only [1-13]. There are many more packages
elsewhere, e.g. Bioconductor, GitHub, R-Forge etc.
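
The per-day rates in the list can be rechecked from the dates and counts. The following is only a sketch using the three most recent milestones (8000, 9000 and 10000 packages); the data frame and variable names are made up for illustration.

```r
# Dates, package counts and maintainer counts for the three latest milestones.
milestones <- data.frame(
  date = as.Date(c("2016-02-29", "2016-08-22", "2017-01-27")),
  pkgs = c(8000, 9000, 10000),
  mnts = c(4279, 5289, 5845)
)

# Days between consecutive milestones, then growth per day.
days     <- as.numeric(diff(milestones$date))
pkg_rate <- diff(milestones$pkgs) / days
mnt_rate <- diff(milestones$mnts) / days

print(data.frame(date = milestones$date[-1], days,
                 pkgs_per_day = round(pkg_rate, 1),
                 mnts_per_day = round(mnt_rate, 1)))
```

This gives 175 and 158 days, with 5.7 and 6.3 packages per day and 5.8 and 3.5 maintainers per day.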

[1] http://cran.r-project.org/web/packages/
[2] https://en.wikipedia.org/wiki/R_(programming_language)#Milestones
[3] http://www.r-pkg.org/
[4] Private data
[5] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[6] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[7] https://stat.ethz.ch/pipermail/r-devel/2011-May/061002.html
[8] https://stat.ethz.ch/pipermail/r-devel/2012-August/064675.html
[9] https://stat.ethz.ch/pipermail/r-devel/2013-November/067935.html
[10] https://stat.ethz.ch/pipermail/r-devel/2014-October/069997.html
[11] https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000393.html
[12] https://stat.ethz.ch/pipermail/r-devel/2016-February/072388.html
[13] https://stat.ethz.ch/pipermail/r-devel/2016-August/073011.html

All the best,

Henrik
(just a user)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Matrix package breaks as.matrix method

2017-01-27 Thread Robert McGehee
Hi,
The Matrix package and the as.matrix method do not seem to be compatible inside 
of a package.

Here's an example. I've created a simple package "mat" that defines an 
eponymous class and as.matrix method on that class. All is well, unless that 
package has the Matrix package in its Depends or Imports (and imports, e.g. the 
"Diagonal" function). Then my as.matrix method stops working, even if I'm not 
using any part of the Matrix package.

Here's an example on R 3.3.2:

First, create an empty package "mat" (e.g. with package.skeleton) with one file 
in mat/R/mat.R with the following contents:

setClass("mat", representation(H="matrix"))
mat <- function(H) new("mat", H=H)
setMethod("as.matrix", signature("mat"), function(x, ...) crossprod(x@H))
testmat <- function() {
  H <- matrix(1:3, 1, 3)
  M <- mat(H)
  as.matrix(M)
}

Then install the mat package:
> require(mat)
> testmat()
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
[3,]    3    6    9

All works fine!
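
For reference, the same definitions can be run at top level, outside any package. This is only a sketch of the expected behaviour: it does not reproduce the failure, which depends on the package's DESCRIPTION and NAMESPACE, and the variable names M and res are illustrative.

```r
library(methods)

# Same class, constructor and method as in mat/R/mat.R above.
setClass("mat", representation(H = "matrix"))
mat <- function(H) new("mat", H = H)
setMethod("as.matrix", signature("mat"), function(x, ...) crossprod(x@H))

# crossprod() of a 1 x 3 matrix is the 3 x 3 matrix t(H) %*% H.
M   <- mat(matrix(1:3, 1, 3))
res <- as.matrix(M)
print(res)
```

Here the method is found and the result is the 3 x 3 matrix shown above.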

Now add "Depends: Matrix" into the package's DESCRIPTION file (alternatively 
add "Imports: Matrix" and 'importFrom("Matrix","Diagonal")' in the NAMESPACE).

Try again:
> require(mat)
> testmat()
Error in as.vector(data) : 
  no method for coercing this S4 class to a vector

Bug? If not, can anyone provide a work around? In my case, I'd like to mix 
matrix and Matrix functions in my package, but am obviously having difficulty.

I've come across a somewhat similar report on stackoverflow 
http://stackoverflow.com/questions/13812373/overloading-operator-in-r-s4-classes-and-matrix-package
 regarding defining the "+" operator with the Matrix package, but I don't think 
the solution or the problem quite applies.

Thanks in advance, Robert

> R.version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          3.2
year           2016
month          10
day            31
svn rev        71607
language       R
version.string R version 3.3.2 (2016-10-31)
nickname       Sincere Pumpkin Patch



Re: [Rd] cross-platform portable code in CRAN Repository Policy

2017-01-27 Thread Henrik Bengtsson
Second this.  As the CRAN policies suggest, there's also the very
handy win-builder service (https://win-builder.r-project.org/) you can
use to check your package on Windows.  This service has been a
valuable workhorse for years.

We should also mention the continuous integration (CI) services
provided for free by Travis (Linux and macOS) and AppVeyor (Windows)
in combination with GitHub (or GitLab, ...).  By adding simple
.travis.yml and appveyor.yml to your Git repos (e.g.
https://github.com/HenrikBengtsson/globals), they run R CMD check
--as-cran and covr::package_coverage() etc for you more or less on the
fly, e.g.

* https://travis-ci.org/HenrikBengtsson/globals
* https://ci.appveyor.com/project/HenrikBengtsson/globals

/Henrik

PS. Thanks to everyone who made all of the above possible.

On Fri, Jan 27, 2017 at 2:17 PM, Dirk Eddelbuettel  wrote:
>
> On 27 January 2017 at 21:54, Gábor Csárdi wrote:
> | On Fri, Jan 27, 2017 at 9:28 PM, Da Zheng  wrote:
> | > What major R platforms does this policy refer to?
> | >
> |
> | Linux, macOS, Windows.
> |
> |
> | > Currently, my package runs in Ubuntu. If it works on both Ubuntu and
> | > Redhat, does it count as two platforms?
> | >
> |
> | I think that Linux is just one. Is it hard to make it work on macOS?
> |
> | I am not saying that if it is Linux-only then it definitely cannot make it
> | to CRAN.
> | A CRAN maintainer will decide that.
>
> Gabor is *way* too modest here to not mention the *fabulous* tool he has
> written (with the [financial] support of the R Consortium):  R Hub.
>
> These days I just do 'rhub::check_for_cran()' and four tests launch
> covering the three required OSs as well as the required r-devel and r-release
> versions.  Results trickle in within minutes by mail; the Windows one (which
> is slowest) is also displayed.  You need a one-time token handshake.
>
> I strongly recommend the service.
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>

Re: [Rd] cross-platform portable code in CRAN Repository Policy

2017-01-27 Thread Dirk Eddelbuettel

On 27 January 2017 at 21:54, Gábor Csárdi wrote:
| On Fri, Jan 27, 2017 at 9:28 PM, Da Zheng  wrote:
| > What major R platforms does this policy refer to?
| >
| 
| Linux, macOS, Windows.
| 
| 
| > Currently, my package runs in Ubuntu. If it works on both Ubuntu and
| > Redhat, does it count as two platforms?
| >
| 
| I think that Linux is just one. Is it hard to make it work on macOS?
| 
| I am not saying that if it is Linux-only then it definitely cannot make it
| to CRAN.
| A CRAN maintainer will decide that.

Gabor is *way* too modest here to not mention the *fabulous* tool he has
written (with the [financial] support of the R Consortium):  R Hub.

These days I just do 'rhub::check_for_cran()' and four tests launch
covering the three required OSs as well as the required r-devel and r-release
versions.  Results trickle in within minutes by mail; the Windows one (which
is slowest) is also displayed.  You need a one-time token handshake.

I strongly recommend the service.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [Rd] cross-platform portable code in CRAN Repository Policy

2017-01-27 Thread Gábor Csárdi
On Fri, Jan 27, 2017 at 9:28 PM, Da Zheng  wrote:

> Hello,
>
> I'm trying to submit my package to CRAN. When I read the policy, it says:
> Package authors should make all reasonable efforts to provide
> cross-platform portable code. Packages will not normally be accepted
> that do not run on at least two of the major R platforms.
>
> What major R platforms does this policy refer to?
>

Linux, macOS, Windows.


> Currently, my package runs in Ubuntu. If it works on both Ubuntu and
> Redhat, does it count as two platforms?
>

I think that Linux is just one. Is it hard to make it work on macOS?

I am not saying that if it is Linux-only then it definitely cannot make it
to CRAN.
A CRAN maintainer will decide that.

Gabor


>
> Thanks,
> Da
>


Re: [Rd] cross-platform portable code in CRAN Repository Policy

2017-01-27 Thread Marc Schwartz

> On Jan 27, 2017, at 3:28 PM, Da Zheng  wrote:
> 
> Hello,
> 
> I'm trying to submit my package to CRAN. When I read the policy, it says:
> Package authors should make all reasonable efforts to provide
> cross-platform portable code. Packages will not normally be accepted
> that do not run on at least two of the major R platforms.
> 
> What major R platforms does this policy refer to?
> Currently, my package runs in Ubuntu. If it works on both Ubuntu and
> Redhat, does it count as two platforms?
> 
> Thanks,
> Da


Hi,

A couple of comments:

1. For future reference, this query would have been better sent to 
R-Package-Devel, which is focused on this topic:

  https://stat.ethz.ch/mailman/listinfo/r-package-devel


2. "Major platforms" would typically refer to Linux, Windows and macOS. So 
Ubuntu and RH would be within Linux as a single platform.


Regards,

Marc Schwartz
 


Re: [Rd] cross-platform portable code in CRAN Repository Policy

2017-01-27 Thread Duncan Murdoch

On 27/01/2017 4:28 PM, Da Zheng wrote:

Hello,

I'm trying to submit my package to CRAN. When I read the policy, it says:
Package authors should make all reasonable efforts to provide
cross-platform portable code. Packages will not normally be accepted
that do not run on at least two of the major R platforms.

What major R platforms does this policy refer to?
Currently, my package runs in Ubuntu. If it works on both Ubuntu and
Redhat, does it count as two platforms?


No, those are both Linux.  Try to get it to run on Windows and Mac OS as 
well (and if possible, Solaris).


If you don't have access to Windows, submit it to
win-builder.r-project.org for testing.  Mac OS and Solaris are currently 
harder to test without setting up your own local systems.  Maybe someone 
else will report on available test systems for those platforms.
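
On the code side, portability problems often come from hard-coded paths and OS-specific assumptions. The sketch below shows one common pattern in base R; the helper cache_dir() and the app name "mypkg" are hypothetical and not part of any policy or package.

```r
# Hypothetical helper: build a per-user cache directory without hard-coding
# path separators or OS-specific locations.
cache_dir <- function(app = "mypkg") {
  base <- if (.Platform$OS.type == "windows") {
    # Fall back to the session temp dir if LOCALAPPDATA is not set.
    Sys.getenv("LOCALAPPDATA", unset = tempdir())
  } else {
    file.path(path.expand("~"), ".cache")
  }
  file.path(base, app)   # file.path() inserts the separator itself
}

print(.Platform$OS.type)  # "unix" (Linux, macOS, Solaris) or "windows"
print(cache_dir())
```

Code written this way still needs to be checked on each platform, for example with the services mentioned in this thread.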


Duncan Murdoch



[Rd] cross-platform portable code in CRAN Repository Policy

2017-01-27 Thread Da Zheng
Hello,

I'm trying to submit my package to CRAN. When I read the policy, it says:
Package authors should make all reasonable efforts to provide
cross-platform portable code. Packages will not normally be accepted
that do not run on at least two of the major R platforms.

What major R platforms does this policy refer to?
Currently, my package runs in Ubuntu. If it works on both Ubuntu and
Redhat, does it count as two platforms?

Thanks,
Da



Re: [Bioc-devel] xps build problem on toluca2

2017-01-27 Thread cstrato

I will follow your advice and post on bioc-devel.

Christian


On 01/27/17 19:17, Obenchain, Valerie wrote:

Yes, I will be taking over the build responsibilities with help from Herve.

We don't want to give the impression that users need to contact us
personally to install system dependencies. All dependencies should be
listed in 'SystemRequirements' field of DESCRIPTION. xps does have root
listed so it will eventually be installed on toluca2.

I would encourage you to post on bioc-devel if the package were failing
on an official builder due to missing dependencies. The problem may be
more wide spread and it's helpful to have more than one person looking
at it.

Valerie


On 01/27/2017 09:53 AM, cstrato wrote:

Dear Valerie,

Thank you for your extensive reply.

I understand that it does not affect any  users. It was only a reminder.

Usually, I have contacted Dan personally, but I did not know whom to
write. I assume that you are now responsible for the setup. If this is
the case I will in the future contact you.

Best regards,
Christian


On 01/27/17 15:46, Obenchain, Valerie wrote:

Hi Christian,

toluca2 is in the process of being set up and oaxaca is still the
official Mac devel builder. There are a number of dependencies that need
to be addressed on the new machine and we are working through them.

It is important to understand that the error on toluca2 does not affect
you or your users in any way. Binaries are propagated from oaxaca, not
toluca2. Also, there are still no Mac binaries on CRAN for R 3.4. This
means Bioc devel users on Mac need to install from source so they aren't
concerned with the binaries at all.

It may be confusing seeing build results for both Mac devel builders -
internally we are tracking toluca2 but the one you need to watch is
oaxaca. I don't think we announced this and maybe we should have to
clarify what was going on. We will post on the mailing lists when
toluca2 officially replaces oaxaca and at that point you'll only see
toluca2 on the build report.

Valerie


On 01/26/2017 10:08 AM, cstrato wrote:

Dear all,

Since Dan, who was always very helpful and took care of the special
requirements of my package 'xps', is no longer part of the BioC group,
and I do not know who is currently responsible for the BioC servers, I
am writing to you to inform you of the problem building 'xps' on the new
Mac server 'toluca2':
http://bioconductor.org/checkResults/devel/bioc-LATEST/xps/toluca2-buildsrc.html

Could you please install ROOT on toluca2 to prevent the build error of
'xps'.

Since both, toluca2 and oaxaca, are running Mac OS X Mavericks, it
should be no problem to transfer ROOT and the corresponding settings to
toluca2.

Thank you in advance.
Best regards,
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
V.i.e.n.n.a   A.u.s.t.r.i.a
e.m.a.i.l:cstrato at aon.at
_._._._._._._._._._._._._._._._._._

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel






Re: [Bioc-devel] xps build problem on toluca2

2017-01-27 Thread Obenchain, Valerie
Yes, I will be taking over the build responsibilities with help from Herve.

We don't want to give the impression that users need to contact us
personally to install system dependencies. All dependencies should be
listed in 'SystemRequirements' field of DESCRIPTION. xps does have root
listed so it will eventually be installed on toluca2.

I would encourage you to post on bioc-devel if the package were failing
on an official builder due to missing dependencies. The problem may be
more wide spread and it's helpful to have more than one person looking
at it.

Valerie


On 01/27/2017 09:53 AM, cstrato wrote:
> Dear Valerie,
>
> Thank you for your extensive reply.
>
> I understand that it does not affect any  users. It was only a reminder.
>
> Usually, I have contacted Dan personally, but I did not know whom to 
> write. I assume that you are now responsible for the setup. If this is 
> the case I will in the future contact you.
>
> Best regards,
> Christian
>
>
> On 01/27/17 15:46, Obenchain, Valerie wrote:
>> Hi Christian,
>>
>> toluca2 is in the process of being set up and oaxaca is still the
>> official Mac devel builder. There are a number of dependencies that need
>> to be addressed on the new machine and we are working through them.
>>
>> It is important to understand that the error on toluca2 does not affect
>> you or your users in any way. Binaries are propagated from oaxaca, not
>> toluca2. Also, there are still no Mac binaries on CRAN for R 3.4. This
>> means Bioc devel users on Mac need to install from source so they aren't
>> concerned with the binaries at all.
>>
>> It may be confusing seeing build results for both Mac devel builders -
>> internally we are tracking toluca2 but the one you need to watch is
>> oaxaca. I don't think we announced this and maybe we should have to
>> clarify what was going on. We will post on the mailing lists when
>> toluca2 officially replaces oaxaca and at that point you'll only see
>> toluca2 on the build report.
>>
>> Valerie
>>
>>
>> On 01/26/2017 10:08 AM, cstrato wrote:
>>> Dear all,
>>>
>>> Since Dan, who was always very helpful and took care of the special
>>> requirements of my package 'xps', is no longer part of the BioC group,
>>> and I do not know who is currently responsible for the BioC servers, I
>>> am writing to you to inform you of the problem building 'xps' on the new
>>> Mac server 'toluca2':
>>> http://bioconductor.org/checkResults/devel/bioc-LATEST/xps/toluca2-buildsrc.html
>>>
>>> Could you please install ROOT on toluca2 to prevent the build error of
>>> 'xps'.
>>>
>>> Since both, toluca2 and oaxaca, are running Mac OS X Mavericks, it
>>> should be no problem to transfer ROOT and the corresponding settings to
>>> toluca2.
>>>
>>> Thank you in advance.
>>> Best regards,
>>> Christian
>>> _._._._._._._._._._._._._._._._._._
>>> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>>> V.i.e.n.n.a   A.u.s.t.r.i.a
>>> e.m.a.i.l:cstrato at aon.at
>>> _._._._._._._._._._._._._._._._._._
>>>





Re: [Bioc-devel] xps build problem on toluca2

2017-01-27 Thread cstrato

Dear Valerie,

Thank you for your extensive reply.

I understand that it does not affect any  users. It was only a reminder.

Usually, I have contacted Dan personally, but I did not know whom to 
write. I assume that you are now responsible for the setup. If this is 
the case I will in the future contact you.


Best regards,
Christian


On 01/27/17 15:46, Obenchain, Valerie wrote:

Hi Christian,

toluca2 is in the process of being set up and oaxaca is still the
official Mac devel builder. There are a number of dependencies that need
to be addressed on the new machine and we are working through them.

It is important to understand that the error on toluca2 does not affect
you or your users in any way. Binaries are propagated from oaxaca, not
toluca2. Also, there are still no Mac binaries on CRAN for R 3.4. This
means Bioc devel users on Mac need to install from source so they aren't
concerned with the binaries at all.

It may be confusing seeing build results for both Mac devel builders -
internally we are tracking toluca2 but the one you need to watch is
oaxaca. I don't think we announced this and maybe we should have to
clarify what was going on. We will post on the mailing lists when
toluca2 officially replaces oaxaca and at that point you'll only see
toluca2 on the build report.

Valerie


On 01/26/2017 10:08 AM, cstrato wrote:

Dear all,

Since Dan, who was always very helpful and took care of the special
requirements of my package 'xps', is no longer part of the BioC group,
and I do not know who is currently responsible for the BioC servers, I
am writing to you to inform you of the problem building 'xps' on the new
Mac server 'toluca2':
http://bioconductor.org/checkResults/devel/bioc-LATEST/xps/toluca2-buildsrc.html

Could you please install ROOT on toluca2 to prevent the build error of
'xps'.

Since both, toluca2 and oaxaca, are running Mac OS X Mavericks, it
should be no problem to transfer ROOT and the corresponding settings to
toluca2.

Thank you in advance.
Best regards,
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
V.i.e.n.n.a   A.u.s.t.r.i.a
e.m.a.i.l:cstrato at aon.at
_._._._._._._._._._._._._._._._._._

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel







Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-27 Thread Henrik Bengtsson
On Fri, Jan 27, 2017 at 12:34 AM, Martin Maechler
 wrote:
>
> > On Jan 26, 2017 07:50, "William Dunlap via R-devel" wrote:
>
> > It would be cool if the default for tapply's init.value could be
> > FUN(X[0]), so it would be 0 for FUN=sum or FUN=length, TRUE for
> > FUN=all, -Inf for FUN=max, etc.  But that would take time and would
> > break code for which FUN did not work on length-0 objects.
>
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
>
> I had the same idea (after my first post), so I agree that would
> be nice. One could argue it would take time only if the user is too lazy
> to specify the value,  and we could use
>     tryCatch(FUN(X[0]), error = function(e) NA)
> to safeguard against those functions that fail for 0 length arg.
>
> But I think the main reason for _not_ setting such a default is
> back-compatibility.  In my proposal, the new argument would not
> be any change by default and so all current uses of tapply()
> would remain unchanged.
>
>> Henrik Bengtsson 
>> on Thu, 26 Jan 2017 07:57:08 -0800 writes:
>
> > On a related note, the storage mode should try to match ans[[1]] (or
> > unlist:ed ans) when allocating 'ansmat' to avoid coercion and hence a
> > full copy.
>
> Yes, related indeed; and would fall "in line" with Bill's idea.
> OTOH, it could be implemented independently,
> by something like
>
>    if(missing(init.value))
>      init.value <-
>        if(length(ans)) as.vector(NA, mode=storage.mode(ans[[1]]))
>        else NA

I would probably do something like:

  ans <- unlist(ans, recursive = FALSE, use.names = FALSE)
  if (length(ans)) storage.mode(init.value) <- storage.mode(ans[[1]])
  ansmat <- array(init.value, dim = extent, dimnames = namelist)

instead.  That completely avoids having to use missing() and the value
of 'init.value' will be coerced later if not done upfront.  use.names
= FALSE speeds up unlist().
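
As a small standalone illustration of the storage-mode step (a sketch only, not the actual tapply() internals; the list 'ans' and the dimensions are stand-ins):

```r
ans <- list(6L, 3L, 6L)   # stand-in for the per-cell results of FUN
init.value <- NA          # logical NA, as in the proposed default

ans <- unlist(ans, recursive = FALSE, use.names = FALSE)
# Coerce the (length-one) init value up front, so the array is allocated
# with the same type as the results that will be filled in.
if (length(ans)) storage.mode(init.value) <- storage.mode(ans[[1]])
ansmat <- array(init.value, dim = c(2, 3))

print(typeof(ansmat))     # "integer"
ansmat[c(1, 3, 5)] <- ans
print(typeof(ansmat))     # still "integer"
```

Since the array already has the type of the results, filling it in does not change its type.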

/Henrik

>
> .
>
> A colleague proposed to use the shorter argument name 'default'
> instead of 'init.value'  which indeed maybe more natural and
> still not too often used as "non-first" argument in  FUN(.).
>
> Thank you for the constructive feedback!
> Martin
>
> > On Thu, Jan 26, 2017 at 2:42 AM, Martin Maechler
> >  wrote:
> >> Last week, we've talked here about "xtabs(), factors and NAs",
> -> https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html
> >>
> >> In the mean time, I've spent several hours on the issue
> >> and also committed changes to R-devel "in two iterations".
> >>
> >> In the case there is a *Left* hand side part to xtabs() formula,
> >> see the help page example using 'esoph',
> >> it uses  tapply(...,  FUN = sum)   and
> >> I now think there is a missing feature in tapply() there, which
> >> I am proposing to change.
> >>
> >> Look at a small example:
> >>
> >>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]),
> >>> N=3)[-c(1,5), ]; xtabs(~., D2)
> >> , , N = 3
> >>
> >> L
> >> n   A B C D E F
> >> 1 1 2 0 0 0 0
> >> 2 0 0 1 2 0 0
> >> 3 0 0 0 0 2 2
> >>
> >>> DN <- D2; DN[1,"N"] <- NA; DN
> >> n L  N
> >> 2  1 A NA
> >> 3  1 B  3
> >> 4  1 B  3
> >> 6  2 C  3
> >> 7  2 D  3
> >> 8  2 D  3
> >> 9  3 E  3
> >> 10 3 E  3
> >> 11 3 F  3
> >> 12 3 F  3
> >>> with(DN, tapply(N, list(n,L), FUN=sum))
> >> A  B  C  D  E  F
> >> 1 NA  6 NA NA NA NA
> >> 2 NA NA  3  6 NA NA
> >> 3 NA NA NA NA  6  6
> >>>
> >>
> >> and as you can see, the resulting matrix has NAs, all the same
> >> NA_real_, but semantically of two different kinds:
> >>
> >> 1) at ["1", "A"], the  NA  comes from the NA in 'N'
> >> 2) all other NAs come from the fact that there is no such factor
> >> combination
> >> *and* from the fact that tapply() uses
> >>
> >> array(dim = .., dimnames = ...)
> >>
> >> i.e., initializes the array with NAs  (see definition of 'array').
> >>
> >> My proposition is the following patch to  tapply(), adding a new
> >> option 'init.value':
> >>
> >> 
> > -
> >>
> >> -tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
> >> +tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)
> >> {
> >> FUN <- if (!is.null(FUN)) match.fun(FUN)
> >> if (!is.list(INDEX)) INDEX <- list(INDEX)
> >> @@ -44,7 +44,7 @@
> >> index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
> >> ans <- lapply(X = ans[index], FUN = FUN, ...)
> >> if (simplify && all(lengths(ans) == 1L)) {
> >> -   ansmat <- array(dim = extent, dimnames = namelist)
> >> 

Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-27 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Fri, 27 Jan 2017 16:36:59 + writes:

> The "no factor combination" case is distinguishable by 'tapply' with
> simplify=FALSE.
>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), N=3)
>> D2 <- D2[-c(1,5), ]
>> DN <- D2; DN[1,"N"] <- NA
>> with(DN, tapply(N, list(n,L), FUN=sum, simplify=FALSE))
>   A    B    C    D    E    F
> 1 NA   6    NULL NULL NULL NULL
> 2 NULL NULL 3    6    NULL NULL
> 3 NULL NULL NULL NULL 6    6

Yes, I know that simplify=FALSE  behaves differently, it returns
a list with dim & dimnames, sometimes also called a "list - matrix"
... and it *can* be used instead, but to be useful needs to be
post processed and that overall is somewhat inefficient and ugly.
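
To make the post-processing concrete, here is a sketch of what it could look like for the example data (variable names 'res', 'empty' and 'out' are illustrative; this is not a proposed implementation):

```r
D2 <- data.frame(n = gl(3, 4), L = gl(6, 2, labels = LETTERS[1:6]), N = 3)
D2 <- D2[-c(1, 5), ]
DN <- D2; DN[1, "N"] <- NA

# List-matrix: NULL where a factor combination does not occur.
res <- with(DN, tapply(N, list(n, L), FUN = sum, simplify = FALSE))

# Second pass: fill non-occurring cells with 0, keep computed values (incl. NA).
empty <- vapply(res, is.null, logical(1))
out <- array(0, dim = dim(res), dimnames = dimnames(res))
out[!empty] <- unlist(res[!empty])
print(out)
```

The extra pass over the list is the overhead that a fill-value argument in tapply() itself would avoid.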


> There is an old related discussion starting on
> https://stat.ethz.ch/pipermail/r-devel/2007-November/047338.html .

Thank you, indeed, for finding that. There Andrew Robinson did
raise the same issue, but his proposed solution was not very
backward compatible, and I think it was dismissed primarily because of that.
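
The compatibility cost of adding any named argument after '...', which the proposal quoted below spells out, can be seen with toy wrappers. The functions old_apply(), new_apply() and shift() are made up for illustration and only mimic how tapply() forwards '...' to FUN:

```r
# "Old" signature: everything after FUN is forwarded to FUN via '...'.
old_apply <- function(X, FUN, ...) FUN(X, ...)
# "New" signature: 'init.value' is now a formal of the wrapper itself.
new_apply <- function(X, FUN, ..., init.value = NA) FUN(X, ...)

# A user function that happens to have an argument of the same name.
shift <- function(x, init.value = 0) x + init.value

r_old <- old_apply(1, shift, init.value = 10)  # forwarded to shift(): 1 + 10
r_new <- new_apply(1, shift, init.value = 10)  # captured by wrapper:  1 + 0
print(c(old = r_old, new = r_new))
```

With the new signature the caller's 'init.value' no longer reaches FUN, which is why a less common argument name lowers the risk of a clash.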

Martin

> --
> Last week, we've talked here about "xtabs(), factors and NAs",
-> https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html

> In the mean time, I've spent several hours on the issue
> and also committed changes to R-devel "in two iterations".

> In the case there is a *Left* hand side part to xtabs() formula,
> see the help page example using 'esoph',
> it uses  tapply(...,  FUN = sum)   and
> I now think there is a missing feature in tapply() there, which
> I am proposing to change. 

> Look at a small example:

>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]),
>>                  N=3)[-c(1,5), ]; xtabs(~., D2)
> , , N = 3

> L
> n   A B C D E F
> 1 1 2 0 0 0 0
> 2 0 0 1 2 0 0
> 3 0 0 0 0 2 2

>> DN <- D2; DN[1,"N"] <- NA; DN
> n L  N
> 2  1 A NA
> 3  1 B  3
> 4  1 B  3
> 6  2 C  3
> 7  2 D  3
> 8  2 D  3
> 9  3 E  3
> 10 3 E  3
> 11 3 F  3
> 12 3 F  3
>> with(DN, tapply(N, list(n,L), FUN=sum))
> A  B  C  D  E  F
> 1 NA  6 NA NA NA NA
> 2 NA NA  3  6 NA NA
> 3 NA NA NA NA  6  6
>> 

> and as you can see, the resulting matrix has NAs, all the same
> NA_real_, but semantically of two different kinds:

> 1) at ["1", "A"], the  NA  comes from the NA in 'N'
> 2) all other NAs come from the fact that there is no such factor
> combination
> *and* from the fact that tapply() uses

> array(dim = .., dimnames = ...)

> i.e., initializes the array with NAs  (see definition of 'array').

> My proposition is the following patch to  tapply(), adding a new
> option 'init.value':

> 
-
 
> -tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
> +tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)
> {
> FUN <- if (!is.null(FUN)) match.fun(FUN)
> if (!is.list(INDEX)) INDEX <- list(INDEX)
> @@ -44,7 +44,7 @@
> index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
> ans <- lapply(X = ans[index], FUN = FUN, ...)
> if (simplify && all(lengths(ans) == 1L)) {
> - ansmat <- array(dim = extent, dimnames = namelist)
> + ansmat <- array(init.value, dim = extent, dimnames = namelist)
> ans <- unlist(ans, recursive = FALSE)
> } else {
> ansmat <- array(vector("list", prod(extent)),

> 
-

> With that, I can set the initial value to '0' instead of array's
> default of NA :

>> with(DN, tapply(N, list(n,L), FUN=sum, init.value=0))
> A B C D E F
> 1 NA 6 0 0 0 0
> 2  0 0 3 6 0 0
> 3  0 0 0 0 6 6
>> 

> which now has 0 counts and NA  as is desirable to be used inside
> xtabs().

> All fine... and would not be worth a posting to R-devel,
> except for this:

> The change will not be 100% back compatible -- by necessity: any new
> argument for tapply() will make that argument name not available to be
> specified (via '...') for 'FUN'.  The new function would be
>> str(tapply)
> function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)  

> where the '...' are passed FUN(),  and with the new signature,
> 'init.value' then won't be passed to FUN  "anymore" (compared to
> R <= 3.3.x).

> For that reason, we could use   'INIT.VALUE' instead (possibly decreasing
> the probability the arg name is used in other functions).


> Opinions?

> Thank you in advance,
> Martin

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-27 Thread Suharto Anggono Suharto Anggono via R-devel
The "no factor combination" case is distinguishable by 'tapply' with 
simplify=FALSE.

> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), N=3)
> D2 <- D2[-c(1,5), ]
> DN <- D2; DN[1,"N"] <- NA
> with(DN, tapply(N, list(n,L), FUN=sum, simplify=FALSE))
  A    B    C    D    E    F
1 NA   6    NULL NULL NULL NULL
2 NULL NULL 3    6    NULL NULL
3 NULL NULL NULL NULL 6    6
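
A minimal sketch of how those NULL entries could be used, assuming the DN
data frame built above; the names 'res' and 'empty' are illustrative only and
do not come from the thread. With simplify=FALSE the result is a list matrix,
so cells without any factor combination can be located with is.null(), while
the NA at ["1", "A"] remains a genuine computed value:

```r
D2 <- data.frame(n = gl(3, 4), L = gl(6, 2, labels = LETTERS[1:6]), N = 3)
D2 <- D2[-c(1, 5), ]
DN <- D2
DN[1, "N"] <- NA

# List matrix: NULL wherever a factor combination has no observations
res <- with(DN, tapply(N, list(n, L), FUN = sum, simplify = FALSE))

# Logical matrix flagging the empty cells (illustrative name)
empty <- array(vapply(res, is.null, logical(1)),
               dim = dim(res), dimnames = dimnames(res))

empty["1", "C"]      # a cell with no observations at all
is.na(res[[1, 1]])   # the cell whose NA came from 'N' itself
```

The two kinds of "missing" are thereby kept apart without changing tapply().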


There is an old related discussion starting on 
https://stat.ethz.ch/pipermail/r-devel/2007-November/047338.html .

--
Last week, we've talked here about "xtabs(), factors and NAs",
 ->  https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html

In the mean time, I've spent several hours on the issue
and also committed changes to R-devel "in two iterations".

In the case there is a *Left* hand side part to xtabs() formula,
see the help page example using 'esoph',
it uses  tapply(...,  FUN = sum)   and
I now think there is a missing feature in tapply() there, which
I am proposing to change. 

Look at a small example:

> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), N=3)[-c(1,5), ]; xtabs(~., D2)
, , N = 3

   L
n   A B C D E F
  1 1 2 0 0 0 0
  2 0 0 1 2 0 0
  3 0 0 0 0 2 2

> DN <- D2; DN[1,"N"] <- NA; DN
   n L  N
2  1 A NA
3  1 B  3
4  1 B  3
6  2 C  3
7  2 D  3
8  2 D  3
9  3 E  3
10 3 E  3
11 3 F  3
12 3 F  3
> with(DN, tapply(N, list(n,L), FUN=sum))
   A  B  C  D  E  F
1 NA  6 NA NA NA NA
2 NA NA  3  6 NA NA
3 NA NA NA NA  6  6
>  

and as you can see, the resulting matrix has NAs, all the same
NA_real_, but semantically of two different kinds:

1) at ["1", "A"], the  NA  comes from the NA in 'N'
2) all other NAs come from the fact that there is no such factor combination
   *and* from the fact that tapply() uses

   array(dim = .., dimnames = ...)

i.e., initializes the array with NAs  (see definition of 'array').

My proposition is the following patch to  tapply(), adding a new
option 'init.value':

-
 
-tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
+tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)
 {
 FUN <- if (!is.null(FUN)) match.fun(FUN)
 if (!is.list(INDEX)) INDEX <- list(INDEX)
@@ -44,7 +44,7 @@
 index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
 ans <- lapply(X = ans[index], FUN = FUN, ...)
 if (simplify && all(lengths(ans) == 1L)) {
-   ansmat <- array(dim = extent, dimnames = namelist)
+   ansmat <- array(init.value, dim = extent, dimnames = namelist)
ans <- unlist(ans, recursive = FALSE)
 } else {
ansmat <- array(vector("list", prod(extent)),

-

With that, I can set the initial value to '0' instead of array's
default of NA :

> with(DN, tapply(N, list(n,L), FUN=sum, init.value=0))
   A B C D E F
1 NA 6 0 0 0 0
2  0 0 3 6 0 0
3  0 0 0 0 6 6
> 

which now has 0 counts and NA  as is desirable to be used inside
xtabs().
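
For readers without the patched tapply(), the effect can be approximated on
top of the existing function. This is only a sketch: 'tapply_init' is a
hypothetical helper name, not part of R or of the patch, and it assumes FUN
returns a length-1 atomic value for every non-empty cell.

```r
# Hypothetical wrapper mimicking the proposed 'init.value' argument
tapply_init <- function(X, INDEX, FUN, ..., init.value = NA) {
    # simplify = FALSE leaves NULL in cells without observations
    res <- tapply(X, INDEX, FUN, ..., simplify = FALSE)
    empty <- vapply(res, is.null, logical(1))
    res[empty] <- list(init.value)
    # Rebuild an atomic array with the original shape and dimnames
    array(unlist(res), dim = dim(res), dimnames = dimnames(res))
}

D2 <- data.frame(n = gl(3, 4), L = gl(6, 2, labels = LETTERS[1:6]), N = 3)
D2 <- D2[-c(1, 5), ]
DN <- D2
DN[1, "N"] <- NA

m <- with(DN, tapply_init(N, list(n, L), FUN = sum, init.value = 0))
m
```

Cells with no factor combination then hold 0, while the cell whose sum
involved the NA in 'N' stays NA.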

All fine... and would not be worth a posting to R-devel,
except for this:

The change will not be 100% back compatible -- by necessity: any new argument for
tapply() will make that argument name not available to be
specified (via '...') for 'FUN'.  The new function would be

> str(tapply)
function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)  

where the '...' are passed FUN(),  and with the new signature,
'init.value' then won't be passed to FUN  "anymore" (compared to
R <= 3.3.x).

For that reason, we could use   'INIT.VALUE' instead (possibly decreasing
the probability the arg name is used in other functions).
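
The compatibility concern can be illustrated with a small sketch. The
functions 'old_apply', 'new_apply' and 'f' below are made up for
illustration; they only imitate how a formal argument placed after '...'
intercepts a name that previously travelled through '...' to FUN.

```r
# Stand-in for the current signature: everything extra goes to FUN
old_apply <- function(X, FUN, ...) FUN(X, ...)

# Stand-in for the proposed signature: 'init.value' is now a formal argument
new_apply <- function(X, FUN, ..., init.value = NA) FUN(X, ...)

# A user function that happens to have an argument called 'init.value'
f <- function(x, init.value = "unset") init.value

old_apply(1:3, f, init.value = 0)   # 'init.value' reaches f
new_apply(1:3, f, init.value = 0)   # 'init.value' is captured by new_apply
```

In the second call f() falls back to its own default, which is exactly the
silent change in behaviour described above.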


Opinions?

Thank you in advance,
Martin



Re: [Rd] Suggestion: barplot function

2017-01-27 Thread Marc Schwartz

> On Jan 27, 2017, at 8:30 AM, danielren...@lycos.com wrote:
> 
> Hello developers!
> 
> First, congratulations on the wonderful work with R.
> 
> For science, barplots with error bars are very important. We noticed how 
> easy it is to use the boxplot function:
> 
> boxplot(Spores~treatment, col=treatment_colors)
> 
> But there is no such function for barplots with standard deviation or 
> standard error. It becomes a "journey" to plot a simple graph (e.g. 
> https://www.r-bloggers.com/building-barplots-with-error-bars/).
> 
> Just as the boxplot function is easy to use, do you think it would be 
> possible to extend the barplot function, e.g.: barplot(Spores~treatment, 
> error.bar=standard_error, col=treatment_colors)
> 
> Thank you so much!
> Daniel, FU-Berlin


Hi,

With the caveat that I do not speak on behalf of R Core:

Boxplots are specifically designed to include "whiskers" (NOT error bars) that 
aid to visually describe the distribution of continuous data. The whiskers do 
not represent standard deviations (SDs). Thus, that the boxplot() function 
contains the code to draw the whiskers automatically is not relevant to 
barplot().

Barplots are best used to visually present tabulations of categorical data 
(e.g. counts or percentages), in which case, the "error" bars would typically 
represent binomial or similar confidence intervals. Even there, many will 
advocate that dotplots be used instead as a better presentation format, as 
barplots, much like pie charts, have a high "ink to data" ratio.

Barplots should not really be used to present continuous data (e.g. means and 
SDs).

You will find a great deal of disagreement with your premise that barplots with 
error bars are very important to science. If you do a Google search for 
"Dynamite Plot", especially where only the upper error bar is included, you 
will find a variety of critical discussions on that point, such as:

  http://biostat.mc.vanderbilt.edu/wiki/pub/Main/TatsukiRcode/Poster3.pdf 


You pointed to one example of how easy it is to actually add error bars to a 
barplot in R, and that approach, of incrementally building plots using multiple 
functions, is an integral part of R's philosophy. There is also an example in 
?barplot.

Generally, R's default approaches to most analyses are extremely well reasoned. 
Thus, if you don't see something in a function by default, there is generally 
strong logic behind what is being done, or as in this case, not being done.

If you wanted to, it would be a reasonable exercise for you to create your own 
plotting function that wraps barplot() and either segments() or arrows() in a 
single function call, where you can pass arguments that contain the values for 
the various components and draw the plot as you desire. That is how a lot of R 
code is created.
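
As a rough sketch of such a wrapper (not an endorsement of the plot type):
'barplot_err' is a hypothetical name, and it assumes non-negative bar heights
with symmetric error bars supplied by the caller.

```r
# Hypothetical wrapper around barplot() and arrows()
barplot_err <- function(height, err,
                        ylim = c(0, 1.1 * max(height + err)), ...) {
    stopifnot(length(height) == length(err))
    # barplot() returns the x positions of the bar midpoints
    mids <- barplot(height, ylim = ylim, ...)
    # Flat-headed arrows in both directions serve as error bars
    arrows(x0 = mids, y0 = height - err, x1 = mids, y1 = height + err,
           angle = 90, code = 3, length = 0.05)
    invisible(mids)
}
```

A call then takes the form barplot_err(means, errs, col = treatment_colors),
where 'means' and 'errs' are vectors the user has computed beforehand.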

There are other graphic functions in R packages, such as ggplot2 
(https://www.r-bloggers.com/using-r-barplot-with-ggplot2/) and others on CRAN 
that offer methods to add error bars to barplots that others have created if 
you wanted to research those.

As a result of all of the above, I am not sure that, after all these years, 
error bars would be added to barplot() as a standard feature.

Regards,

Marc Schwartz




Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-27 Thread Gabor Grothendieck
If xtabs is enhanced then as.data.frame.table may also need to be
modified so that it continues to be usable as an inverse, at least to
the degree feasible.


On Thu, Jan 26, 2017 at 5:42 AM, Martin Maechler
 wrote:
> Last week, we've talked here about "xtabs(), factors and NAs",
>  ->  https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html
>
> In the mean time, I've spent several hours on the issue
> and also committed changes to R-devel "in two iterations".
>
> In the case there is a *Left* hand side part to xtabs() formula,
> see the help page example using 'esoph',
> it uses  tapply(...,  FUN = sum)   and
> I now think there is a missing feature in tapply() there, which
> I am proposing to change.
>
> Look at a small example:
>
>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), N=3)[-c(1,5), ]; xtabs(~., D2)
> , , N = 3
>
>    L
> n   A B C D E F
>   1 1 2 0 0 0 0
>   2 0 0 1 2 0 0
>   3 0 0 0 0 2 2
>
>> DN <- D2; DN[1,"N"] <- NA; DN
>    n L  N
> 2  1 A NA
> 3  1 B  3
> 4  1 B  3
> 6  2 C  3
> 7  2 D  3
> 8  2 D  3
> 9  3 E  3
> 10 3 E  3
> 11 3 F  3
> 12 3 F  3
>> with(DN, tapply(N, list(n,L), FUN=sum))
>    A  B  C  D  E  F
> 1 NA  6 NA NA NA NA
> 2 NA NA  3  6 NA NA
> 3 NA NA NA NA  6  6
>>
>
> and as you can see, the resulting matrix has NAs, all the same
> NA_real_, but semantically of two different kinds:
>
> 1) at ["1", "A"], the  NA  comes from the NA in 'N'
> 2) all other NAs come from the fact that there is no such factor combination
>    *and* from the fact that tapply() uses
>
>    array(dim = .., dimnames = ...)
>
> i.e., initializes the array with NAs  (see definition of 'array').
>
> My proposition is the following patch to  tapply(), adding a new
> option 'init.value':
>
> -
>
> -tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
> +tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)
>  {
>  FUN <- if (!is.null(FUN)) match.fun(FUN)
>  if (!is.list(INDEX)) INDEX <- list(INDEX)
> @@ -44,7 +44,7 @@
>  index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
>  ans <- lapply(X = ans[index], FUN = FUN, ...)
>  if (simplify && all(lengths(ans) == 1L)) {
> -   ansmat <- array(dim = extent, dimnames = namelist)
> +   ansmat <- array(init.value, dim = extent, dimnames = namelist)
> ans <- unlist(ans, recursive = FALSE)
>  } else {
> ansmat <- array(vector("list", prod(extent)),
>
> -
>
> With that, I can set the initial value to '0' instead of array's
> default of NA :
>
>> with(DN, tapply(N, list(n,L), FUN=sum, init.value=0))
>    A B C D E F
> 1 NA 6 0 0 0 0
> 2  0 0 3 6 0 0
> 3  0 0 0 0 6 6
>>
>
> which now has 0 counts and NA  as is desirable to be used inside
> xtabs().
>
> All fine... and would not be worth a posting to R-devel,
> except for this:
>
> The change will not be 100% back compatible -- by necessity: any new argument for
> tapply() will make that argument name not available to be
> specified (via '...') for 'FUN'.  The new function would be
>
>> str(tapply)
> function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)
>
> where the '...' are passed FUN(),  and with the new signature,
> 'init.value' then won't be passed to FUN  "anymore" (compared to
> R <= 3.3.x).
>
> For that reason, we could use   'INIT.VALUE' instead (possibly decreasing
> the probability the arg name is used in other functions).
>
>
> Opinions?
>
> Thank you in advance,
> Martin
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [Bioc-devel] xps build problem on toluca2

2017-01-27 Thread Obenchain, Valerie
Hi Christian,

toluca2 is in the process of being set up and oaxaca is still the
official Mac devel builder. There are a number of dependencies that need
to be addressed on the new machine and we are working through them.

It is important to understand that the error on toluca2 does not affect
you or your users in any way. Binaries are propagated from oaxaca, not
toluca2. Also, there are still no Mac binaries on CRAN for R 3.4. This
means Bioc devel users on Mac need to install from source so they aren't
concerned with the binaries at all.

It may be confusing seeing build results for both Mac devel builders -
internally we are tracking toluca2 but the one you need to watch is
oaxaca. I don't think we announced this and maybe we should have to
clarify what was going on. We will post on the mailing lists when
toluca2 officially replaces oaxaca and at that point you'll only see
toluca2 on the build report.

Valerie


On 01/26/2017 10:08 AM, cstrato wrote:
> Dear all,
>
> Since Dan, who was always very helpful and took care of the special 
> requirements of my package 'xps', is no longer part of the BioC group, 
> and I do not know who is currently responsible for the BioC servers, I 
> am writing to you to inform you of the problem building 'xps' on the new 
> Mac server 'toluca2':
> http://bioconductor.org/checkResults/devel/bioc-LATEST/xps/toluca2-buildsrc.html
>
> Could you please install ROOT on toluca2 to prevent the build error of 
> 'xps'.
>
> Since both, toluca2 and oaxaca, are running Mac OS X Mavericks, it 
> should be no problem to transfer ROOT and the corresponding settings to 
> toluca2.
>
> Thank you in advance.
> Best regards,
> Christian
> _._._._._._._._._._._._._._._._._._
> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
> V.i.e.n.n.a   A.u.s.t.r.i.a
> e.m.a.i.l:cstrato at aon.at
> _._._._._._._._._._._._._._._._._._
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>





[Rd] Suggestion: barplot function

2017-01-27 Thread danielrenato

Hello developers!

First, congratulations on the wonderful work with R.

For science, barplots with error bars are very important. We noticed 
how easy it is to use the boxplot function:


boxplot(Spores~treatment, col=treatment_colors)

But there is no such function for barplots with standard deviation or 
standard error. It becomes a "journey" to plot a simple graph (e.g. 
https://www.r-bloggers.com/building-barplots-with-error-bars/).


Just as the boxplot function is easy to use, do you think it would be 
possible to extend the barplot function, e.g.: 
barplot(Spores~treatment, error.bar=standard_error, 
col=treatment_colors)


Thank you so much!
Daniel, FU-Berlin



Re: [Rd] Undefined behavior of head() and tail() with n = 0

2017-01-27 Thread Martin Maechler
Dear Florent,

thank you for striving to clearly disentangle and present the
issue below.
That is a nice "role model" way of approaching such topics!

> Florent Angly 
> on Fri, 27 Jan 2017 10:24:39 +0100 writes:

> Martin, I agree with you that +0 and -0 should generally be treated as
> equal, and R does a fine job in this respect. The Wikipedia article on
> signed zero (https://en.wikipedia.org/wiki/Signed_zero) echoes this
> view but also highlights that +0 and -0 can be treated differently in
> particular situations, including their interpretation as mathematical
> limits (as in the 1/-0 case). Indeed, the main question here is
> whether head() and tail() represent a special case that would benefit
> from differentiating between +0 and -0.

> We can break down the discussion into two problems:
> A/ the discrepancy between the implementation of R head() and tail()
> and the documentation of these functions (where the use of zero is not
> documented and thus not permissible),

Ehm, no, in R (and many other software systems),

  "not documented" does *NOT* entail "not permissible"


> B/ the discrepancy between the implementation of R head() and tail()
> and their GNU equivalent (which allow zeros and differentiate between
> -0 and +0, i.e. head takes "0" and "-0", tail takes "0" and "+0").

This discrepancy, as you mention later comes from the fact that
basically, these arguments are strings in the Unix tools (GNU being a
special case of Unix, here) and integers in R.

Below, I'm giving my personal view of the issue:

> There are several possible solutions to address these discrepancies:

> 1/ Leave the code as-is but document its behavior with respect to zero
> (zeros allowed, with negative zeros treated like positive zeros).
> Advantages: This is the path of least resistance, and discrepancy A is fixed.
> Disadvantages: Discrepancy B remains (but is documented).

That would be my "clear" choice.
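
What such documentation would describe can be checked directly. The snippet
below is only an illustration of the current behaviour discussed in this
thread: a zero 'n' of either sign yields an empty result.

```r
x <- 1:5

head(x, n = 0)    # integer(0)
head(x, n = -0)   # same: -0 is not < 0, so it is treated like +0
tail(x, n = 0)    # integer(0)
tail(x, n = -0)   # same as tail(x, n = 0)
```

Nothing here depends on the sign of zero, which is the behaviour option 1/
would simply write down.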


> 2/ Leave the documentation as-is but reflect this in code by not
> allowing zeros at all.
> Advantages: Discrepancy A is fixed.
> Disadvantages: Discrepancy B remains in some form (but is documented).
> Need to deprecate the usage of +0 (which was not clearly documented
> but may have been assumed by users).

2/ looks "uniformly inferior" to 1/ to me


> 3/ Update the code and documentation to differentiate between +0 and -0.
> Advantages: In my eyes, this is the ideal solution since discrepancy A
> and (most of) B are resolved.
> Disadvantages: It is unclear how to implement this solution and the
> implications it may have on backward compatibility:
> a/ Allow -0 (as double). But is it supported on all platforms used
> by R (see ?Arithmetic)? William has raised the issue that negative
> zero cannot be represented as an integer. Should head() and tail()
> then strictly check double input (while forbidding integers)?
> b/ The input could always be as character. This would allow to
> mirror even more closely GNU tail (where the prefix "+" is used to
> invert the meaning of n). This probably involves a fair amount of work
> and careful handling of deprecation.

3/ involves quite a few complications, and in my view, your
   advantages are not even getting close to counter-weigh the drawbacks.


> On 26 January 2017 at 16:51, William Dunlap  wrote:
>> In addition, signed zeroes only exist for floating point numbers - the
>> bit patterns for as.integer(0) and as.integer(-0) are identical.

indeed!
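
A short illustration of the point about integers versus doubles, using only
base R; the comments state what ?Arithmetic and ?identical describe.

```r
# Integers have a single zero: the sign is lost on conversion
identical(as.integer(0), as.integer(-0))   # TRUE
1 / as.integer(-0)                         # Inf, not -Inf

# Doubles carry a signed zero, visible only in special places
0 == -0                                    # TRUE
identical(0, -0)                           # TRUE (num.eq = TRUE by default)
identical(0, -0, num.eq = FALSE)           # FALSE
c(1 / 0, 1 / -0)                           # Inf -Inf
```

So a function whose 'n' may arrive as an integer cannot rely on the sign of
zero at all.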

>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>> 
>> 
>> On Thu, Jan 26, 2017 at 1:53 AM, Martin Maechler
>>  wrote:
 Florent Angly 
 on Wed, 25 Jan 2017 16:31:45 +0100 writes:
>>> 
>>> > Hi all,
>>> > The documentation for head() and tail() describes the behavior of
>>> > these generic functions when n is strictly positive (n > 0) and
>>> > strictly negative (n < 0). How these functions work when given a zero
>>> > value is not defined.
>>> 
>>> > Both GNU command-line utilities head and tail behave differently with +0 and -0:
>>> > http://man7.org/linux/man-pages/man1/head.1.html
>>> > http://man7.org/linux/man-pages/man1/tail.1.html
>>> 
>>> > Since R supports signed zeros (1/+0 != 1/-0)
>>> 
>>> whoa, whoa, .. slow down --  The above is misleading!
>>> 
>>> Rather read in  ?Arithmetic (*the* reference to consult for such issues),
>>> where the 2nd part of the following section
>>> 
>>> || Implementation limits:
>>> ||
>>> ||  [..]
>>> ||
>>> ||  Another potential issue is signed zeroes: on IEC 60559 platforms
>>> ||  there are two zeroes with internal 

Re: [Bioc-devel] Code quality and bug reports

2017-01-27 Thread Martin Morgan

On 01/27/2017 03:30 AM, Lluís Revilla wrote:

Dear Valerie,

When I talked about maintenance status, I was thinking of something along
the lines of the badges at http://www.repostatus.org/.
Maybe only the last three statuses are relevant to Bioconductor:

  - Active: The project has reached a stable, usable state and is being
actively developed.
  - Inactive: The project has reached a stable, usable state but is no
longer being actively developed; support/maintenance will be provided as
time allows.
  - Unsupported: The project has reached a stable, usable state but the
author(s) have ceased all work on it. A new maintainer may be desired.


I'm supportive of a clearer set of labels to replace the current 'posts' 
shield.


We should not equate (new) development activity with level of support: 
there are other badges for that; one doesn't want development for 
development's sake; etc.


We have to be very careful that the tags (and current discussion) 
provide positive reinforcement to conscientious developers / community 
members, rather than serving to bully or otherwise intimidate developers 
(sometimes silence is a constructive response, and an effective use of 
time).


Iterating a little

  - Active (green): frequent support questions and answers
  - Mature (green): some support questions and answers
  - Inactive (orange):  some support questions unanswered
  - Unsupported (red): (more than a few?) support questions not answered
  - Unknown (blue): no support questions

Comments are ignored in the above scheme. Activity on packages with URL: 
or BugReports: fields in their DESCRIPTION file is ignored.  'frequent', 
'some', and 'few' could be defined based on available data.


Martin



Tweaking the description a bit, it could be used to classify the support
given to a package. Given these categories and the information available in
the repositories and on the support site, I suggest the following:

Give the Active badge when there are commits (not made by the
Bioconductor project) within the last 6 months. The Bioconductor team is
excluded to avoid marking packages as Active when they are not, because the
only change is the version number or a minor change made by the Bioconductor
team (also, it is not the responsibility of the Bioconductor team to maintain
the packages, is it?). Alternatively, give it when the package maintainer
answers or comments on a question (tagged with the package tag) on the
support site at least once every six months, if there are any such questions.
As an example, edgeR would be Active: it has commits from the maintainers in
the last 6 months.

If a question tagged with a package tag is unanswered and the maintainer
hasn't answered/commented in the last 6 months, or there isn't any commit in
the last 6 months (by the maintainer or anyone other than the Bioconductor
team), the package could be set to Inactive. CorMut would be Inactive and
close to the Unsupported status.

If there is any unanswered question tagged with a package tag and the
maintainer hasn't answered/commented in the last year and there isn't any
commit from the maintainer in the last 2 years, I would give the
Unsupported tag to that package. topGO would be in that category.

Once Unsupported status is reached, the team could try to contact the
maintainer to let him/her know that the maintainer position could be taken
over by somebody else willing to do so. Of course, if he/she makes commits or
answers/comments on questions on the support site so that the status of the
package returns to Active, he/she could keep the maintainer status.

This system could run alongside the current End of Life Policy, or not, but
it gives the community of users an opportunity to maintain a package they
deem useful. It is a bit more complex, but it would indicate much better
which packages are well supported and not only used, and those that are used
but not supported could be saved from Deprecation.

HTH,

Lluís

On 18 January 2017 at 17:52, Obenchain, Valerie <
valerie.obench...@roswellpark.org> wrote:


Hi,

On 01/14/2017 03:01 AM, Lluís Revilla wrote:

Dear Valerie and all,

If I understood the page you kindly linked correctly, a package is

deprecated:

1) When it fails to build and check (unless it is fixed).
2) When the maintainer asks for it.
3) If the maintainer is unresponsive (I assume when a mail is not
delivered) and(/or?) doesn't answer questions about the package (How
is this tracked? What is the threshold for unanswered questions, 1
year?)

We contact maintainers when a package fails on the build system. There
isn't a set rule on the number of times contacted with no response
because there are so many exceptions, e.g., transferred maintainership,
away from email due to travel, etc. I'd say the average number of
contacts is 3 before getting the final 2 week notice.



In some packages, it seems the maintainers receive the mails, and the
packages build and pass the daily checks of Bioconductor, but there
are bugs and issues with those packages that are left unanswered and
unsolved in support.bioconductor.org. Those 

Re: [Rd] Undefined behavior of head() and tail() with n = 0

2017-01-27 Thread Florent Angly
Martin, I agree with you that +0 and -0 should generally be treated as
equal, and R does a fine job in this respect. The Wikipedia article on
signed zero (https://en.wikipedia.org/wiki/Signed_zero) echoes this
view but also highlights that +0 and -0 can be treated differently in
particular situations, including their interpretation as mathematical
limits (as in the 1/-0 case). Indeed, the main question here is
whether head() and tail() represent a special case that would benefit
from differentiating between +0 and -0.

We can break down the discussion into two problems:
A/ the discrepancy between the implementation of R head() and tail()
and the documentation of these functions (where the use of zero is not
documented and thus not permissible),
B/ the discrepancy between the implementation of R head() and tail()
and their GNU equivalent (which allow zeros and differentiate between
-0 and +0, i.e. head takes "0" and "-0", tail takes "0" and "+0").

There are several possible solutions to address these discrepancies:

1/ Leave the code as-is but document its behavior with respect to zero
(zeros allowed, with negative zeros treated like positive zeros).
Advantages: This is the path of least resistance, and discrepancy A is fixed.
Disadvantages: Discrepancy B remains (but is documented).

2/ Leave the documentation as-is but reflect this in code by not
allowing zeros at all.
Advantages: Discrepancy A is fixed.
Disadvantages: Discrepancy B remains in some form (but is documented).
Need to deprecate the usage of +0 (which was not clearly documented
but may have been assumed by users).

3/ Update the code and documentation to differentiate between +0 and -0.
Advantages: In my eyes, this is the ideal solution since discrepancy A
and (most of) B are resolved.
Disadvantages: It is unclear how to implement this solution and the
implications it may have on backward compatibility:
   a/ Allow -0 (as double). But is it supported on all platforms used
by R (see ?Arithmetic)? William has raised the issue that negative
zero cannot be represented as an integer. Should head() and tail()
then strictly check double input (while forbidding integers)?
   b/ The input could always be as character. This would allow to
mirror even more closely GNU tail (where the prefix "+" is used to
invert the meaning of n). This probably involves a fair amount of work
and careful handling of deprecation.



On 26 January 2017 at 16:51, William Dunlap  wrote:
> In addition, signed zeroes only exist for floating point numbers - the
> bit patterns for as.integer(0) and as.integer(-0) are identical.
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Jan 26, 2017 at 1:53 AM, Martin Maechler
>  wrote:
>>> Florent Angly 
>>> on Wed, 25 Jan 2017 16:31:45 +0100 writes:
>>
>> > Hi all,
>> > The documentation for head() and tail() describes the behavior of
>> > these generic functions when n is strictly positive (n > 0) and
>> > strictly negative (n < 0). How these functions work when given a zero
>> > value is not defined.
>>
>> > Both GNU command-line utilities head and tail behave differently with +0 and -0:
>> > http://man7.org/linux/man-pages/man1/head.1.html
>> > http://man7.org/linux/man-pages/man1/tail.1.html
>>
>> > Since R supports signed zeros (1/+0 != 1/-0)
>>
>> whoa, whoa, .. slow down --  The above is misleading!
>>
>> Rather read in  ?Arithmetic (*the* reference to consult for such issues),
>> where the 2nd part of the following section
>>
>>  || Implementation limits:
>>  ||
>>  ||  [..]
>>  ||
>>  ||  Another potential issue is signed zeroes: on IEC 60559 platforms
>>  ||  there are two zeroes with internal representations differing by
>>  ||  sign.  Where possible R treats them as the same, but for example
>>  ||  direct output from C code often does not do so and may output
>>  ||  ‘-0.0’ (and on Windows whether it does so or not depends on the
>>  ||  version of Windows).  One place in R where the difference might be
>>  ||  seen is in division by zero: ‘1/x’ is ‘Inf’ or ‘-Inf’ depending on
>>  ||  the sign of zero ‘x’.  Another place is ‘identical(0, -0, num.eq =
>>  ||  FALSE)’.
>>
>> says the *contrary* ( __Where possible R treats them as the same__ ):
>> We do _not_ want to distinguish -0 and +0,
>> but there are cases where it is unavoidable
>>
>> And there are good reasons (mathematics !!) for this.
>>
>> I'm pretty sure that it would be quite a mistake to start
>> differentiating it here...  but of course we can continue
>> discussing here if you like.
>>
>> Martin Maechler
>> ETH Zurich and R Core
>>
>>
>> > and the R head() and tail() functions are modeled after
>> > their GNU counterparts, I would expect the R functions to
>> > distinguish between +0 and -0
>>
>> >> tail(1:5, n=0)
>> > integer(0)
>> >> tail(1:5, 

Re: [Rd] RFC: tapply(*, ..., init.value = NA)

2017-01-27 Thread Martin Maechler

> On Jan 26, 2017 07:50, "William Dunlap via R-devel" 

> wrote:

> It would be cool if the default for tapply's init.value could be
> FUN(X[0]), so it would be 0 for FUN=sum or FUN=length, TRUE for
> FUN=all, -Inf for FUN=max, etc.  But that would take time and would
> break code for which FUN did not work on length-0 objects.

> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com

I had the same idea (after my first post), so I agree that would
be nice. One could argue it would take time only if the user is too lazy
to specify the value,  and we could use 
   tryCatch(FUN(X[0]), error = function(e) NA)
to safeguard against those functions that fail for 0 length arg.
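
A small sketch of that idea, with 'init_for' as a purely hypothetical helper
name. Note that tryCatch() expects a handler function for 'error', so the NA
fallback is wrapped in function(e).

```r
# Hypothetical helper: derive an initial value by calling FUN on a
# zero-length slice of X, falling back to NA if FUN cannot handle that
init_for <- function(FUN, X) {
    tryCatch(FUN(X[0]), error = function(e) NA)
}

init_for(sum, c(3, 3))                        # 0
init_for(length, c(3, 3))                     # 0L
init_for(all, c(TRUE, FALSE))                 # TRUE
suppressWarnings(init_for(max, c(3, 3)))      # -Inf (max() warns on empty input)
init_for(function(x) x[[1]], c(3, 3))         # NA: FUN fails on length 0
```

The last call shows the safeguard in action; the max() case shows that some
functions return a value but also warn on empty input.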

But I think the main reason for _not_ setting such a default is
back-compatibility.  In my proposal, the new argument would not
be any change by default and so all current uses of tapply()
would remain unchanged.

> Henrik Bengtsson 
> on Thu, 26 Jan 2017 07:57:08 -0800 writes:

> On a related note, the storage mode should try to match ans[[1]] (or
> unlist:ed and) when allocating 'ansmat' to avoid coercion and hence a full
> copy.

Yes, related indeed; and would fall "in line" with Bill's idea.
OTOH, it could be implemented independently,
by something like

   if(missing(init.value))
       init.value <-
           if(length(ans)) as.vector(NA, mode=storage.mode(ans[[1]]))
           else NA

.
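
To see why this matters, here is a minimal sketch outside of tapply();
'typed_na' is a hypothetical helper name and 'ans' is a stand-in for the list
of per-cell results.

```r
# Hypothetical helper: an NA of the same storage mode as the first result
typed_na <- function(ans) {
    if (length(ans)) as.vector(NA, mode = storage.mode(ans[[1]])) else NA
}

ans <- list(6, 3, 6)                        # stand-in for double results
typeof(array(NA, dim = c(3, 6)))            # "logical": needs coercion later
typeof(array(typed_na(ans), dim = c(3, 6))) # "double": already the right type
typeof(typed_na(list(1L)))                  # "integer"
typeof(typed_na(list()))                    # "logical" (plain NA fallback)
```

Allocating the array with the matching type means that filling in the
results does not force a coercion of the whole array.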

A colleague proposed to use the shorter argument name 'default'
instead of 'init.value', which indeed may be more natural and is
still not too often used as a "non-first" argument in  FUN(.).

Thank you for the constructive feedback!
Martin

> On Thu, Jan 26, 2017 at 2:42 AM, Martin Maechler
>  wrote:
>> Last week, we've talked here about "xtabs(), factors and NAs",
>>  ->  https://stat.ethz.ch/pipermail/r-devel/2017-January/073621.html
>> 
>> In the mean time, I've spent several hours on the issue
>> and also committed changes to R-devel "in two iterations".
>> 
>> In the case there is a *Left* hand side part to xtabs() formula,
>> see the help page example using 'esoph',
>> it uses  tapply(...,  FUN = sum)   and
>> I now think there is a missing feature in tapply() there, which
>> I am proposing to change.
>> 
>> Look at a small example:
>> 
>>> D2 <- data.frame(n = gl(3,4), L = gl(6,2, labels=LETTERS[1:6]), N=3)[-c(1,5), ]; xtabs(~., D2)
>> , , N = 3
>> 
>> L
>> n   A B C D E F
>> 1 1 2 0 0 0 0
>> 2 0 0 1 2 0 0
>> 3 0 0 0 0 2 2
>> 
>>> DN <- D2; DN[1,"N"] <- NA; DN
>>    n L  N
>> 2  1 A NA
>> 3  1 B  3
>> 4  1 B  3
>> 6  2 C  3
>> 7  2 D  3
>> 8  2 D  3
>> 9  3 E  3
>> 10 3 E  3
>> 11 3 F  3
>> 12 3 F  3
>>> with(DN, tapply(N, list(n,L), FUN=sum))
>>    A  B  C  D  E  F
>> 1 NA  6 NA NA NA NA
>> 2 NA NA  3  6 NA NA
>> 3 NA NA NA NA  6  6
>>> 
>> 
>> and as you can see, the resulting matrix has NAs, all the same
>> NA_real_, but semantically of two different kinds:
>> 
>> 1) at ["1", "A"], the  NA  comes from the NA in 'N'
>> 2) all other NAs come from the fact that there is no such factor
>> combination
>> *and* from the fact that tapply() uses
>> 
>> array(dim = .., dimnames = ...)
>> 
>> i.e., initializes the array with NAs  (see definition of 'array').
>> 
>> My proposition is the following patch to  tapply(), adding a new
>> option 'init.value':
>> 
>> 
> -
>> 
>> -tapply <- function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
>> +tapply <- function (X, INDEX, FUN = NULL, ..., init.value = NA, simplify = TRUE)
>> {
>> FUN <- if (!is.null(FUN)) match.fun(FUN)
>> if (!is.list(INDEX)) INDEX <- list(INDEX)
>> @@ -44,7 +44,7 @@
>> index <- as.logical(lengths(ans))  # equivalently, lengths(ans) > 0L
>> ans <- lapply(X = ans[index], FUN = FUN, ...)
>> if (simplify && all(lengths(ans) == 1L)) {
>> -   ansmat <- array(dim = extent, dimnames = namelist)
>> +   ansmat <- array(init.value, dim = extent, dimnames = namelist)
>> ans <- unlist(ans, recursive = FALSE)
>> } else {
>> ansmat <- array(vector("list", prod(extent)),
>> 
>> 
> -
>> 
>> With that, I can set the initial value to '0' instead of array's
>> default of NA :
>> 
>>> with(DN, tapply(N, list(n,L), FUN=sum, init.value=0))
>>    A B C D E F
>> 1 NA 6 0 0 0 0
>> 2  0 0 3 6 0 0
>> 3  0 0 0 0 6 6
>>> 
>> 
>> which now has 0 counts and NA  as is desirable to be used inside
>> xtabs().
>> 
>> All fine... and would not be worth a posting to R-devel,

Re: [Bioc-devel] Code quality and bug reports

2017-01-27 Thread Lluís Revilla
Dear Valerie,

When I talked about maintenance status, I was thinking of something along the
lines of the badges at http://www.repostatus.org/ .
Maybe only the last three statuses are relevant to Bioconductor:

  - Active: The project has reached a stable, usable state and is being
actively developed.
  - Inactive: The project has reached a stable, usable state but is no
longer being actively developed; support/maintenance will be provided as
time allows.
  - Unsupported: The project has reached a stable, usable state but the
author(s) have ceased all work on it. A new maintainer may be desired.

With slightly tweaked descriptions, these could be used to classify the
support given to a package. Given these categories and the information
available in the repositories and on the support site, I suggest the following:

Give the Active badge when there are commits (not made by the Bioconductor
team) in the last 6 months. Excluding the Bioconductor team prevents
packages from being marked Active when they are not, because the only
changes are version bumps or minor changes made by the Bioconductor team
(also, it is not the responsibility of the Bioconductor team to maintain the
packages, is it?). Alternatively, give it when, at least once every six
months, the package maintainer answers or comments on a question (tagged
with the package tag) on the support site, if there is any. As an example,
edgeR would be Active: it has commits from the maintainers in the last 6 months.

If a question tagged with a package tag is unanswered and the maintainer
hasn't answered/commented in the last 6 months, or there isn't any commit
(by the maintainer or anyone other than the Bioconductor team) in the last
6 months, the package could be set to Inactive. CorMut would be Inactive and
close to the Unsupported status.

If there is any unanswered question tagged with a package tag and the
maintainer hasn't answered/commented in the last year and there isn't any
commit from the maintainer in the last 2 years, I would give the
Unsupported tag to that package. topGO would be in that category.
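
The three rules above could be sketched in R roughly as follows. This is
only an illustration: package_status() and its arguments are hypothetical
(no such Bioconductor API exists), the day thresholds approximate
"6 months", "1 year" and "2 years", and treating everything that is neither
Active nor Unsupported as Inactive is my reading of the rules.

```r
# Hypothetical inputs:
#   last_commit  Date of the last commit by the maintainer (NA if none)
#   last_reply   Date of the maintainer's last answer/comment on the
#                support site (NA if none)
#   unanswered   TRUE if a question tagged with the package is unanswered
package_status <- function(last_commit, last_reply, unanswered,
                           today = Sys.Date()) {
    days_since <- function(d) if (is.na(d)) Inf else as.numeric(today - d)
    commit_age <- days_since(last_commit)
    reply_age  <- days_since(last_reply)
    # Active: maintainer commit or support-site reply within ~6 months.
    if (commit_age <= 183 || reply_age <= 183) return("Active")
    # Unsupported: open question, no reply for a year, no commit for 2 years.
    if (unanswered && reply_age > 365 && commit_age > 730) return("Unsupported")
    "Inactive"
}

today <- as.Date("2017-01-27")
s1 <- package_status(as.Date("2016-12-01"), as.Date(NA), FALSE, today)
s2 <- package_status(as.Date("2016-03-01"), as.Date(NA), TRUE, today)
s3 <- package_status(as.Date("2014-06-01"), as.Date(NA), TRUE, today)
c(s1, s2, s3)
```

The dates in the usage lines are made up for illustration and do not
describe any real package.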

Once the Unsupported status is reached, the team could try to contact the
maintainer to let him/her know that the maintainer position could be taken
over by somebody else willing to do so. Of course, if he/she makes commits or
answers/comments on questions on the support site so that the status of the
package goes back to Active, he/she could keep the maintainer status.

This system could run alongside the current End of Life Policy, or not, but
it gives the community of users an opportunity to maintain a package they
deem useful. It is a bit more complex but would give much better guidance on
which packages are well supported and not only used, and those that are used
but not supported could be saved from deprecation.

HTH,

Lluís

On 18 January 2017 at 17:52, Obenchain, Valerie <
valerie.obench...@roswellpark.org> wrote:

> Hi,
>
> On 01/14/2017 03:01 AM, Lluís Revilla wrote:
> > Dear Valerie and all,
> >
> > If I understood the page you kindly linked correctly, a package is
> deprecated:
> > 1) When it fails to build and check (unless it is fixed).
> > 2) When the maintainer asks for it.
> > 3) If the maintainer is unresponsive (I assume when a mail is not
> > delivered) and(/or?) doesn't answer questions about the package (How
> > is this tracked? Which is the threshold for unanswered questions, 1
> > year? )
> We contact maintainers when a package fails on the build system. There
> isn't a set rule on the number of times contacted with no response
> because there are so many exceptions, e.g., transferred maintainership,
> away from email due to travel, etc. I'd say the average number of
> contacts is 3 before getting the final 2 week notice.
>
> >
> > In some packages, it seems the maintainers receive the mails, and the
> > packages build and pass the daily checks of Bioconductor, but there
> > are bugs and issues with those packages that are left unanswered and
> > unsolved in support.bioconductor.org. Those packages that are still
> > functional and used but don't receive maintenance are then kept ? How
> > should the community help to solve those issues?
> A primary motivation for implementing badges on the landing pages was to
> provide the "maintenance status" you mention below. The badge stats give
> an idea of how active the maintainer is (posts, commits, coverage) as
> well as the level of use by others (downloads). The 'posts' badge shows
> support site activity over the past 6 months in terms of 4 fields:
> tagged questions / average answers per question / average comments per
> question / accepted answers.
>
> The CorMut package has no tagged questions:
>
>   http://www.bioconductor.org/packages/3.5/bioc/html/CorMut.html
>
> If Guangchuang had asked questions on the support site instead of
> posting comments in a gist there would be a number of tagged questions
> with no answers. This would give other users some data to go on - an
> unresponsive maintainer of a package with no unit tests. In contrast, a
> package like edgeR has an