Re: [Rd] Documentation of addmargins

2021-12-07 Thread SOEIRO Thomas
Yes, it is!

There is only a small typo (punctuation is missing, which makes the sentence harder to read).

Sorry for the misunderstanding; my previous mail may not have been clear enough.

-----Original Message-----
From: GILLIBERT, Andre [mailto:andre.gillib...@chu-rouen.fr]
Sent: Tuesday, 7 December 2021 16:59
To: SOEIRO Thomas; R Development List
Subject: RE: Documentation of addmargins

Thomas SOEIRO wrote:
> Dear list,

> There is a minor typo in addmargins (section Details):

> - If the functions used to form margins are not commutative the result 
> depends on the order in which margins are computed. Annotation of margins is 
> done via naming the FUN list.
> + If the functions used to form margins are not commutative**add ':' or ', 
> i.e.' here** the result depends on the order in which margins are computed. 
> Annotation of margins is done via naming the FUN list.
>
>
> I'm not sure if such minor things really need to be reported when they are 
> noticed... Please let me know if not. Of course this is minor, but imho one 
> of the strengths of R is also its documentation!
>

The documentation looks correct to me.
If the function FUN is not commutative (i.e. the result depends on the order of 
the vector passed to it), then the result of addmargins() will depend on the 
order of the 'margin' argument to the addmargins() function.

For instance:
mat <- rbind(c(1,10),c(100,1000))
fun <- function(x) {x[1]-x[2]-x[1]*x[2]} # non-commutative function
a <- addmargins(mat, margin=c(1,2), FUN=fun)
b <- addmargins(mat, margin=c(2,1), FUN=fun)

a and b are different, because the fun function is not commutative.
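
The difference can be checked cell by cell. The following is only a sketch, not code from the thread: the matrix is given dimnames (r1, r2, c1, c2 are made-up labels), and sum is added as a commutative contrast. It compares the bottom-right "margin of margins" cell for both margin orders.

```r
# Illustrative data; the dimnames are assumptions added for readability.
mat <- matrix(c(1, 100, 10, 1000), nrow = 2,
              dimnames = list(row = c("r1", "r2"), col = c("c1", "c2")))

# Order-dependent function: swapping x[1] and x[2] changes the result.
fun <- function(x) x[1] - x[2] - x[1] * x[2]

# quiet = TRUE suppresses the message about the order of computation.
a <- addmargins(mat, margin = c(1, 2), FUN = fun, quiet = TRUE)
b <- addmargins(mat, margin = c(2, 1), FUN = fun, quiet = TRUE)

# For fun, the corner cell depends on which margin is computed first ...
a[3, 3] == b[3, 3]

# ... whereas for sum the order does not matter.
s12 <- addmargins(mat, margin = c(1, 2), FUN = sum)
s21 <- addmargins(mat, margin = c(2, 1), FUN = sum)
s12[3, 3] == s21[3, 3]
```

Only the corner cell is compared here because it is the one computed from previously computed margins.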

--
Sincerely
André GILLIBERT
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Documentation of addmargins

2021-12-07 Thread SOEIRO Thomas
Dear list,

There is a minor typo in addmargins (section Details):

- If the functions used to form margins are not commutative the result depends 
on the order in which margins are computed. Annotation of margins is done via 
naming the FUN list.
+ If the functions used to form margins are not commutative**add ':' or ', 
i.e.' here** the result depends on the order in which margins are computed. 
Annotation of margins is done via naming the FUN list.


I'm not sure if such minor things really need to be reported when they are 
noticed... Please let me know if not. Of course this is minor, but imho one of 
the strengths of R is also its documentation!

Best,

Thomas



[Rd] Add ... to Reduce?

2021-12-01 Thread SOEIRO Thomas
Dear list,

Currently, an anonymous function is needed to pass additional parameters to f 
in Reduce.

The following patch adds ... to pass additional arguments directly and seems to 
work in simple cases (see example below).

However, since this was not available (even though it is common for similar 
functions), I suspect that I am missing something...

Best,

Thomas


dfs <- list(x = warpbreaks)
dfs$x$id <- seq_along(dfs$x$breaks)
dfs$y <- dfs$x[1:15, ]
dfs$z <- dfs$x[20:35, ]

identical(
  Reduce(function(...) merge(..., by = "id", all = TRUE), dfs),
  Reduce(merge, dfs, by = "id", all = TRUE)
)
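
For readers who want the same convenience without patching base R, a small wrapper can forward the extra arguments. This is a sketch only: reduce_with is a hypothetical helper name, not part of R or of the patch, and it covers just the default left fold.

```r
# Hypothetical helper (not part of base R): forwards extra arguments to f.
reduce_with <- function(f, x, ...) {
  Reduce(function(a, b) f(a, b, ...), x)
}

# Same data as in the message above.
dfs <- list(x = warpbreaks)
dfs$x$id <- seq_along(dfs$x$breaks)
dfs$y <- dfs$x[1:15, ]
dfs$z <- dfs$x[20:35, ]

# Anonymous-function form versus the wrapper.
via_anonymous <- Reduce(function(...) merge(..., by = "id", all = TRUE), dfs)
via_wrapper   <- reduce_with(merge, dfs, by = "id", all = TRUE)

identical(via_anonymous, via_wrapper)
```

The wrapper does not handle init, right or accumulate; those would have to be passed through to Reduce explicitly.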


diff -u orig/funprog.R mod/funprog.R
--- orig/funprog.R  2021-12-01 23:02:09.710231318 +0100
+++ mod/funprog.R   2021-12-01 23:23:58.591120101 +0100
@@ -1,7 +1,7 @@
 #  File src/library/base/R/funprog.R
 #  Part of the R package, https://www.R-project.org
 #
-#  Copyright (C) 1995-2014 The R Core Team
+#  Copyright (C) 1995-2021 The R Core Team
 #
 #  This program is free software; you can redistribute it and/or modify
 #  it under the terms of the GNU General Public License as published by
@@ -17,7 +17,7 @@
 #  https://www.R-project.org/Licenses/
 
 Reduce <-
-function(f, x, init, right = FALSE, accumulate = FALSE)
+function(f, x, init, right = FALSE, accumulate = FALSE, ...)
 {
 mis <- missing(init)
 len <- length(x)
@@ -49,11 +49,11 @@
 if(!accumulate) {
 if(right) {
 for(i in rev(ind))
-init <- forceAndCall(2, f, x[[i]], init)
+init <- forceAndCall(2, f, x[[i]], init, ...)
 }
 else {
 for(i in ind)
-init <- forceAndCall(2, f, init, x[[i]])
+init <- forceAndCall(2, f, init, x[[i]], ...)
 }
 init
 }
@@ -66,13 +66,13 @@
 if(right) {
 out[[len]] <- init
 for(i in rev(ind)) {
-init <- forceAndCall(2, f, x[[i]], init)
+init <- forceAndCall(2, f, x[[i]], init, ...)
 out[[i]] <- init
 }
 } else {
 out[[1L]] <- init
 for(i in ind) {
-init <- forceAndCall(2, f, init, x[[i]])
+init <- forceAndCall(2, f, init, x[[i]], ...)
 out[[i]] <- init
 }
 }
@@ -80,14 +80,14 @@
 if(right) {
 out[[len]] <- init
 for(i in rev(ind)) {
-init <- forceAndCall(2, f, x[[i]], init)
+init <- forceAndCall(2, f, x[[i]], init, ...)
 out[[i]] <- init
 }
 }
 else {
 for(i in ind) {
 out[[i]] <- init
-init <- forceAndCall(2, f, init, x[[i]])
+init <- forceAndCall(2, f, init, x[[i]], ...)
 }
 out[[len]] <- init
 }
diff -u orig/funprog.Rd mod/funprog.Rd
--- orig/funprog.Rd 2021-12-01 23:02:38.400738386 +0100
+++ mod/funprog.Rd  2021-12-01 23:29:28.993976101 +0100
@@ -21,7 +21,7 @@
   given function.
 }
 \usage{
-Reduce(f, x, init, right = FALSE, accumulate = FALSE)
+Reduce(f, x, init, right = FALSE, accumulate = FALSE, ...)
 Filter(f, x)
 Find(f, x, right = FALSE, nomatch = NULL)
 Map(f, ...)
@@ -44,7 +44,7 @@
 combination is used.}
   \item{nomatch}{the value to be returned in the case when
 \dQuote{no match} (no element satisfying the predicate) is found.}
-  \item{\dots}{vectors.}
+  \item{\dots}{arguments to be passed to FUN.}
 }
 \details{
   If \code{init} is given, \code{Reduce} logically adds it to the start





[Rd] documentation of asplit

2021-11-19 Thread SOEIRO Thomas
Dear list,

The documentation of `asplit` currently says (section Details): "apply *always* 
simplifies common length results, so attempting to split via apply(x, MARGIN, 
identity) does not work (as it simply gives x)."

This may be updated (e.g., by simply removing "always") since `apply` recently 
gained a `simplify` argument.
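
A short sketch of the behaviour in question, assuming R >= 4.1.0 (where apply has the simplify argument); the matrix x is made up for illustration:

```r
x <- matrix(1:6, nrow = 2)

# Default behaviour: equal-length results are simplified back to a matrix.
simplified <- apply(x, 2, identity)

# With simplify = FALSE (assumes R >= 4.1.0) the columns come back as a list.
cols <- apply(x, 2, identity, simplify = FALSE)

# asplit() yields the same pieces, stored as a list array.
pieces <- asplit(x, 2)

# Compare the two results element by element.
same <- vapply(seq_along(cols),
               function(i) all(cols[[i]] == pieces[[i]]),
               logical(1))

is.matrix(simplified)
is.list(cols)
all(same)
```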

Best,

Thomas



[Rd] Potential bugs in table dnn

2021-10-24 Thread SOEIRO Thomas
Dear Martin,

Should I report a bug for this remaining issue (or these issues?) after all, 
as initially agreed?

Best regards,

Thomas

> > Dear Martin,
> >
> > Thank you for the perfect fix. It fixes both issues in the 1-dim case (i.e. 
> > automatic dnn *and* disregard dnn/names in ...), as well as the 
> > documentation.
>
>
> Finally, there is still a corner case that the patch did not fix in the 
> 1D-case. We cannot override the data frame's names with the dnn argument:
>
> tab(warpbreaks[2], dnn = letters[1]) # dnn ignored
> # wool
> #  A  B
> # 27 27
>
> tab(warpbreaks[2:3], dnn = letters[1:2]) # works
> #    b
> # a   L M H
> #   A 9 9 9
> #   B 9 9 9
>
> But I did not manage to fix it...



[Rd] Potential improvements of ave?

2021-10-24 Thread SOEIRO Thomas
Since the original report raised several proposals, I submitted a bug report on 
R Bugzilla that tries to summarize the discussion: 
https://bugs.r-project.org/show_bug.cgi?id=18223

(Maybe I should have asked first whether it is really appropriate to do so. 
Please let me know if not.)

> Hi Abby,
> 
> I actually have a patch submitted that does this for unique/duplicated
> (only numeric cases I think) but it is, as patches from external
> contributors go, quite sizable which means it requires a correspondingly
> large amount of an R-core member's time and energy to vet and consider. It
> is in the queue, and so, I expect (/hope, provided I didn't make a mistake)
> it will be incorporated at some point. (
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17993)
> 
> You are correct that the speedups are quite significant for calling
> unique/duplicated on large vectors that know they are sorted: Speedup on my
> machine for a fairly sizable vector (length 1e7) ranges from about ~10x in
> the densely duplicated case up to ~60-70x in the sparsely duplicated case
> for duplicated(). For unique() it seems to range from ~10x in the densely
> duplicated case to ~15x in the sparse case.
> 
> I had thought that min and max already did this, but looking now, they
> don't seem to by default, though ALTREP classes themselves do have an
> option of setting a min/max method, which would be hit. That does seem like
> low-hanging fruit, I agree, though in many cases the slow down from a
> single pass over the data to get a min probably isn't earthshattering.
> 
> The others do seem like they could benefit as well.
> 
> Best,
> ~G
> 
> On Tue, Mar 16, 2021 at 2:54 PM Abby Spurdle wrote:
> 
> > There are some relatively obvious examples:
> > unique, which.min/which.max/etc, range/min/max, quantile, aggregate/split
> >
> > Also, many timeseries, graphics and spline functions are dependent on the
> > order.
> >
> > In the case of data.frame(s), a boolean flag would probably need to be
> > extended to allow for multiple column sorting, and
> > ascending/descending options.
> >
> > On Tue, Mar 16, 2021 at 11:08 AM Gabriel Becker wrote:
> > >
> > > Abby,
> > >
> > > Vectors do have an internal mechanism for knowing that they are sorted
> > > via ALTREP (it was one of 2 core motivating features for 'smart vectors'
> > > the other being knowledge about presence of NAs).
> > >
> > > Currently I don't think we expose it at the R level, though it is part
> > > of the official C API. I don't know of any plans for this to change, but I
> > > suppose it could. Plus for functions in R itself, we could even use it
> > > without exposing it more widely. A number of functions, including sort
> > > itself, already do this in fact, but more could. I'd be interested in
> > > hearing which functions you think would particularly benefit from this.
> > >
> > > ~G
> > >
> > > On Mon, Mar 15, 2021 at 12:01 PM SOEIRO Thomas wrote:
> > > >
> > > > Hi Abby,
> > > >
> > > > Thank you for your positive feedback.
> > > >
> > > > I agree with your general comment about sorting.
> > > >
> > > > For ave specifically, ordering may not help because the output must
> > > > maintain the order of the input (as ave returns only x and not the
> > > > entire data.frame).
> > > >
> > > > Thanks,
> > > >
> > > > Thomas
> > > > 
> > > > From: Abby Spurdle
> > > > Sent: Monday, 15 March 2021 10:22
> > > > To: SOEIRO Thomas
> > > > Cc: r-devel using r-project.org
> > > > Subject: Re: [Rd] Potential improvements of ave?
> > > >
> > > > Hi Thomas,
> > > >
> > > > These are some great suggestions.
> > > > But I can't help but feel there's a much bigger problem here.
> > > >
> > > > Intuitively, the ave function could (or should) sort the data.
> > > > Then the indexing step becomes almost trivial, in terms of both time
> > > > and space complexity.
> > > > And the ave function is not the only example of where a problem
> > > > becomes much simpler, if the data is sorted.
> > > >
> > > > Historically, I've never found base R functions user-friendly for
> > > > aggreg

[Rd] Potential bugs in table dnn

2021-10-15 Thread SOEIRO Thomas
> Dear Martin,
> 
> Thank you for the perfect fix. It fixes both issues in the 1-dim case (i.e. 
> automatic dnn *and* disregard dnn/names in ...), as well as the documentation.


Finally, there is still a corner case that the patch did not fix in the 
1D-case. We cannot override the data frame's names with the dnn argument:

tab(warpbreaks[2], dnn = letters[1]) # dnn ignored
# wool
#  A  B 
# 27 27 

tab(warpbreaks[2:3], dnn = letters[1:2]) # works
#    b
# a   L M H
#   A 9 9 9
#   B 9 9 9

But I did not manage to fix it...




Re: [Rd] Potential bugs in table dnn

2021-10-14 Thread SOEIRO Thomas
Dear Martin,

Thank you for the perfect fix. It fixes both issues in the 1-dim case (i.e. 
automatic dnn *and* disregard dnn/names in ...), as well as the documentation.



While working on table: maybe this should be an error?

table(warpbreaks[2], warpbreaks[3]) 
#  
#   1:3
#   1:2   0
# Warning messages:
# 1: In xtfrm.data.frame(x) : cannot xtfrm data frames
# 2: In xtfrm.data.frame(x) : cannot xtfrm data frames

Best regards,

Thomas

> -----Original Message-----
> From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
> Sent: Thursday, 14 October 2021 11:44
> To: SOEIRO Thomas
> Cc: R Development List
> Subject: Re: [Rd] Potential bugs in table dnn
> 
> Dear Thomas,
> 
> actually, I have in the mean time already applied the changes I think are
> needed, both in the code and in the documentation.
> 
> So, in this case, it may be a waste of time to still open a bugzilla issue, I 
> think.
> 
> Here are my current changes (not yet committed; of course I would also add
> a NEWS entry, mentioning you):
> 
> 
> Index: src/library/base/R/table.R
> ===================================================================
> 53c53
> <   if (length(dnn) != length(args))
> ---
> >   if(length(args) == 1L || length(dnn) != length(args))
> Index: src/library/base/man/table.Rd
> ===================================================================
> 23c23
> <   \code{table} uses the cross-classifying factors to build a contingency
> ---
> >   \code{table} uses cross-classifying factors to build a contingency
> 41c41,42
> < (including character strings), or a list (or data frame) whose
> ---
> > (including numbers or character strings), or a \code{\link{list}} (such
> > as a data frame) whose
> 67c68,69
> <   If the argument \code{dnn} is not supplied, the internal function
> ---
> >   If the argument \code{dnn} is not supplied \emph{and} if \code{\dots} is
> >   not one \code{list} with its own \code{\link{names}()}, the internal function
> 
> 
> 
> With regards,
> Martin


Re: [Rd] Potential bugs in table dnn

2021-10-13 Thread SOEIRO Thomas
Inline comments below in the previous message

I'm not 100% sure if the current behavior is intended or not. If not, here is a 
patch (which I can submit on R Bugzilla if appropriate):


diff -u orig/table.R mod/table.R
--- orig/table.R    2021-10-13 10:04:28.560912800 +0200
+++ mod/table.R     2021-10-13 10:43:43.815915100 +0200
@@ -1,7 +1,7 @@
 #  File src/library/base/R/table.R
 #  Part of the R package, https://www.R-project.org
 #
-#  Copyright (C) 1995-2020 The R Core Team
+#  Copyright (C) 1995-2021 The R Core Team
 #
 #  This program is free software; you can redistribute it and/or modify
 #  it under the terms of the GNU General Public License as published by
@@ -50,9 +50,8 @@
 args <- list(...)
 if (length(args) == 1L && is.list(args[[1L]])) { ## e.g. a data.frame
args <- args[[1L]]
-   if (length(dnn) != length(args))
-   dnn <- if (!is.null(argn <- names(args))) argn
-  else paste(dnn[1L], seq_along(args), sep = ".")
+   dnn <- if (!is.null(argn <- names(args))) argn
+  else paste(dnn[1L], seq_along(args), sep = ".")
 }
 if (!length(args))
stop("nothing to tabulate")
diff -u orig/table.Rd mod/table.Rd
--- orig/table.Rd   2021-10-13 11:39:45.839097000 +0200
+++ mod/table.Rd    2021-10-13 11:56:25.620660900 +0200
@@ -1,6 +1,6 @@
 % File src/library/base/man/table.Rd
 % Part of the R package, https://www.R-project.org
-% Copyright 1995-2021 R Core Team
+% Copyright 1995-2016 R Core Team
 % Distributed under GPL 2 or later
 
 \name{table}
@@ -48,7 +48,7 @@
   \item{useNA}{whether to include \code{NA} values in the table.
 See \sQuote{Details}.  Can be abbreviated.}
   \item{dnn}{the names to be given to the dimensions in the result (the
-\emph{dimnames names}).}
+\emph{dimnames names}).  See \sQuote{Details}.}
   \item{deparse.level}{controls how the default \code{dnn} is
 constructed.  See \sQuote{Details}.}
   \item{x}{an arbitrary \R object, or an object inheriting from class
@@ -64,12 +64,15 @@
   \item{sep, base}{passed to \code{\link{provideDimnames}}.}
 }
 \details{
-  If the argument \code{dnn} is not supplied, the internal function
+  If ... is one or more objects which can be interpreted as factors
+  and the argument \code{dnn} is not supplied, the internal function
   \code{list.names} is called to compute the \sQuote{dimname names}.  If the
   arguments in \code{\dots} are named, those names are used.  For the
   remaining arguments, \code{deparse.level = 0} gives an empty name,
   \code{deparse.level = 1} uses the supplied argument if it is a symbol,
-  and \code{deparse.level = 2} will deparse the argument.
+  and \code{deparse.level = 2} will deparse the argument.  Otherwise,
+  if ... is a list (or data frame), its names are used as the
+  \sQuote{dimname names} and the argument \code{dnn} is not used.
 
   Only when \code{exclude} is specified (i.e., not by default) and
   non-empty, will \code{table} potentially drop levels of factor



> Dear list,
> 
> table does not set dnn for dataframes of length 1:
> 
> table(warpbreaks[2:3]) # has dnn
> #     tension
> # wool L M H
> #    A 9 9 9
> #    B 9 9 9
> 
> table(warpbreaks[2]) # has no dnn
> # 
> #  A  B 
> # 27 27 
> 
> This is because of if (length(dnn) != length(args)) (line 53 in 
> https://github.com/wch/r-source/blob/trunk/src/library/base/R/table.R). When 
> commenting this line or modifying it to if (length(dnn) != length(args) || 
> dnn == ""), dnn are set as expected:
> 
> table2(warpbreaks[2:3]) # has dnn
> #     tension
> # wool L M H
> #    A 9 9 9
> #    B 9 9 9
> 
> table2(warpbreaks[2]) # has dnn
> # wool
> #  A  B 
> # 27 27 
> 
> However, I do not get the logic for the initial if (length(dnn) != 
> length(args)), so the change may break something else...

I guess the purpose of this line is to have the possibility to set the dimname 
names through the dnn argument for lists (or data frames) of length 1, e.g.:

table(warpbreaks[2], dnn = "xxx")
# xxx
#  A  B 
# 27 27

However, this seems inconsistent with the behavior for lists (or data frames) 
of length >1. Removing the exception introduced by the if clause restores 
consistent dimname names for lists (or data frames), whatever their length.
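
To compare the cases without reading the printed tables, the dimnames names can be inspected directly. This is only a sketch; the variable names are made up, and the one-column results are printed rather than asserted because they depend on the R version in use.

```r
# Two-column data frame: the column names become the dimnames names.
two_col <- names(dimnames(table(warpbreaks[2:3])))
two_col

# Atomic vector input: dnn is honoured.
vec_dnn <- names(dimnames(table(warpbreaks$wool, dnn = "xxx")))
vec_dnn

# One-column data frame: the case discussed in this thread.
# The outcome depends on the R version, so it is only printed here.
names(dimnames(table(warpbreaks[2])))
names(dimnames(table(warpbreaks[2], dnn = "xxx")))
```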


> In addition, table documentation says "If the argument dnn is not supplied, 
> the internal function list.names is called to compute the 'dimname names'. If 
> the arguments in ... are named, those names are used." Some cases seem 
> inconsistent or may return a warning:

The documentation is not very clear on how dimname names are computed for 
lists (or data frames). If the if clause is removed [i.e., consistent behavior 
for lists (or data frames), see above], I think we would only need to document 
the "precedence" of list (or data frame) names over dnn when ... is a list (or 
data frame), e.g.: "if ... is a list (or data frame), its names are used as the 
\sQuote{dimname names} and the argument \code{dnn} is not

[Rd] Potential bugs in table dnn

2021-10-10 Thread SOEIRO Thomas
Dear list,

table does not set dnn for data frames of length 1:

table(warpbreaks[2:3]) # has dnn
#     tension
# wool L M H
#    A 9 9 9
#    B 9 9 9

table(warpbreaks[2]) # has no dnn
# 
#  A  B 
# 27 27 

This is because of if (length(dnn) != length(args)) (line 53 in 
https://github.com/wch/r-source/blob/trunk/src/library/base/R/table.R). When 
commenting this line or modifying it to if (length(dnn) != length(args) || dnn 
== ""), dnn are set as expected:

table2(warpbreaks[2:3]) # has dnn
#     tension
# wool L M H
#    A 9 9 9
#    B 9 9 9

table2(warpbreaks[2]) # has dnn
# wool
#  A  B 
# 27 27 

However, I do not get the logic for the initial if (length(dnn) != 
length(args)), so the change may break something else...

In addition, table documentation says "If the argument dnn is not supplied, the 
internal function list.names is called to compute the ‘dimname names’. If the 
arguments in ... are named, those names are used." Some cases seem inconsistent 
or may return a warning:

table(warpbreaks[2], dnn = letters) # no warning/not as documented
# wool
#  A  B 
# 27 27 

table(warpbreaks[2], dnn = letters[1]) # as documented
# a
#  A  B 
# 27 27 

table(zzz = warpbreaks[2], dnn = letters[1]) # as documented
# a
#  A  B 
# 27 27 

table(zzz = warpbreaks$wool, dnn = letters[1]) # as documented
# a
#  A  B 
# 27 27 

table(warpbreaks$wool, dnn = letters) # as expected
# Error in names(dn) <- dnn : 
#   'names' attribute [26] must be the same length as the vector [1]

Best regards,

Thomas


Re: [Rd] trunc.Date and round.Date + documentation of DateTimeClasses

2021-09-30 Thread SOEIRO Thomas
Sorry for missing the issue on Bugzilla with Dirk's (better) proposal before 
posting on the list. I agree that adding the whole family (ceiling(), floor(), 
trunc(), and round()) would be very useful (although it may be useful to provide 
the enhanced trunc.Date in the meantime). Unfortunately, I don't have the skills 
to contribute to the related functions. In any case, thanks for all the work!

In addition, what do you think about the second proposal? (documentation of 
DateTimeClasses)

-----Original Message-----
From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
Sent: Thursday, 30 September 2021 15:27
To: SOEIRO Thomas
Cc: r-devel@r-project.org; Dirk Eddelbuettel
Subject: Re: [Rd] trunc.Date and round.Date + documentation of DateTimeClasses

Excuse the exceptional top-reply:

Note that a very related issue has been raised not so long ago by Dirk (in CC) 
on R's Bugzilla :

  trunc.Date should support months and years arguments as trunc.POSIXt does
  
  https://bugs.r-project.org/show_bug.cgi?id=18099

which had some agreement (also with you: I agree we should change something 
about this) but I also had proposed to approach it more generally than in the 
PR .. which you already did by mentioning trunc() and round() methods together.

Still, Dirk's proposal would try harder to remain back compatible in those 
cases where  trunc.Date() currently does "behave as it should".

Martin Maechler
ETH Zurich  and  R Core




Re: [Rd] trunc.Date and round.Date + documentation of DateTimeClasses

2021-09-30 Thread SOEIRO Thomas
About fractional days, trunc.Date2 actually seems to have no regression and to 
be backward compatible compared to the original trunc.Date:

frac <- as.Date("2020-01-01") + 0.5
identical(trunc(frac), trunc.Date2(frac))

(I may still be missing something, since I do not understand how trunc.Date 
manages fractional days with round(x - 0.499).)




[Rd] trunc.Date and round.Date + documentation of DateTimeClasses

2021-09-29 Thread SOEIRO Thomas
Dear All,

1) trunc.Date and round.Date:

Currently, the help page for trunc.Date and round.Date says "The methods for 
class "Date" are of little use except to remove fractional days". However, 
e.g., trunc.POSIXt(Sys.Date(), "years") and round.POSIXt(Sys.Date(), "years") 
work because the functions start with x <- as.POSIXlt(x).

Would you consider a simple implementation of trunc.Date and round.Date based 
on trunc.POSIXt and round.POSIXt? This would avoid having to coerce from Date 
to POSIXt and back to Date for these simple manipulations.
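
For reference, the round trip described above looks roughly as follows. This is a sketch of the existing workaround, not of the proposed method; the date and variable names are chosen only for illustration.

```r
d <- as.Date("2021-09-29")

# Date -> POSIXlt -> truncate -> back to Date
first_of_month <- as.Date(trunc(as.POSIXlt(d), units = "months"))
first_of_year  <- as.Date(trunc(as.POSIXlt(d), units = "years"))

format(first_of_month)
format(first_of_year)
```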

For example:
# (I do not have a clear understanding of what "remove fractional days" means, 
and I did not implement it.)

trunc.Date2 <-
  function(x, units = c("days", "months", "years"), ...)
  {
    units <- match.arg(units)
    x <- as.POSIXlt(x)

    switch(units,
           "days" = {
             x$sec[] <- 0; x$min[] <- 0L; x$hour[] <- 0L
             x$isdst[] <- -1L
           },
           "months" = {
             x$sec[] <- 0; x$min[] <- 0L; x$hour[] <- 0L
             x$mday[] <- 1L
             x$isdst[] <- -1L
           },
           "years" = {
             x$sec[] <- 0; x$min[] <- 0L; x$hour[] <- 0L
             x$mday[] <- 1L; x$mon[] <- 0L
             x$isdst[] <- -1L
           }
    )
    as.Date(x)
  }
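
For comparison, here is a minimal sketch of the round trip that such a method would hide from the user: coerce the Date to POSIXlt, truncate with the existing trunc method for date-times, and coerce back. The helper name trunc_date_via_posixlt and the example date are made up for illustration; they are not part of R.

```r
# Hypothetical helper (not part of base R): truncate a Date by going
# through POSIXlt and back, which is the explicit coercion discussed above.
trunc_date_via_posixlt <- function(x, units = c("days", "months", "years")) {
  units <- match.arg(units)
  as.Date(trunc(as.POSIXlt(x), units = units))
}

d <- as.Date("2021-09-29")                      # arbitrary example date
by_month <- trunc_date_via_posixlt(d, "months") # first day of that month
by_year  <- trunc_date_via_posixlt(d, "years")  # first day of that year
format(c(by_month, by_year))
```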



2) documentation of DateTimeClasses:

It may be useful to add in the documentation of DateTimeClasses that 
manipulating elements of POSIXlt objects may result in "invalid" entries 
(e.g., mon = 12 or mday = 0), but that the object is nevertheless correctly 
printed/coerced.

Is this behavior explicitly supported?

d <- as.POSIXlt("2000-01-01")
unclass(d)
d$mon <- d$mon + 12
d$mday <- d$mday - 1
unclass(d)
d
d <- as.POSIXlt(as.POSIXct(d))
dput(d)
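
A small sketch of the behaviour in question, using UTC so that the result does not depend on the local time zone. It only illustrates that out-of-range fields appear to be normalised on coercion; whether this is guaranteed is exactly the open question.

```r
# Build a POSIXlt in UTC and push two fields out of their usual range.
d <- as.POSIXlt("2000-01-01", tz = "UTC")
d$mon  <- d$mon + 12L   # mon becomes 12 (usual range is 0-11)
d$mday <- d$mday - 1L   # mday becomes 0 (usual range is 1-31)

# Coercing to POSIXct and back yields in-range fields again.
d2 <- as.POSIXlt(as.POSIXct(d))
c(year = d2$year + 1900L, mon = d2$mon, mday = d2$mday)
format(d2, "%Y-%m-%d")
```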



Best,

Thomas



Re: [Rd] sep hard coded in write.ftable

2021-09-02 Thread SOEIRO Thomas
There is a small typo in the NEWS file: write.table -> write.ftable

-----Original Message-----
From: SOEIRO Thomas 
Sent: Thursday, 2 September 2021 13:10
To: 'Martin Maechler'
Cc: r-devel@r-project.org
Subject: RE: [Rd] sep hard coded in write.ftable

Dear Martin,

Thank you very much for your prompt feedback!

Best regards,

Thomas

-----Original Message-----
From: Martin Maechler [mailto:maech...@stat.math.ethz.ch] 
Sent: Thursday, 2 September 2021 11:30
To: SOEIRO Thomas
Cc: r-devel@r-project.org
Subject: Re: [Rd] sep hard coded in write.ftable

>>>>> SOEIRO Thomas
>>>>> on Wed, 1 Sep 2021 15:01:43 + writes:

> Dear all,

> (This is a follow-up to a previous suggestion for ftable that was added in
> R 4.1.0: https://stat.ethz.ch/pipermail/r-devel/2020-May/079451.html)

> The sep argument is hard coded in write.ftable:

> write.ftable <- function(x, file = "", quote = TRUE, append = FALSE,
> digits = getOption("digits"), ...)
> {
> r <- format.ftable(x, quote = quote, digits = digits, ...)
> cat(t(r), file = file, append = append,
> sep = c(rep(" ", ncol(r) - 1), "\n"))
> invisible(x)
> }

> A minor change would allow users to modify it:

> write.ftable2 <- function(x, file = "", quote = TRUE, append = FALSE,
> digits = getOption("digits"), sep = " ", ...)
> {
> r <- stats:::format.ftable(x, quote = quote, digits = digits, ...)
> cat(t(r), file = file, append = append,
> sep = c(rep(sep, ncol(r) - 1), "\n"))
> invisible(x)
> }

I agree this sounds reasonable, and am currently running 'make check-devel' on 
sources modified accordingly ..

Martin


> This would avoid the need for a prior call to format.ftable (although
> write.ftable is significantly slower than write.table):

> ftable(formula = wool + tension ~ breaks, data = warpbreaks) |>
> format(quote = FALSE) |>
> write.table(sep = ";", row.names = FALSE, col.names = FALSE)

> ftable(formula = wool + tension ~ breaks, data = warpbreaks) |>
> write.ftable2(sep = ";")

> Best regards,
> Thomas



Re: [Rd] sep hard coded in write.ftable

2021-09-02 Thread SOEIRO Thomas
Dear Martin,

Thank you very much for your prompt feedback!

Best regards,

Thomas

-----Original Message-----
From: Martin Maechler [mailto:maech...@stat.math.ethz.ch] 
Sent: Thursday, 2 September 2021 11:30
To: SOEIRO Thomas
Cc: r-devel@r-project.org
Subject: Re: [Rd] sep hard coded in write.ftable

>>>>> SOEIRO Thomas
>>>>> on Wed, 1 Sep 2021 15:01:43 + writes:

> Dear all,

> (This is a follow-up to a previous suggestion for ftable that was added in
> R 4.1.0: https://stat.ethz.ch/pipermail/r-devel/2020-May/079451.html)

> The sep argument is hard coded in write.ftable:

> write.ftable <- function(x, file = "", quote = TRUE, append = FALSE,
> digits = getOption("digits"), ...)
> {
> r <- format.ftable(x, quote = quote, digits = digits, ...)
> cat(t(r), file = file, append = append,
> sep = c(rep(" ", ncol(r) - 1), "\n"))
> invisible(x)
> }

> A minor change would allow users to modify it:

> write.ftable2 <- function(x, file = "", quote = TRUE, append = FALSE,
> digits = getOption("digits"), sep = " ", ...)
> {
> r <- stats:::format.ftable(x, quote = quote, digits = digits, ...)
> cat(t(r), file = file, append = append,
> sep = c(rep(sep, ncol(r) - 1), "\n"))
> invisible(x)
> }

I agree this sounds reasonable, and am currently running 'make check-devel' on 
sources modified accordingly ..

Martin


> This would avoid the need for a prior call to format.ftable (although
> write.ftable is significantly slower than write.table):

> ftable(formula = wool + tension ~ breaks, data = warpbreaks) |>
> format(quote = FALSE) |>
> write.table(sep = ";", row.names = FALSE, col.names = FALSE)

> ftable(formula = wool + tension ~ breaks, data = warpbreaks) |>
> write.ftable2(sep = ";")

> Best regards,
> Thomas



[Rd] sep hard coded in write.ftable

2021-09-01 Thread SOEIRO Thomas
Dear all,

(This is a follow-up to a previous suggestion for ftable that was added in R 
4.1.0: https://stat.ethz.ch/pipermail/r-devel/2020-May/079451.html)

The sep argument is hard coded in write.ftable:

write.ftable <- function(x, file = "", quote = TRUE, append = FALSE,
 digits = getOption("digits"), ...)
{
r <- format.ftable(x, quote = quote, digits = digits, ...)
cat(t(r), file = file, append = append,
sep = c(rep(" ", ncol(r) - 1), "\n"))
invisible(x)
}

A minor change would allow users to modify it:

write.ftable2 <- function(x, file = "", quote = TRUE, append = FALSE,
  digits = getOption("digits"), sep = " ", ...)
{
  r <- stats:::format.ftable(x, quote = quote, digits = digits, ...)
  cat(t(r), file = file, append = append,
  sep = c(rep(sep, ncol(r) - 1), "\n"))
  invisible(x)
}

This would avoid the need for a prior call to format.ftable (although 
write.ftable is significantly slower than write.table):

ftable(formula = wool + tension ~ breaks, data = warpbreaks) |>
  format(quote = FALSE) |>
  write.table(sep = ";", row.names = FALSE, col.names = FALSE)

ftable(formula = wool + tension ~ breaks, data = warpbreaks) |>
  write.ftable2(sep = ";")
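
The mechanism behind the proposed change can be sketched without modifying write.ftable: cat() recycles its sep vector, so ncol(r) - 1 field separators followed by a newline put each row of the formatted table on its own line. The separator ";" and the use of capture.output() are choices made for this illustration only.

```r
# Format the flat table as a character matrix, then write it row by row.
ft <- ftable(wool + tension ~ breaks, data = warpbreaks)
r  <- format(ft, quote = FALSE)

# cat() recycles 'sep': ncol(r) - 1 field separators, then a newline,
# so every row of r ends up on its own output line.
sep <- ";"   # assumed separator for this illustration
out <- capture.output(
  cat(t(r), sep = c(rep(sep, ncol(r) - 1), "\n"))
)

length(out) == nrow(r)   # one output line per row of the formatted table
head(out, 3)
```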

Best regards,

Thomas



[Rd] Potential improvements of ave? (Act 2)

2021-04-17 Thread SOEIRO Thomas
Dear list, 
 
This is a follow-up with another potential improvement to ave.
 
In the doc, x is documented as "a numeric", but this is not mandatory. 
 
DF <- data.frame(x = letters, group = rep(1:2, each = 13)) 
ave(DF$x, DF$group, FUN = function(i) "a") 
#  [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
# [20] "a" "a" "a" "a" "a" "a" "a" 
 
However, coercion can cause problems if the types of x and FUN(x) do not match. 
Coercion happens in split<-.default in the for loop with x[i] <- value[[j]].
(NB: In the following example, we can work around the problem by wrapping x 
with as.numeric.)
 
DF <- data.frame(x = Sys.Date() + 1:10, group = rep(1:2, each = 5)) 
ave(DF$x, DF$group, FUN = function(i) 1) 
# Error in as.Date.numeric(value) : 'origin' must be supplied 
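
The workaround mentioned in the NB can be sketched as follows. The fixed start date replaces Sys.Date() only to make the example reproducible; note that the Date class is deliberately dropped, so the result is a plain numeric vector.

```r
DF <- data.frame(x = as.Date("2021-04-17") + 1:10,
                 group = rep(1:2, each = 5))

# Work on the underlying numbers so that x and FUN(x) have the same type;
# the Date class is dropped on purpose here.
res <- ave(as.numeric(DF$x), DF$group, FUN = function(i) 1)
res
```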
 
So I have 2 questions/suggestions: 
- Could the doc rather state that the type of x must match that of FUN(x), and 
warn about coercion?
- Could ave be more flexible (i.e. allow x and FUN(x) to have different types) by 
using another approach than x[i] <- value[[j]] in split<-.default for 
recycling? 
 
This has already been discussed on r-help and stackoverflow (e.g. 
https://stat.ethz.ch/pipermail/r-help/2016-November/442855.html) 
 
Best, 
 
Thomas



[Rd] reshape documentation

2021-04-17 Thread SOEIRO Thomas
Dear Deepayan,

I do not have further suggestions, but I just wanted to thank you for taking 
the time to improve the documentation so much! (and for adding support for 
specifying "varying" as a vector)

Both "Typical usage" and the details are useful additions. Adding a vignette 
also seems an excellent idea.

These changes will probably help numerous users.

Best,

Thomas




On Wed, Mar 17, 2021 at 7:55 PM Michael Dewey wrote:
>
> Comments in line
>
> On 13/03/2021 09:50, SOEIRO Thomas wrote:
> > Dear list,
> >
> > I have some questions/suggestions about reshape.
> >
> > 1) I think a good part of the popularity of alternatives to base::reshape is 
> > due to the complexity of the reshape documentation. It is quite hard (at least 
> > it is for me) to figure out which arguments are needed for "long to wide" 
> > and "wide to long" respectively, because reshapeWide and reshapeLong are 
> > documented together.
> > - Do you agree with this?
> > - Would you consider a proposal to modify the documentation?
> > - If yes, what approach do you suggest? e.g. split in two pages?
>
> The current documentation is much clearer than it was when I first
> started using R but we should always strive for more.
>
> I would suggest leaving the documentation in one place but it might be
> helpful to add which direction is relevant for each parameter by placing
> (to wide) or (to long) as appropriate. I think having completely
> separate lists is not needed

I have just checked in some updates to the documentation (in R-devel)
which hopefully makes usage clearer. Any further suggestions are
welcome. We are planning to add a short vignette as well, hopefully in
time for R 4.1.0.

> > 2) I do not think the documentation indicates that the varying argument can 
> > be used to rename variables in reshapeWide.
> > - Is this worth documenting?
> > - Is the construct list(c()) really needed?
>
> Yes, because you may have more than one set of variables which need to
> correspond to a single variable in long format. So in your example if
> you also had 11 variables for the temperature as well as the
> concentration each would need specifying as a separate vector in the list.

That's a valid point, but on the other hand, direction="long" already
supports specifying 'varying' as a vector, and it does simplify the
single variable case. So we decided to be consistent and allow it for
direction="wide" too, hopefully with loud enough warnings in the
documentation about using the feature carelessly.
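
A minimal sketch of the existing direction = "long" case with 'varying' given as a plain vector rather than a list. The data frame and its column names are invented for illustration; reshape() is left to guess the variable name and the times from the "_" separator.

```r
# Toy wide data (invented): one measured variable at two time points.
wide <- data.frame(id = 1:2,
                   conc_1 = c(0.5, 0.7),
                   conc_2 = c(1.5, 1.7))

long <- reshape(wide, direction = "long",
                varying = c("conc_1", "conc_2"),  # plain vector, not a list
                sep = "_", idvar = "id")
long
```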

Best,
-Deepayan

> Michael
>
> >
> > reshape(Indometh,
> >  v.names = "conc",
> >  idvar = "Subject",
> >  timevar = "time",
> >  direction = "wide",
> >  varying = list(c("conc_0.25hr",
> >   "conc_0.5hr",
> >   "conc_0.75hr",
> >   "conc_1hr",
> >   "conc_1.25hr",
> >   "conc_2hr",
> >   "conc_3hr",
> >   "conc_4hr",
> >   "conc_5hr",
> >   "conc_6hr",
> >   "conc_8hr")))
> >
> > Thanks,
> >
> > Thomas
> >
>
> --
> Michael
> http://www.dewey.myzen.co.uk/home.html
>


Re: [Rd] Potential improvements of ave?

2021-03-16 Thread SOEIRO Thomas
Dear all,

Thank you for your consideration on this topic.

I do not have enough knowledge of R internals to join the discussion about 
sorting mechanisms. In fact, I do not see how ordering could help ave, as 
the output must maintain the order of the input (because ave returns only x and 
not the entire data.frame).

However, while the proposed workaround (i.e. paste0 instead of interaction, cf. 
https://stat.ethz.ch/pipermail/r-devel/2021-March/080509.html) does not solve 
the "bigger problem" of sorting, it is usable as is and solves the issue. 
Therefore, what do you think about it? (i.e. is it relevant for a patch?)

Thanks,

Thomas


> 
> From: Abby Spurdle 
> Sent: Monday, 15 March 2021 10:22
> To: SOEIRO Thomas
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Potential improvements of ave?
>
> Hi Thomas,
>
> These are some great suggestions.
> But I can't help but feel there's a much bigger problem here.
>
> Intuitively, the ave function could (or should) sort the data.
> Then the indexing step becomes almost trivial, in terms of both time
> and space complexity.
> And the ave function is not the only example of where a problem
> becomes much simpler, if the data is sorted.
>
> Historically, I've never found base R functions user-friendly for
> aggregation purposes, or for sorting.
> (At least, not by comparison to SQL).
>
> But that's not the main problem.
> It would seem preferable to sort the data, only once.
> (Rather than sorting it repeatedly, or not at all).
>
> Perhaps, objects such as vectors and data.frame(s) could have a
> boolean attribute, to indicate if they're sorted.
> Or functions such as ave could have a sorted argument.
> In either case, if true, the function assumes the data is sorted and
> applies a more efficient algorithm.
>
>
> B.
>
>
> On Sat, Mar 13, 2021 at 1:07 PM SOEIRO Thomas  wrote:
>>
>> Dear all,
>>
>> I have two questions/suggestions about ave, but I am not sure if it's 
>> relevant for bug reports.
>>
>>
>>
>> 1) I have performance issues with ave in a case where I didn't expect it. 
>> The following code runs as expected:
>>
>> set.seed(1)
>>
>> df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
>>   id2 = sample(1:3, 5e2, TRUE),
>>   id3 = sample(1:5, 5e2, TRUE),
>>   val = sample(1:300, 5e2, TRUE))
>>
>> df1$diff <- ave(df1$val,
>> df1$id1,
>> df1$id2,
>> df1$id3,
>> FUN = function(i) c(diff(i), 0))
>>
>> head(df1[order(df1$id1,
>>df1$id2,
>>df1$id3), ])
>>
>> But when expanding the data.frame (* 1e4), ave fails (Error: cannot allocate 
>> vector of size 1110.0 Gb):
>>
>> df2 <- data.frame(id1 = sample(1:(1e2 * 1e4), 5e2 * 1e4, TRUE),
>>   id2 = sample(1:3, 5e2 * 1e4, TRUE),
>>   id3 = sample(1:(5 * 1e4), 5e2 * 1e4, TRUE),
>>   val = sample(1:300, 5e2 * 1e4, TRUE))
>>
>> df2$diff <- ave(df2$val,
>> df2$id1,
>> df2$id2,
>> df2$id3,
>> FUN = function(i) c(diff(i), 0))
>>
>> This use case does not seem extreme to me (e.g. aggregate et al work 
>> perfectly on this data.frame).
>> So my question is: Is this expected/intended/reasonable? i.e. Does ave need 
>> to be optimized?
>>
>>
>>
>> 2) Gabor Grothendieck pointed out in 2011 that drop = TRUE is needed to 
>> avoid warnings in case of unused levels 
>> (https://stat.ethz.ch/pipermail/r-devel/2011-February/059947.html).
>> Is it relevant/possible to expose the drop argument explicitly?
>>
>>
>>
>> Thanks,
>>
>> Thomas


Re: [Rd] Potential improvements of ave?

2021-03-15 Thread SOEIRO Thomas
Hi Abby,

Thank you for your positive feedback.

I agree with your general comment about sorting.

For ave specifically, ordering may not help because the output must maintain 
the order of the input (as ave returns only x and not the entire data.frame).

Thanks,

Thomas

From: Abby Spurdle 
Sent: Monday, 15 March 2021 10:22
To: SOEIRO Thomas
Cc: r-devel@r-project.org
Subject: Re: [Rd] Potential improvements of ave?

Hi Thomas,

These are some great suggestions.
But I can't help but feel there's a much bigger problem here.

Intuitively, the ave function could (or should) sort the data.
Then the indexing step becomes almost trivial, in terms of both time
and space complexity.
And the ave function is not the only example of where a problem
becomes much simpler, if the data is sorted.

Historically, I've never found base R functions user-friendly for
aggregation purposes, or for sorting.
(At least, not by comparison to SQL).

But that's not the main problem.
It would seem preferable to sort the data, only once.
(Rather than sorting it repeatedly, or not at all).

Perhaps, objects such as vectors and data.frame(s) could have a
boolean attribute, to indicate if they're sorted.
Or functions such as ave could have a sorted argument.
In either case, if true, the function assumes the data is sorted and
applies a more efficient algorithm.


B.


On Sat, Mar 13, 2021 at 1:07 PM SOEIRO Thomas  wrote:
>
> Dear all,
>
> I have two questions/suggestions about ave, but I am not sure if it's 
> relevant for bug reports.
>
>
>
> 1) I have performance issues with ave in a case where I didn't expect it. The 
> following code runs as expected:
>
> set.seed(1)
>
> df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
>   id2 = sample(1:3, 5e2, TRUE),
>   id3 = sample(1:5, 5e2, TRUE),
>   val = sample(1:300, 5e2, TRUE))
>
> df1$diff <- ave(df1$val,
> df1$id1,
> df1$id2,
> df1$id3,
> FUN = function(i) c(diff(i), 0))
>
> head(df1[order(df1$id1,
>df1$id2,
>df1$id3), ])
>
> But when expanding the data.frame (* 1e4), ave fails (Error: cannot allocate 
> vector of size 1110.0 Gb):
>
> df2 <- data.frame(id1 = sample(1:(1e2 * 1e4), 5e2 * 1e4, TRUE),
>   id2 = sample(1:3, 5e2 * 1e4, TRUE),
>   id3 = sample(1:(5 * 1e4), 5e2 * 1e4, TRUE),
>   val = sample(1:300, 5e2 * 1e4, TRUE))
>
> df2$diff <- ave(df2$val,
> df2$id1,
> df2$id2,
> df2$id3,
> FUN = function(i) c(diff(i), 0))
>
> This use case does not seem extreme to me (e.g. aggregate et al work 
> perfectly on this data.frame).
> So my question is: Is this expected/intended/reasonable? i.e. Does ave need 
> to be optimized?
>
>
>
> 2) Gabor Grothendieck pointed out in 2011 that drop = TRUE is needed to avoid 
> warnings in case of unused levels 
> (https://stat.ethz.ch/pipermail/r-devel/2011-February/059947.html).
> Is it relevant/possible to expose the drop argument explicitly?
>
>
>
> Thanks,
>
> Thomas


Re: [Rd] Potential improvements of ave?

2021-03-13 Thread SOEIRO Thomas
The bottleneck of ave is the call to interaction (i.e. not the call to 
split/lapply).
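
Why interaction is costly can be sketched on toy factors (made up for this illustration): by default it creates one level per possible combination, i.e. the product of the numbers of levels, whether or not the combination occurs. For df2 that is up to 1e6 * 3 * 5e4 = 1.5e11 combinations, which presumably explains an allocation of the order of a terabyte.

```r
# Toy factors (invented): 3 levels and 2 levels, 3 observations.
f1 <- factor(c("a", "b", "c"))
f2 <- factor(c("x", "y", "x"))

n_all  <- nlevels(interaction(f1, f2))               # 3 * 2 possible combinations
n_used <- nlevels(interaction(f1, f2, drop = TRUE))  # combinations actually present
c(all = n_all, used = n_used)
```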

Therefore, the following code runs as expected (but I may be missing something...):

ave2 <- function (x, ..., FUN = mean)
{
    if(missing(...))
        x[] <- FUN(x)
    else {
        #g <- interaction(...)
        g <- paste0(...)
        split(x, g) <- lapply(split(x, g), FUN)
    }
    x
}

df2$diff <- ave2(df2$val,
 df2$id1,
 df2$id2,
 df2$id3,
 FUN = function(i) c(diff(i), 0))



Of course I can also simply solve my current issue with:

df2$id123 <- paste0(df2$id1,
df2$id2,
df2$id3)

df2$diff <- ave(df2$val,
df2$id123,
FUN = function(i) c(diff(i), 0))



In addition, ave2 also avoids warnings in case of unused levels (see point 2 in 
my previous message).
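
One caveat with building the grouping key by plain concatenation, sketched on made-up ids: without a separator, different combinations can produce the same string and would then be merged into one group. Using paste with a separator that cannot occur in the ids (here "\r", an arbitrary choice) keeps the keys distinct.

```r
# Made-up ids: (1, 1, 12) and (11, 1, 2) are different combinations.
id1 <- c(1, 11)
id2 <- c(1, 1)
id3 <- c(12, 2)

key_plain <- paste0(id1, id2, id3)            # both become "1112"
key_sep   <- paste(id1, id2, id3, sep = "\r") # separator keeps them apart

key_plain[1] == key_plain[2]   # TRUE: the two groups collide
key_sep[1]   == key_sep[2]     # FALSE: the groups stay distinct
```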
________
From: SOEIRO Thomas
Sent: Friday, 12 March 2021 23:59
To: r-devel@r-project.org
Subject: Potential improvements of ave?

Dear all,

I have two questions/suggestions about ave, but I am not sure if it's relevant 
for bug reports.



1) I have performance issues with ave in a case where I didn't expect it. The 
following code runs as expected:

set.seed(1)

df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
  id2 = sample(1:3, 5e2, TRUE),
  id3 = sample(1:5, 5e2, TRUE),
  val = sample(1:300, 5e2, TRUE))

df1$diff <- ave(df1$val,
df1$id1,
df1$id2,
df1$id3,
FUN = function(i) c(diff(i), 0))

head(df1[order(df1$id1,
   df1$id2,
   df1$id3), ])

But when expanding the data.frame (* 1e4), ave fails (Error: cannot allocate 
vector of size 1110.0 Gb):

df2 <- data.frame(id1 = sample(1:(1e2 * 1e4), 5e2 * 1e4, TRUE),
  id2 = sample(1:3, 5e2 * 1e4, TRUE),
  id3 = sample(1:(5 * 1e4), 5e2 * 1e4, TRUE),
  val = sample(1:300, 5e2 * 1e4, TRUE))

df2$diff <- ave(df2$val,
df2$id1,
df2$id2,
df2$id3,
FUN = function(i) c(diff(i), 0))

This use case does not seem extreme to me (e.g. aggregate et al work perfectly 
on this data.frame).
So my question is: Is this expected/intended/reasonable? i.e. Does ave need to 
be optimized?



2) Gabor Grothendieck pointed out in 2011 that drop = TRUE is needed to avoid 
warnings in case of unused levels 
(https://stat.ethz.ch/pipermail/r-devel/2011-February/059947.html).
Is it relevant/possible to expose the drop argument explicitly?



Thanks,

Thomas



[Rd] reshape documentation

2021-03-13 Thread SOEIRO Thomas
Dear list,

I have some questions/suggestions about reshape.

1) I think a good part of the popularity of alternatives to base::reshape is due 
to the complexity of the reshape documentation. It is quite hard (at least it is 
for me) to figure out which arguments are needed for "long to wide" and 
"wide to long" respectively, because reshapeWide and reshapeLong are documented 
together.
- Do you agree with this?
- Would you consider a proposal to modify the documentation?
- If yes, what approach do you suggest? e.g. split in two pages?
 
2) I do not think the documentation indicates that the varying argument can be 
used to rename variables in reshapeWide.
- Is this worth documenting?
- Is the construct list(c()) really needed?

reshape(Indometh,
v.names = "conc",
idvar = "Subject",
timevar = "time",
direction = "wide",
varying = list(c("conc_0.25hr",
 "conc_0.5hr",
 "conc_0.75hr",
 "conc_1hr",
 "conc_1.25hr",
 "conc_2hr",
 "conc_3hr",
 "conc_4hr",
 "conc_5hr",
 "conc_6hr",
 "conc_8hr")))

Thanks,

Thomas


[Rd] Potential improvements of ave?

2021-03-12 Thread SOEIRO Thomas
Dear all,

I have two questions/suggestions about ave, but I am not sure if it's relevant 
for bug reports.



1) I have performance issues with ave in a case where I didn't expect it. The 
following code runs as expected:

set.seed(1)

df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
  id2 = sample(1:3, 5e2, TRUE),
  id3 = sample(1:5, 5e2, TRUE),
  val = sample(1:300, 5e2, TRUE))

df1$diff <- ave(df1$val,
df1$id1,
df1$id2,
df1$id3,
FUN = function(i) c(diff(i), 0))

head(df1[order(df1$id1,
   df1$id2,
   df1$id3), ])

But when expanding the data.frame (* 1e4), ave fails (Error: cannot allocate 
vector of size 1110.0 Gb):

df2 <- data.frame(id1 = sample(1:(1e2 * 1e4), 5e2 * 1e4, TRUE),
  id2 = sample(1:3, 5e2 * 1e4, TRUE),
  id3 = sample(1:(5 * 1e4), 5e2 * 1e4, TRUE),
  val = sample(1:300, 5e2 * 1e4, TRUE))

df2$diff <- ave(df2$val,
df2$id1,
df2$id2,
df2$id3,
FUN = function(i) c(diff(i), 0))

This use case does not seem extreme to me (e.g. aggregate et al work perfectly 
on this data.frame).
So my question is: Is this expected/intended/reasonable? i.e. Does ave need to 
be optimized?



2) Gabor Grothendieck pointed out in 2011 that drop = TRUE is needed to avoid 
warnings in case of unused levels 
(https://stat.ethz.ch/pipermail/r-devel/2011-February/059947.html).
Is it relevant/possible to expose the drop argument explicitly?



Thanks,

Thomas


[Rd] Patch proposal for bug 17770 - xtabs does not act as documented for na.action = na.pass

2020-05-21 Thread SOEIRO Thomas
Dear all,

(This issue was previously reported on Bugzilla 
(https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17770) and discussed on 
Stack Overflow (https://stackoverflow.com/q/61240049).)

The documentation of xtabs says:

"na.action: When it is na.pass and formula has a left hand side (with counts), 
sum(*, na.rm = TRUE) is used instead of sum(*) for the counts."

However, this is not the case:
 
DF <- data.frame(group = c("a", "a", "b", "b"),
 count = c(NA, TRUE, FALSE, TRUE))

xtabs(formula = count ~ group,
  data = DF,
  na.action = na.pass)

# group
# a b
# 1

In the code, na.rm is TRUE if and only if na.action = na.omit:

na.rm <- 
  identical(naAct, quote(na.omit)) || identical(naAct, na.omit) ||
  identical(naAct, "na.omit")

xtabs(formula = count ~ group,
  data = DF,
  na.action = na.omit)

# group
# a b
# 1 1

The example works as documented if we change the code to:

na.rm <- 
  identical(naAct, quote(na.pass)) || identical(naAct, na.pass) ||
  identical(naAct, "na.pass")

However, there may be something I am missing, and na.omit may be necessary for 
something else...
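
For reference, the counts that the documentation describes can be reproduced without xtabs, which makes the expected result explicit. This is only a sketch of sum(*, na.rm = TRUE) versus sum(*) per group, using tapply on the same data; it says nothing about how xtabs should be patched.

```r
DF <- data.frame(group = c("a", "a", "b", "b"),
                 count = c(NA, TRUE, FALSE, TRUE))

# Per-group sums with and without removal of missing values.
with_na_rm    <- tapply(DF$count, DF$group, sum, na.rm = TRUE)  # a = 1,  b = 1
without_na_rm <- tapply(DF$count, DF$group, sum)                # a = NA, b = 1
with_na_rm
without_na_rm
```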

Best regards,

Thomas



Re: [Rd] justify hard coded in format.ftable

2020-05-15 Thread SOEIRO Thomas
Thanks for the links. I agree that such a feature would be a nice addition, and 
could make ftable even more useful.

In the same spirit, I think it could be useful to mention the undocumented 
base::as.data.frame.matrix function in documentation of table and xtabs (in 
addition to the already mentioned base::as.data.frame.table). The conversion 
from ftable/table/xtabs to data.frame is a common task that some users seem to 
struggle with 
(https://stackoverflow.com/questions/10758961/how-to-convert-a-table-to-a-data-frame).

tab <- table(warpbreaks$wool, warpbreaks$tension)
as.data.frame(tab) # reshaped table
as.data.frame.matrix(tab) # non-reshaped table

To sum up, for the sake of clarity, these proposals address two different 
topics:
- The justify argument would reduce the need to reformat the exported ftable
- An ftable2df-like function (and the mention of as.data.frame.matrix in the 
documentation) would facilitate the reuse of ftable results for further 
analysis.
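
Until such a function exists, one possible route from an ftable to a long data.frame is via as.table, sketched below on warpbreaks. This is an assumption about what would be useful rather than the ftable2df function from the linked answer, and it gives the reshaped (long) layout only.

```r
ft <- ftable(wool + tension ~ breaks, data = warpbreaks)

# ftable -> table -> long data.frame with one row per cell and a Freq column.
df <- as.data.frame(as.table(ft))
names(df)
sum(df$Freq) == nrow(warpbreaks)   # every observation is counted exactly once
```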

Thank you very much,

Thomas

> If you are looking at ftable could you also consider adding a way to convert 
> an ftable into a usable data.frame such as the ftable2df function defined 
> here:
> 
> https://stackoverflow.com/questions/11141406/reshaping-an-array-to-data-frame/11143126#11143126
> 
> and there is an example of using it here:
> 
> https://stackoverflow.com/questions/61333663/manipulating-an-array-into-a-data-frame-in-base-r/61334756#61334756
> 
> Being able to move back and forth between various base class representations 
> seems like something that would be natural to provide.
> 
> Thanks.
> 
> On Thu, May 14, 2020 at 5:32 AM Martin Maechler wrote:
>>
>>>>>>> SOEIRO Thomas
>>>>>>> on Wed, 13 May 2020 20:27:15 + writes:
>>
>>> Dear all,
>>> I haven't received any feedback so far on my proposal to make 
>> "justify" argument available in stats:::format.ftable
>>
>>> Is this list the appropriate place for this kind of proposal?
>>
>> Yes, it is.. Actually such a post is even a "role model" post for 
>> R-devel.
>>
>>> I hope this follow-up to my message won't be taken as rude. Of course it's 
>>> not meant to be, but I'm not used to the R mailing lists...
>>
>> well, there could be said much, and many stories told here ... ;-)
>>
>>> Thank you in advance for your comments,
>>
>>> Best,
>>> Thomas
>>
>> The main reasons for "no reaction" (for such a nice post) are probably a 
>> combination of the following
>>
>> - we are busy
>> - if we have time, we think other things are more exciting
>> - we have not used ftable much/at all and are not interested.
>>
>> Even though the first 2 apply to me, I'll have a 2nd look into your 
>> post now, and may end up well agreeing with your proposal.
>>
>> Martin Maechler
>> ETH Zurich  and  R Core team
>>
>>
>>
>>
>>>> Dear all,
>>>>
>>>> justify argument is hard coded in format.ftable:
>>>>
>>>> cbind(apply(LABS, 2L, format, justify = "left"),
>>>> apply(DATA, 2L, format, justify = "right"))
>>>>
>>>> It would be useful to have the possibility to modify the argument between 
>>>> c("left", "right", "centre", "none") as in format.default.
>>>>
>>>> The lines could be changed to:
>>>>
>>>> if(length(justify) != 2)
>>>> stop("justify must be length 2")
>>>> cbind(apply(LABS, 2L, format, justify = justify[1]),
>>>> apply(DATA, 2L, format, justify = justify[2]))
>>>>
>>>> The argument justify could default to c("left", "right") for backward 
>>>> compatibility.
>>>>
>>>> It could then allow:
>>>> ftab <- ftable(wool + tension ~ breaks, warpbreaks)
>>>> format.ftable(ftab, justify = c("none", "none"))
>>>>
>>>> Best regards,
>>>>
>>>> Thomas


Re: [Rd] justify hard coded in format.ftable

2020-05-14 Thread SOEIRO Thomas
I suspected it was partly due to the fact that ftable doesn't get much 
interest/isn't much used...

So thank you very much for answering, and for your time!

>> Dear all,
>> I haven't received any feedback so far on my proposal to make "justify" 
>> argument available in stats:::format.ftable
>>
>> Is this list the appropriate place for this kind of proposal?
> 
> Yes, it is.. Actually such a post is even a "role model" post for R-devel.
> 
>> I hope this follow-up to my message won't be taken as rude. Of course it's 
>> not meant to be, but I'm not used to the R mailing lists...
> 
> well, there could be said much, and many stories told here ... ;-)
> 
>> Thank you in advance for your comments,
>> 
>> Best,
>> Thomas
> 
> The main reasons for "no reaction" (for such a nice post) are probably a 
> combination of the following
> 
> - we are busy
> - if we have time, we think other things are more exciting
> - we have not used ftable much/at all and are not interested.
> 
> Even though the first 2 apply to me, I'll have a 2nd look into your post now, 
> and may end up well agreeing with your proposal.
> 
> Martin Maechler
> ETH Zurich  and  R Core team
> 
>>> Dear all,
>>>
>>> justify argument is hard coded in format.ftable:
>>>
>>> cbind(apply(LABS, 2L, format, justify = "left"),
>>> apply(DATA, 2L, format, justify = "right"))
>>>
>>> It would be useful to have the possibility to modify the argument between 
>>> c("left", "right", "centre", "none") as in format.default.
>>>
>>> The lines could be changed to:
>>>
>>> if(length(justify) != 2)
>>> stop("justify must be length 2")
>>> cbind(apply(LABS, 2L, format, justify = justify[1]),
>>> apply(DATA, 2L, format, justify = justify[2]))
>>>
>>> The argument justify could default to c("left", "right") for backward 
>>> compatibility.
>>>
>>> It could then allow:
>>> ftab <- ftable(wool + tension ~ breaks, warpbreaks)
>>> format.ftable(ftab, justify = c("none", "none"))
>>>
>>> Best regards,
>>>
>>> Thomas


Re: [Rd] justify hard coded in format.ftable

2020-05-13 Thread SOEIRO Thomas
Dear all,

I haven't received any feedback so far on my proposal to make "justify" 
argument available in stats:::format.ftable

Is this list the appropriate place for this kind of proposal?

I hope this follow-up to my message won't be taken as rude. Of course it's not 
meant to be, but I'm not used to the R mailing lists...

Thank you in advance for your comments,

Best,

Thomas

> Dear all,
>
> justify argument is hard coded in format.ftable:
>
> cbind(apply(LABS, 2L, format, justify = "left"),
>   apply(DATA, 2L, format, justify = "right"))
>
> It would be useful to have the possibility to modify the argument between 
> c("left", "right", "centre", "none") as in format.default.
>
> The lines could be changed to:
>
> if(length(justify) != 2)
>   stop("justify must be length 2")
> cbind(apply(LABS, 2L, format, justify = justify[1]),
>   apply(DATA, 2L, format, justify = justify[2]))
>
> The argument justify could default to c("left", "right") for backward 
> compatibility.
>
> It could then allow:
> ftab <- ftable(wool + tension ~ breaks, warpbreaks)
> format.ftable(ftab, justify = c("none", "none"))
>
> Best regards,
>
> Thomas



[Rd] justify hard coded in format.ftable

2020-05-10 Thread SOEIRO Thomas
Dear all,

justify argument is hard coded in format.ftable:

cbind(apply(LABS, 2L, format, justify = "left"),
  apply(DATA, 2L, format, justify = "right"))

It would be useful to have the possibility to modify the argument between 
c("left", "right", "centre", "none") as in format.default.

The lines could be changed to:

if(length(justify) != 2)
  stop("justify must be length 2")
cbind(apply(LABS, 2L, format, justify = justify[1]),
  apply(DATA, 2L, format, justify = justify[2]))

The argument justify could default to c("left", "right") for backward 
compatibility.

It could then allow:
ftab <- ftable(wool + tension ~ breaks, warpbreaks)
format.ftable(ftab, justify = c("none", "none"))
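
For readers less familiar with the justify values of format.default, here is a short sketch on a plain character vector (invented for illustration) of what each choice does to the padding:

```r
x <- c("a", "bbb")   # toy labels of different widths

left   <- format(x, justify = "left")    # "a  " "bbb"
right  <- format(x, justify = "right")   # "  a" "bbb"
centre <- format(x, justify = "centre")  # " a " "bbb"
none   <- format(x, justify = "none")    # "a"   "bbb" (no padding)
rbind(left, right, centre, none)
```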

Best regards,

Thomas
