Re: [Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?

2019-10-29 Thread Gabriel Becker
Hi all,

So I've started working on this and ran into something I didn't know:
for x an array with more than two dimensions, head(x) and tail(x) ignore
the dimensions completely, treat x as an atomic vector, and return an
(unclassed) atomic vector:

> x = array(100, c(4, 5, 5))
> dim(x)
[1] 4 5 5
> head(x, 1)
[1] 100
> class(head(x))
[1] "numeric"

(For a 1d array, it does return another 1d array).
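The flattening shown above is what plain linear indexing does to an array, which is one plausible explanation for it. The sketch below assumes (it is not taken from this thread) that the default method selects elements with a single index vector, along the lines of x[seq_len(n)]; the variable names are illustrative only.

```r
# Linear indexing on an array with several dimensions drops the dim attribute.
x <- array(100, c(4, 5, 5))
flat <- x[seq_len(6L)]
is.null(dim(flat))   # TRUE
class(flat)          # "numeric"

# A 1d array keeps its single dimension under the same kind of indexing.
y <- array(1:10, 10L)
one_d <- y[seq_len(6L)]
dim(one_d)           # 6
```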

When extending head/tail to understand multiple dimensions as discussed in
this thread, then, should the behavior for arrays with more than two
dimensions be explicitly retained, or should head and tail do the analogous
thing (with head(<2d array>) behaving the same as head(<matrix>), which
honestly is what I expected to already be happening)?

Are people using/relying on this behavior in their code, and if so, why/for
what?

Even more generally, one way forward is to have the default methods check
for dimensions, and use length(x) if dim(x) is NULL:

tail.default <- tail.data.frame <- function(x, n = 6L, ...)
{
    if(any(n == 0))
        stop("n must be non-zero or unspecified for all dimensions")
    if(!is.null(dim(x)))
        dimsx <- dim(x)
    else
        dimsx <- length(x)

    ## this returns a list of vectors of indices in each
    ## dimension, regardless of the length of the n
    ## argument
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        ## select all indices (full dim) if not specified
        ni <- if(length(n) >= i) n[i] else dxi
        ## handle negative ns
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq.int(to = dxi, length.out = ni)
    })
    args <- c(list(x), sel, drop = FALSE)
    do.call("[", args)
}
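
To see what the proposal above would return, here is a small usage sketch. It copies the same logic into a standalone function so that no utils methods are overridden; the name tail_nd is hypothetical and is not part of the proposal.

```r
# Standalone copy of the proposed logic; `tail_nd` is a hypothetical name.
tail_nd <- function(x, n = 6L, ...) {
    if (any(n == 0))
        stop("n must be non-zero or unspecified for all dimensions")
    dimsx <- if (!is.null(dim(x))) dim(x) else length(x)
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        # dimensions without an entry in n are kept whole
        ni <- if (length(n) >= i) n[i] else dxi
        # negative n means "all but the first |n|"
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq.int(to = dxi, length.out = ni)
    })
    do.call("[", c(list(x), sel, drop = FALSE))
}

x <- array(1:100, c(4, 5, 5))
dim(tail_nd(x, c(2, 3)))   # 2 3 5: last 2 rows, last 3 columns, all slices
dim(tail_nd(x, -1))        # 3 5 5: everything but the first row
tail_nd(1:10, 3)           # 8 9 10: plain vectors fall back to length(x)

df <- data.frame(a = 1:10, b = letters[1:10])
dim(tail_nd(df, c(2, 1)))  # 2 1: still a data.frame, because drop = FALSE
```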


I think this precludes the need for a separate data.frame method at all,
actually, though (I would think) tail.data.frame would still be defined and
exported for backwards compatibility. (the matrix method has some extra
bits so my current conception of it is still separate, though it might not
NEED to be).

The question then becomes, should head/tail always return something with
the same dimensionality (number of dims) as it got, or should data.frame and
matrix be special cased in this regard, as they are now?
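
For reference, whether the number of dims survives a subset is controlled by the drop argument of `[`. The short sketch below only illustrates that base R behavior; it does not settle which behavior head/tail should adopt.

```r
x <- array(seq_len(100), c(4, 5, 5))
dropped <- x[4, , ]                # default drop = TRUE removes extent-1 dims
kept    <- x[4, , , drop = FALSE]  # keeps the same number of dims as x
dim(dropped)   # 5 5
dim(kept)      # 1 5 5

m <- matrix(1:6, nrow = 2)
is.null(dim(m[2, ]))        # TRUE: a single row collapses to a vector
dim(m[2, , drop = FALSE])   # 1 3: still a matrix
```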

What are people's thoughts?
~G

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] head.matrix can return 1000s of columns -- limit to n or add new argument?

2019-10-29 Thread Jan Gorecki
Gabriel,
My view is rather radical.

- head/tail should return an object with the same number of dimensions
- data.frame should be a special case
- matrix should be handled as a 2D array

P.S. The idea of accepting `n` as a vector with one entry per dimension
is a brilliant one
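
A minimal sketch of how a vector-valued `n` could address the issue in the subject line (a wide matrix returning thousands of columns). The function head_nd is hypothetical: it mirrors the tail proposal quoted in this thread with seq_len() swapped in, and is an assumption rather than code from the thread.

```r
# Hypothetical head counterpart of the proposed tail method.
head_nd <- function(x, n = 6L) {
    stopifnot(all(n != 0))
    dimsx <- if (is.null(dim(x))) length(x) else dim(x)
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        # dimensions without an entry in n are kept whole
        ni <- if (length(n) >= i) n[i] else dxi
        # negative n means "all but the last |n|"
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq_len(ni)
    })
    do.call("[", c(list(x), sel, drop = FALSE))
}

m <- matrix(1:2000, nrow = 2, ncol = 1000)
dim(head_nd(m, 1))         # 1 1000: only rows are limited
dim(head_nd(m, c(1, 5)))   # 1 5: rows and columns are limited
```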

On Wed, Oct 30, 2019 at 1:13 AM Gabriel Becker wrote:
> [...]
