Re: [R] unexpected behavior in apply

2021-10-13 Thread Derickson, Ryan, VHA NCOD via R-help
If an oven expects fried potatoes and I put a cake in, I would hope it
complains or does nothing rather than surreptitiously poisoning my cake.
Jiefei's finding that "6" becomes " 6" during matrix coercion (apparently
for aesthetic reasons only) feels more like the latter. But I appreciate the
explanation and the solutions.   




Re: [R] unexpected behavior in apply

2021-10-11 Thread Rolf Turner
On Mon, 11 Oct 2021 09:15:27 +
PIKAL Petr  wrote:



> 
> data.frame is not matrix or array (even if it rather resembles one)
> 
> So if you put a cake into oven you cannot expect getting fried
> potatoes from it.



Another fortune nomination!

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unexpected behavior in apply

2021-10-11 Thread PIKAL Petr
Hi

it is not surprising at all.

from apply documentation

Arguments
X   
an array, including a matrix.

data.frame is not matrix or array (even if it rather resembles one)

So if you put a cake into oven you cannot expect getting fried potatoes from
it.

For data frames sapply or lapply is preferable as it is designed for lists
and data frame is (again from documentation)

A data frame is a list of variables of the same number of rows with unique
row names, given class "data.frame".

> sapply(d,function(x) all(x[!is.na(x)]<=3))
   d1    d2    d3 
FALSE  TRUE FALSE 
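
To make the list-versus-matrix distinction concrete, here is a minimal sketch using the data frame `d` from the original post; `col_types` and `m` are names introduced only for this illustration.

```r
# Rebuild the example data frame from the original post.
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

is.list(d)     # TRUE: a data frame is a list of columns
is.matrix(d)   # FALSE: it only looks like a matrix

# Iterating over the list keeps each column's own type.
col_types <- sapply(d, class)

# Coercing to a matrix forces a single type onto every cell.
m <- as.matrix(d)
typeof(m)      # "character"
```

Because sapply and lapply walk over the columns as list elements, the numeric columns are still numeric when the comparison runs.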

Cheers
Petr



Re: [R] unexpected behavior in apply

2021-10-08 Thread Jiefei Wang
Ok, it turns out that this is documented, even though it looks surprising.

First of all, the apply function will try to convert any object with
the dim attribute to a matrix (my intuition agrees with you that there
should be no conversion), so the first step of the apply function is

> as.matrix.data.frame(d)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"

Since the data frame `d` is a mixture of character and non-character
values, the non-character values are converted to character using the
function `format`. However, the problem is that the NA values are also
formatted as character strings:

> format(c(NA, 6))
[1] "NA" " 6"

That's where the space comes from. It is purely for making the result
pretty... The character NA will be removed later, but the space is not
stripped. I would say this is not a good design, and it might be worth
not including the NA value in the format function. At the current
stage, I will suggest using the function `lapply` to do what you want.

> lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
$d1
[1] FALSE
$d2
[1] TRUE
$d3
[1] FALSE

Everything should work as you expect.
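
The padding described above can be checked directly. The following is a sketch under the assumption that `d` is the data frame from the original post; `padded`, `m`, and `d3_clean` are illustrative names, and `trimws` is just one possible workaround, not something proposed in this thread.

```r
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# format() pads every element to a common width; "NA" is two characters
# wide, so 6 becomes " 6".
padded <- format(c(NA, 6))
padded                      # "NA" " 6"

# The same padding shows up in the character matrix that apply() works on.
m <- as.matrix(d)
m[3, "d3"]                  # " 6"

# Stripping whitespace and converting back recovers the numbers.
d3_clean <- as.numeric(trimws(m[, "d3"]))
all(d3_clean <= 3, na.rm = TRUE)   # FALSE, the expected answer
```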

Best,
Jiefei



Re: [R] unexpected behavior in apply

2021-10-08 Thread Andrew Simmons
Hello,


The issue is that 'apply' tries to coerce its argument to a matrix. This
means that all your columns will become character class, and the result
will not be what you wanted. I would suggest something more like:


sapply(d, function(x) all(x[!is.na(x)] <= 3))

or

vapply(d, function(x) all(x[!is.na(x)] <= 3), NA)


Also, here is a different method that might look cleaner:


sapply(d, function(x) all(x <= 3, na.rm = TRUE))

vapply(d, function(x) all(x <= 3, na.rm = TRUE), NA)


It's up to you which you choose. I hope this helps!
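
A slightly stricter variant, offered as a sketch rather than as part of the advice above: `logical(1)` states the expected result type explicitly, and the check is restricted to numeric columns so that no character comparison happens at all. The helper name `le3` and the variables `num_cols` and `res` are made up for this example.

```r
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# TRUE when every non-missing value is at most 3.
le3 <- function(x) all(x <= 3, na.rm = TRUE)

# Identify the numeric columns, then test only those.
num_cols <- vapply(d, is.numeric, logical(1))
res <- vapply(d[num_cols], le3, logical(1))
res    # d2 is TRUE, d3 is FALSE
```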



Re: [R] unexpected behavior in apply

2021-10-08 Thread Jiefei Wang
Hi,

I guess this can tell you what happens behind the scenes:


> d<-data.frame(d1 = letters[1:3],
+   d2 = c(1,2,3),
+   d3 = c(NA,NA,6))
> apply(d, 2, FUN=function(x)x)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"
> "a"<=3
[1] FALSE
> "2"<=3
[1] TRUE
> "6"<=3
[1] FALSE

Note that there is an additional space in the character value " 6";
that's why your comparison fails. I do not understand why, but this
might be a bug in R.
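
The comparisons above are string comparisons: when one side of `<=` is a character value, R converts the other side to character too. A small sketch of the consequence follows; the result for " 6" depends on the collation of the current locale, so no output is claimed for that line.

```r
# With a character operand, the number 3 is converted to "3" first.
identical("2" <= 3, "2" <= "3")   # TRUE

# String order is not numeric order: "10" sorts before "3".
"10" <= 3                         # TRUE
as.numeric("10") <= 3             # FALSE

# The padded value from the matrix; the result of the string comparison
# depends on how the locale orders a space relative to digits.
" 6" <= 3

# Converting to numeric first avoids the problem; as.numeric() ignores
# the leading space.
as.numeric(" 6") <= 3             # FALSE
```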

Best,
Jiefei



[R] unexpected behavior in apply

2021-10-08 Thread Derickson, Ryan, VHA NCOD via R-help
Hello, 

I'm seeing unexpected behavior when using apply() compared to a for loop when a 
character vector is part of the data subjected to the apply statement. Below, I 
check whether all non-missing values are <= 3. If I include a character column, 
apply incorrectly returns TRUE for d3. If I only pass the numeric columns to 
apply, it is correct for d3. If I use a for loop, it is correct. 

> d<-data.frame(d1 = letters[1:3],
+   d2 = c(1,2,3),
+   d3 = c(NA,NA,6))
> 
> d
  d1 d2 d3
1  a  1 NA
2  b  2 NA
3  c  3  6
> 
> # results are incorrect
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3 
FALSE  TRUE  TRUE 
> 
> # results are correct
> apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d2    d3 
 TRUE FALSE 
> 
> # results are correct
> for(i in names(d)){
+   print(all(d[!is.na(d[,i]),i] <= 3))
+ }
[1] FALSE
[1] TRUE
[1] FALSE


Finally, if I remove the NA values from d3 and include the character column in 
apply, it is correct.

> d<-data.frame(d1 = letters[1:3],
+   d2 = c(1,2,3),
+   d3 = c(4,5,6))
> 
> d
  d1 d2 d3
1  a  1  4
2  b  2  5
3  c  3  6
> 
> # results are correct
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3 
FALSE  TRUE FALSE


Can someone help me understand what's happening?



[R] Unexpected behavior of apply when FUN=sample

2013-05-14 Thread Luca Nanetti
Dear experts,

I wanted to signal a peculiar, unexpected behaviour of 'apply'. It is not a
bug, it is per spec, but it is so counterintuitive that I thought it could
be interesting.

I have an array, let's say test, dim=c(7,5).

> test <- array(1:35, dim=c(7, 5))
> test

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    8   15   22   29
[2,]    2    9   16   23   30
[3,]    3   10   17   24   31
[4,]    4   11   18   25   32
[5,]    5   12   19   26   33
[6,]    6   13   20   27   34
[7,]    7   14   21   28   35

I want a new array where the content of the rows (columns) are permuted,
differently per row (per column)

Let's start with the columns, i.e. the second MARGIN of the array:
> test.m2 <- apply(test, 2, sample)
> test.m2

     [,1] [,2] [,3] [,4] [,5]
[1,]    1   10   18   23   32
[2,]    7    9   16   25   30
[3,]    6   14   17   22   33
[4,]    4   11   15   24   34
[5,]    2   12   21   28   31
[6,]    5    8   20   26   29
[7,]    3   13   19   27   35

perfect. That was exactly what I wanted: the content of each column is
shuffled, and differently for each column.
However, if I use the same with the rows (MARGIN = 1), the output is
transposed!

> test.m1 <- apply(test, 1, sample)
> test.m1

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    2    3    4    5   13   21
[2,]   22   30   17   18   19   20   35
[3,]   15   23   24   32   26   27   14
[4,]   29   16   31   25   33   34   28
[5,]    8    9   10   11   12    6    7

In other words, I wanted to permute the content of the rows of test, and
I expected to see in the output, well, the shuffled rows as rows, not as
column!

I would respectfully suggest to make this behavior more explicit in the
documentation.

Kind regards,
Luca Nanetti
-- 
__

Luca Nanetti, MSc, MRI
University Medical Center Groningen
Neuroimaging Center Groningen
Groningen, The Netherlands
Tel: +31 50 363 4733



Re: [R] Unexpected behavior of apply when FUN=sample

2013-05-14 Thread Duncan Murdoch

On 13-05-14 4:52 AM, Luca Nanetti wrote:

> I would respectfully suggest to make this behavior more explicit in the
> documentation.


It is already very explicit:  If each call to FUN returns a vector of 
length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) 
if n > 1.  In your first case, sample is applied to columns, and 
returns length 7 results, so the shape of the final result is c(7, 5). 
In the second case it is applied to rows, and returns length 5 results, 
so the shape is c(5, 7).
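
The two shapes can be confirmed with `dim`; a minimal sketch, using the `test` array from the original post (`by_col` and `by_row` are names chosen for this example):

```r
test <- array(1:35, dim = c(7, 5))

# FUN sees columns of length 7 and there are 5 of them.
by_col <- apply(test, 2, sample)
dim(by_col)    # 7 5

# FUN sees rows of length 5 and there are 7 of them; each result is
# stored as a column, hence the transposed shape.
by_row <- apply(test, 1, sample)
dim(by_row)    # 5 7
```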


Duncan Murdoch



Re: [R] Unexpected behavior of apply when FUN=sample

2013-05-14 Thread Enrico Schumann
On Tue, 14 May 2013, Luca Nanetti luca.nane...@gmail.com writes:

> I would respectfully suggest to make this behavior more explicit in the
> documentation.

As you said yourself, this behaviour is documented:

  If each call to ‘FUN’ returns a vector of length ‘n’, then ‘apply’
  returns an array of dimension ‘c(n, dim(X)[MARGIN])’ [...]

And it has nothing to do with 'sample'. Try:

  apply(test, 1, function(x) x)
  apply(test, 2, function(x) x)

The result is only counterintuitive (or inconvenient, perhaps) in the
special case in which apply is supposed to return an array that has the
same dimension as its input.  More generally, you will do something like 

  apply(test, 1, median)
  apply(test, 1, function(x) list(sum = sum(x), values = x))

and in such cases, apply does not return an array.
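
A short sketch of those two cases, again with the `test` array from the original post; `med` and `lst` are illustrative names.

```r
test <- array(1:35, dim = c(7, 5))

# One value per row: the result is a plain vector without dimensions.
med <- apply(test, 1, median)
is.null(dim(med))   # TRUE
length(med)         # 7

# A list per row: the result is a list with one element per row.
lst <- apply(test, 1, function(x) list(sum = sum(x), values = x))
is.list(lst)        # TRUE
lst[[1]]$sum        # 75, i.e. 1 + 8 + 15 + 22 + 29
```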


  
-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net



Re: [R] Unexpected behavior of apply when FUN=sample

2013-05-14 Thread Rui Barradas

Hello,

The problem is that apply returns the results vector by vector and in R 
vectors are column vectors. This is not exclusive to apply with sample 
as the function to be called, but holds for apply in general. Try, for instance


apply(test, 1, identity)  # transposes the array

The rows are returned as column vectors. And you should expect this 
behavior from apply with MARGIN = 1.

And this is in fact documented, in the Value section of ?apply:

Value

If each call to FUN returns a vector of length n, then apply returns an 
array of dimension c(n, dim(X)[MARGIN]) if n > 1.


The length of the returned vector is the number of rows and the number 
of columns is the dim corresponding to MARGIN...
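
This can be checked against `t()`; a sketch with the `test` array from the original post (the name `rows_as_cols` is an assumption for illustration):

```r
test <- array(1:35, dim = c(7, 5))

# Each row of test comes back as a column of the result.
rows_as_cols <- apply(test, 1, identity)
dim(rows_as_cols)               # 5 7
all(rows_as_cols == t(test))    # TRUE

# With MARGIN = 2 the columns go back in as columns, so nothing moves.
all(apply(test, 2, identity) == test)   # TRUE
```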


Hope this helps,

Rui Barradas



Re: [R] Unexpected behavior of apply when FUN=sample

2013-05-14 Thread Ted Harding
On 14-May-2013 09:46:32 Duncan Murdoch wrote:
> It is already very explicit:  If each call to FUN returns a vector of
> length n, then apply returns an array of dimension c(n, dim(X)[MARGIN])
> if n > 1.  In your first case, sample is applied to columns, and
> returns length 7 results, so the shape of the final result is c(7, 5).
> In the second case it is applied to rows, and returns length 5 results,
> so the shape is c(5, 7).
>
> Duncan Murdoch

And the (quite simple) practical implication of what Duncan points out is:

  test <- array(1:35, dim=c(7, 5))
  test
  #      [,1] [,2] [,3] [,4] [,5]
  # [1,]    1    8   15   22   29
  # [2,]    2    9   16   23   30
  # [3,]    3   10   17   24   31
  # [4,]    4   11   18   25   32
  # [5,]    5   12   19   26   33
  # [6,]    6   13   20   27   34
  # [7,]    7   14   21   28   35

# To permute the rows:
  t(apply(t(test), 2, sample))
  #      [,1] [,2] [,3] [,4] [,5]
  # [1,]   22   29    8   15    1
  # [2,]   30   16   23    2    9
  # [3,]   10   31   24    3   17
  # [4,]   11    4   25   32   18
  # [5,]   26    5   12   33   19
  # [6,]   27   34   20   13    6
  # [7,]   35   28   14    7   21

which looks right!
Ted.
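
Rather than judging the output by eye, the row-wise shuffle above can be checked programmatically. This is a rough sketch only: the seed value and the names `shuffled` and `sorted_rows` are arbitrary choices made for illustration, not part of the original message.

```r
test <- array(1:35, dim = c(7, 5))

set.seed(1)  # arbitrary seed, only so that the run is repeatable
shuffled <- t(apply(t(test), 2, sample))

# The result should have the same shape as the input
stopifnot(all(dim(shuffled) == dim(test)))

# Each row of test is already in increasing order, so sorting each
# shuffled row should give back the original row
sorted_rows <- t(apply(shuffled, 1, sort))
stopifnot(all(sorted_rows == test))
```

If both checks pass silently, every row of the result holds exactly the values of the corresponding row of `test`, only in a different order.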

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 14-May-2013  Time: 11:07:46
This message was sent by XFMail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behavior of apply when FUN=sample

2013-05-14 Thread Gabor Grothendieck
On Tue, May 14, 2013 at 4:52 AM, Luca Nanetti luca.nane...@gmail.com wrote:
 Dear experts,

 I wanted to signal a peculiar, unexpected behaviour of 'apply'. It is not a
 bug, it is per spec, but it is so counterintuitive that I thought it could
 be interesting.

 I have an array, let's say test, dim=c(7,5).

 test <- array(1:35, dim=c(7, 5))
 test

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    8   15   22   29
 [2,]    2    9   16   23   30
 [3,]    3   10   17   24   31
 [4,]    4   11   18   25   32
 [5,]    5   12   19   26   33
 [6,]    6   13   20   27   34
 [7,]    7   14   21   28   35

 I want a new array where the content of the rows (columns) are permuted,
 differently per row (per column)

 Let's start with the columns, i.e. the second MARGIN of the array:
 test.m2 <- apply(test, 2, sample)
 test.m2

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1   10   18   23   32
 [2,]    7    9   16   25   30
 [3,]    6   14   17   22   33
 [4,]    4   11   15   24   34
 [5,]    2   12   21   28   31
 [6,]    5    8   20   26   29
 [7,]    3   13   19   27   35

 perfect. That was exactly what I wanted: the content of each column is
 shuffled, and differently for each column.
 However, if I use the same with the rows (MARGIN = 1), the output is
 transposed!

 test.m1 <- apply(test, 1, sample)
 test.m1

      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]    1    2    3    4    5   13   21
 [2,]   22   30   17   18   19   20   35
 [3,]   15   23   24   32   26   27   14
 [4,]   29   16   31   25   33   34   28
 [5,]    8    9   10   11   12    6    7

 In other words, I wanted to permute the content of the rows of test, and
 I expected to see in the output, well, the shuffled rows as rows, not as
 columns!

 I would respectfully suggest to make this behavior more explicit in the
 documentation.

aaply in the plyr package works in the way you expected.
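
For readers who want to try this suggestion, a hedged sketch follows. It assumes the plyr package is installed (the code is skipped otherwise) and uses the 7 x 5 `test` array from the original post; `by_row` is just an illustrative name.

```r
# Assumption: plyr is installed; nothing runs if it is not available
if (requireNamespace("plyr", quietly = TRUE)) {
  test <- array(1:35, dim = c(7, 5))

  # aaply() splits test along margin 1 (rows), applies sample() to each
  # row, and puts the results back together as an array
  by_row <- plyr::aaply(test, 1, sample)

  # Expected to match dim(test), i.e. shuffled rows stay rows
  print(dim(by_row))
}
```

Unlike base `apply`, the result may carry dimnames added by plyr, so compare values and shape, not attributes.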

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Unexpected behavior of apply when FUN=sample

2013-05-14 Thread Tsjerk Wassenaar
t(apply(test,1,sample)) will also do.
As the OP noted, the results are simply transposed. So if an operation is
to be applied to rows, yielding modified rows, simply transpose the results.

Cheers,

Tsjerk
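
The transpose idiom above can be wrapped in a small helper. The sketch below is illustrative only: `apply_rows` is a hypothetical name, not a base R function, and the seed is arbitrary. It also compares the result with the `t(apply(t(test), 2, sample))` form posted earlier in the thread.

```r
# Hypothetical helper: apply f to every row and keep rows as rows
apply_rows <- function(x, f, ...) t(apply(x, 1, f, ...))

test <- array(1:35, dim = c(7, 5))

set.seed(42)  # arbitrary seed
a <- apply_rows(test, sample)

set.seed(42)  # same seed, so both forms see the same random draws
b <- t(apply(t(test), 2, sample))

dim(a)       # same shape as test
all(a == b)  # both forms visit the rows in the same order
```

The helper only makes sense when `f` returns a vector as long as its input. If `f` returns a single value per row, `apply` gives back a plain vector and `t()` turns it into a one-row matrix, which is probably not what is wanted.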



-- 
Tsjerk A. Wassenaar, Ph.D.



Re: [R] Unexpected behavior of apply when FUN=sample

2013-05-14 Thread Patrick Burns

This is Circle 8.1.47 of 'The R Inferno'.

http://www.burns-stat.com/documents/books/the-r-inferno/

Pat


On 14/05/2013 09:52, Luca Nanetti wrote:

Dear experts,

I wanted to signal a peculiar, unexpected behaviour of 'apply'. It is not a
bug, it is per spec, but it is so counterintuitive that I thought it could
be interesting.

I have an array, let's say test, dim=c(7,5).


test <- array(1:35, dim=c(7, 5))
test

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    8   15   22   29
[2,]    2    9   16   23   30
[3,]    3   10   17   24   31
[4,]    4   11   18   25   32
[5,]    5   12   19   26   33
[6,]    6   13   20   27   34
[7,]    7   14   21   28   35

I want a new array where the content of the rows (columns) are permuted,
differently per row (per column)

Let's start with the columns, i.e. the second MARGIN of the array:

test.m2 <- apply(test, 2, sample)
test.m2

     [,1] [,2] [,3] [,4] [,5]
[1,]    1   10   18   23   32
[2,]    7    9   16   25   30
[3,]    6   14   17   22   33
[4,]    4   11   15   24   34
[5,]    2   12   21   28   31
[6,]    5    8   20   26   29
[7,]    3   13   19   27   35

perfect. That was exactly what I wanted: the content of each column is
shuffled, and differently for each column.
However, if I use the same with the rows (MARGIN = 1), the output is
transposed!

test.m1 <- apply(test, 1, sample)
test.m1

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    2    3    4    5   13   21
[2,]   22   30   17   18   19   20   35
[3,]   15   23   24   32   26   27   14
[4,]   29   16   31   25   33   34   28
[5,]    8    9   10   11   12    6    7

In other words, I wanted to permute the content of the rows of test, and
I expected to see in the output, well, the shuffled rows as rows, not as
columns!

I would respectfully suggest to make this behavior more explicit in the
documentation.

Kind regards,
Luca Nanetti



--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of:
 'Impatient R'
 'The R Inferno'
 'Tao Te Programming')
