Re: [R] unexpected behavior in apply
If an oven expects fried potatoes and I put a cake in, I would hope it complains or does nothing rather than surreptitiously poisoning my cake. Jiefei's finding that "6" becomes " 6" during matrix coercion (apparently for aesthetic reasons only) feels more like the latter. But I appreciate the explanation and the solutions.

-----Original Message-----
From: PIKAL Petr
Sent: Monday, October 11, 2021 5:15 AM
To: Jiefei Wang; Derickson, Ryan, VHA NCOD
Cc: r-help@r-project.org
Subject: [EXTERNAL] RE: [R] unexpected behavior in apply

Hi

it is not surprising at all. from apply documentation

Arguments
X    an array, including a matrix.

data.frame is not matrix or array (even if it rather resembles one)

So if you put a cake into oven you cannot expect getting fried potatoes from it.

For data frames sapply or lapply is preferable as it is designed for lists and data frame is (again from documentation)

A data frame is a list of variables of the same number of rows with unique row names, given class "data.frame".

> sapply(d,function(x) all(x[!is.na(x)]<=3))
   d1    d2    d3
FALSE  TRUE FALSE

Cheers
Petr

[...]
Re: [R] unexpected behavior in apply
On Mon, 11 Oct 2021 09:15:27 + PIKAL Petr wrote:

> data.frame is not matrix or array (even if it rather resembles one)
>
> So if you put a cake into oven you cannot expect getting fried
> potatoes from it.

Another fortune nomination!

cheers,

Rolf

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] unexpected behavior in apply
Hi

It is not surprising at all. From the apply documentation:

Arguments
X    an array, including a matrix.

A data.frame is not a matrix or an array (even if it rather resembles one).

So if you put a cake into oven you cannot expect getting fried potatoes from it.

For data frames, sapply or lapply is preferable, as they are designed for lists, and a data frame is (again from the documentation):

A data frame is a list of variables of the same number of rows with unique row names, given class "data.frame".

> sapply(d,function(x) all(x[!is.na(x)]<=3))
   d1    d2    d3
FALSE  TRUE FALSE

Cheers
Petr

> -----Original Message-----
> From: R-help On Behalf Of Jiefei Wang
> Sent: Friday, October 8, 2021 8:22 PM
> To: Derickson, Ryan, VHA NCOD
> Cc: r-help@r-project.org
> Subject: Re: [R] unexpected behavior in apply
>
> Ok, it turns out that this is documented, even though it looks surprising.
> [...]
Re: [R] unexpected behavior in apply
Ok, it turns out that this is documented, even though it looks surprising.

First of all, the apply function will try to convert any object with the dim attribute to a matrix (my intuition agrees with you that there should be no conversion), so the first step of the apply function is

> as.matrix.data.frame(d)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"

Since the data frame `d` is a mixture of character and non-character values, the non-character values are converted to character using the function `format`. However, the problem is that the NA values are passed to `format` as well:

> format(c(NA, 6))
[1] "NA" " 6"

That's where the space comes from: "6" is padded to the same width as "NA". It is purely for making the result pretty... The character "NA" is turned back into a missing value later, but the space is not stripped. I would say this is not a good design, and it might be worth not including the NA values in the call to `format`. For now, I suggest using the function `lapply` to do what you want.

> lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
$d1
[1] FALSE

$d2
[1] TRUE

$d3
[1] FALSE

Everything should work as you expect.

Best,
Jiefei

On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang wrote:
> [...]
> Note that there is an additional space in the character value " 6",
> that's why your comparison fails. I do not understand why but this
> might be a bug in R
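The coercion described above can be checked directly. The following is a minimal sketch, assuming R >= 4.0 (so that `d1` is a character column rather than a factor); the names `m`, `padded`, `is_num` and `res` are illustrative and do not come from the thread.

```r
# Same data frame as in the original post.
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# This is the object apply() works on: a character matrix.
m <- as.matrix(d)
padded <- m[3, "d3"]
nchar(padded)            # 2: the value carries a leading space

# The padding comes from format(), which aligns all values to one width.
format(c(NA, 6))

# Converting back to numeric removes the effect of the padding.
as.numeric(trimws(padded)) <= 3

# Working column by column avoids the coercion altogether.
is_num <- vapply(d, is.numeric, logical(1))
res <- vapply(d[is_num], function(x) all(x <= 3, na.rm = TRUE), logical(1))
res
```

Comparing the padded string with 3 is a string comparison, and its result may depend on the collation order of the locale, which is why the sketch converts back to numeric before comparing.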
Re: [R] unexpected behavior in apply
Hello,

The issue is that 'apply' tries to coerce its argument to a matrix. This means that all your columns become character, and the result will not be what you wanted. I would suggest something more like:

sapply(d, function(x) all(x[!is.na(x)] <= 3))

or

vapply(d, function(x) all(x[!is.na(x)] <= 3), NA)

Also, here is a different method that might look cleaner:

sapply(d, function(x) all(x <= 3, na.rm = TRUE))
vapply(d, function(x) all(x <= 3, na.rm = TRUE), NA)

It's up to you which you choose. I hope this helps!

On Fri, Oct 8, 2021 at 1:50 PM Derickson, Ryan, VHA NCOD via R-help <r-help@r-project.org> wrote:
> I'm seeing unexpected behavior when using apply() compared to a for loop
> when a character vector is part of the data subjected to the apply
> statement. [...]
> Can someone help me understand what's happening?
Re: [R] unexpected behavior in apply
Hi,

I guess this can tell you what happens behind the scenes:

> d<-data.frame(d1 = letters[1:3],
+               d2 = c(1,2,3),
+               d3 = c(NA,NA,6))
> apply(d, 2, FUN=function(x)x)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"
> "a"<=3
[1] FALSE
> "2"<=3
[1] TRUE
> "6"<=3
[1] FALSE

Note that there is an additional space in the character value " 6", that's why your comparison fails. I do not understand why but this might be a bug in R

Best,
Jiefei

On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help wrote:
> I'm seeing unexpected behavior when using apply() compared to a for loop
> when a character vector is part of the data subjected to the apply
> statement. [...]
> Can someone help me understand what's happening?
[R] unexpected behavior in apply
Hello,

I'm seeing unexpected behavior when using apply() compared to a for loop when a character vector is part of the data subjected to the apply statement. Below, I check whether all non-missing values are <= 3. If I include a character column, apply incorrectly returns TRUE for d3. If I only pass the numeric columns to apply, it is correct for d3. If I use a for loop, it is correct.

> d<-data.frame(d1 = letters[1:3],
+               d2 = c(1,2,3),
+               d3 = c(NA,NA,6))
>
> d
  d1 d2 d3
1  a  1 NA
2  b  2 NA
3  c  3  6
>
> # results are incorrect
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3
FALSE  TRUE  TRUE
>
> # results are correct
> apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d2    d3
 TRUE FALSE
>
> # results are correct
> for(i in names(d)){
+   print(all(d[!is.na(d[,i]),i] <= 3))
+ }
[1] FALSE
[1] TRUE
[1] FALSE

Finally, if I remove the NA values from d3 and include the character column in apply, it is correct.

> d<-data.frame(d1 = letters[1:3],
+               d2 = c(1,2,3),
+               d3 = c(4,5,6))
>
> d
  d1 d2 d3
1  a  1  4
2  b  2  5
3  c  3  6
>
> # results are correct
> apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
   d1    d2    d3
FALSE  TRUE FALSE

Can someone help me understand what's happening?
[R] Unexpected behavior of apply when FUN=sample
Dear experts,

I wanted to signal a peculiar, unexpected behaviour of 'apply'. It is not a bug, it is per spec, but it is so counterintuitive that I thought it could be interesting.

I have an array, let's say test, dim=c(7,5).

> test <- array(1:35, dim=c(7, 5))
> test
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    8   15   22   29
[2,]    2    9   16   23   30
[3,]    3   10   17   24   31
[4,]    4   11   18   25   32
[5,]    5   12   19   26   33
[6,]    6   13   20   27   34
[7,]    7   14   21   28   35

I want a new array where the content of the rows (columns) is permuted, differently per row (per column).

Let's start with the columns, i.e. the second MARGIN of the array:

> test.m2 <- apply(test, 2, sample)
> test.m2
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   10   18   23   32
[2,]    7    9   16   25   30
[3,]    6   14   17   22   33
[4,]    4   11   15   24   34
[5,]    2   12   21   28   31
[6,]    5    8   20   26   29
[7,]    3   13   19   27   35

Perfect. That was exactly what I wanted: the content of each column is shuffled, and differently for each column. However, if I use the same with the rows (MARGIN = 1), the output is transposed!

> test.m1 <- apply(test, 1, sample)
> test.m1
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    2    3    4    5   13   21
[2,]   22   30   17   18   19   20   35
[3,]   15   23   24   32   26   27   14
[4,]   29   16   31   25   33   34   28
[5,]    8    9   10   11   12    6    7

In other words, I wanted to permute the content of the rows of test, and I expected to see in the output, well, the shuffled rows as rows, not as columns!

I would respectfully suggest to make this behavior more explicit in the documentation.

Kind regards,
Luca Nanetti

--
Luca Nanetti, MSc, MRI
University Medical Center Groningen
Neuroimaging Center Groningen
Groningen, The Netherlands
Tel: +31 50 363 4733

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unexpected behavior of apply when FUN=sample
On 13-05-14 4:52 AM, Luca Nanetti wrote:
> [...]
> In other words, I wanted to permute the content of the rows of test, and I
> expected to see in the output, well, the shuffled rows as rows, not as
> columns! I would respectfully suggest to make this behavior more explicit
> in the documentation.

It is already very explicit:

  If each call to FUN returns a vector of length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1.

In your first case, sample is applied to columns and returns length-7 results, so the shape of the final result is c(7, 5). In the second case it is applied to rows and returns length-5 results, so the shape is c(5, 7).

Duncan Murdoch
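The dimension rule quoted above can be checked on the array from the original post. This is a small sketch using only base R; `range` and `sum` are chosen here merely as examples of functions that return two values and one value per call.

```r
test <- array(1:35, dim = c(7, 5))

# FUN returns n = 7 values per column, and there are 5 columns.
dim(apply(test, 2, sample))

# FUN returns n = 5 values per row, and there are 7 rows,
# so the result has 5 rows and 7 columns.
dim(apply(test, 1, sample))

# The same rule with n = 2: one column of (min, max) per row of test.
dim(apply(test, 1, range))

# With n = 1 the result is a plain vector without a dim attribute.
row_sums <- apply(test, 1, sum)
length(row_sums)
```

In each case the first dimension of the result is the length of what FUN returns, and the second is the extent of the margin that was iterated over.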
Re: [R] Unexpected behavior of apply when FUN=sample
On Tue, 14 May 2013, Luca Nanetti luca.nane...@gmail.com writes:

> [...]
> In other words, I wanted to permute the content of the rows of test, and I
> expected to see in the output, well, the shuffled rows as rows, not as
> columns! I would respectfully suggest to make this behavior more explicit
> in the documentation.

As you said yourself, this behaviour is documented:

  If each call to 'FUN' returns a vector of length 'n', then 'apply' returns an array of dimension 'c(n, dim(X)[MARGIN])' [...]

And it has nothing to do with 'sample'. Try:

  apply(test, 1, function(x) x)
  apply(test, 2, function(x) x)

The result is only counterintuitive (or inconvenient, perhaps) in the special case in which apply is supposed to return an array that has the same dimension as its input. More generally, you will do something like

  apply(test, 1, median)
  apply(test, 1, function(x) list(sum = sum(x), values = x))

and in such cases, apply does not return an array.

--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net
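The two identity calls suggested above can be compared with the input directly. A minimal sketch, with `by_row` and `by_col` as illustrative names:

```r
test <- array(1:35, dim = c(7, 5))

by_row <- apply(test, 1, function(x) x)  # each row comes back as a column
by_col <- apply(test, 2, function(x) x)  # each column stays a column

dim(by_row)
all(by_row == t(test))   # the row-wise result is the transpose of test

dim(by_col)
all(by_col == test)      # the column-wise result equals test

# A function returning a single value per row gives a plain vector.
med <- apply(test, 1, median)
is.null(dim(med))
```

This shows that the transposition is a property of how apply assembles its results and is independent of the function being applied.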
Re: [R] Unexpected behavior of apply when FUN=sample
Hello,

The problem is that apply returns the results vector by vector, and R treats each of those vectors as a column when it assembles the result. This is not exclusive to apply with sample as the function to be called, but holds for apply in general. Try, for instance

apply(test, 1, identity)  # transposes the array

The rows are returned as column vectors. And you should expect this behavior from apply with MARGIN = 1. And this is in fact documented, in the Value section of ?apply:

Value

If each call to FUN returns a vector of length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1.

The length of the returned vector is the number of rows and the number of columns is the dim corresponding to MARGIN...

Hope this helps,

Rui Barradas

On 14-05-2013 09:52, Luca Nanetti wrote:
> [...]
> In other words, I wanted to permute the content of the rows of test, and I
> expected to see in the output, well, the shuffled rows as rows, not as
> columns! I would respectfully suggest to make this behavior more explicit
> in the documentation.
Re: [R] Unexpected behavior of apply when FUN=sample
On 14-May-2013 09:46:32 Duncan Murdoch wrote:
> [...]
> In your first case, sample is applied to columns, and returns length 7
> results, so the shape of the final result is c(7, 5). In the second
> case it is applied to rows, and returns length 5 results, so the shape
> is c(5, 7).
>
> Duncan Murdoch

And the (quite simple) practical implication of what Duncan points out is:

test <- array(1:35, dim=c(7, 5))
test
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    8   15   22   29
# [2,]    2    9   16   23   30
# [3,]    3   10   17   24   31
# [4,]    4   11   18   25   32
# [5,]    5   12   19   26   33
# [6,]    6   13   20   27   34
# [7,]    7   14   21   28   35

# To permute the rows:
t(apply(t(test), 2, sample))
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   22   29    8   15    1
# [2,]   30   16   23    2    9
# [3,]   10   31   24    3   17
# [4,]   11    4   25   32   18
# [5,]   26    5   12   33   19
# [6,]   27   34   20   13    6
# [7,]   35   28   14    7   21

which looks right!

Ted.

-------------------------------------------------
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 14-May-2013  Time: 11:07:46
This message was sent by XFMail
-------------------------------------------------
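One way to confirm that the row permutation above "looks right" is to compare the sorted rows before and after shuffling. This is a sketch only; the seed value and the names `perm` and `rows_ok` are arbitrary choices made here, not part of the thread.

```r
set.seed(1)  # arbitrary seed, used only to make the run reproducible

test <- array(1:35, dim = c(7, 5))

# Shuffle within rows: transpose, shuffle the columns, transpose back.
perm <- t(apply(t(test), 2, sample))
dim(perm)

# Each row of perm must contain the same values as the matching row of test.
# apply(..., 1, sort) returns the sorted rows as columns in both cases.
rows_ok <- all(apply(perm, 1, sort) == apply(test, 1, sort))
rows_ok
```

The check relies on the same transposing behaviour discussed in the thread: both calls to apply return the sorted rows as columns, so the two results line up element by element.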
Re: [R] Unexpected behavior of apply when FUN=sample
On Tue, May 14, 2013 at 4:52 AM, Luca Nanetti luca.nane...@gmail.com wrote:
> [...]
> In other words, I wanted to permute the content of the rows of test, and I
> expected to see in the output, well, the shuffled rows as rows, not as
> columns! I would respectfully suggest to make this behavior more explicit
> in the documentation.

aaply in the plyr package works in the way you expected.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Unexpected behavior of apply when FUN=sample
t(apply(test,1,sample)) will also do. As the OP noted, the results are simply transposed. So if an operation is to be applied to rows, yielding modified rows, simply transpose the results.

Cheers,

Tsjerk

On Tue, May 14, 2013 at 12:07 PM, Ted Harding ted.hard...@wlandres.net wrote:
> On 14-May-2013 09:46:32 Duncan Murdoch wrote:
> > On 13-05-14 4:52 AM, Luca Nanetti wrote:
> > > Dear experts,
> > >
> > > I wanted to signal a peculiar, unexpected behaviour of 'apply'. It is
> > > not a bug, it is per spec, but it is so counterintuitive that I
> > > thought it could be interesting.
> > >
> > > I have an array, let's say test, dim=c(7,5).
> > >
> > >   test <- array(1:35, dim=c(7, 5))
> > >   test
> > >        [,1] [,2] [,3] [,4] [,5]
> > >   [1,]    1    8   15   22   29
> > >   [2,]    2    9   16   23   30
> > >   [3,]    3   10   17   24   31
> > >   [4,]    4   11   18   25   32
> > >   [5,]    5   12   19   26   33
> > >   [6,]    6   13   20   27   34
> > >   [7,]    7   14   21   28   35
> > >
> > > I want a new array where the content of the rows (columns) are
> > > permuted, differently per row (per column).
> > >
> > > Let's start with the columns, i.e. the second MARGIN of the array:
> > >
> > >   test.m2 <- apply(test, 2, sample)
> > >   test.m2
> > >        [,1] [,2] [,3] [,4] [,5]
> > >   [1,]    1   10   18   23   32
> > >   [2,]    7    9   16   25   30
> > >   [3,]    6   14   17   22   33
> > >   [4,]    4   11   15   24   34
> > >   [5,]    2   12   21   28   31
> > >   [6,]    5    8   20   26   29
> > >   [7,]    3   13   19   27   35
> > >
> > > Perfect. That was exactly what I wanted: the content of each column
> > > is shuffled, and differently for each column.
> > > However, if I use the same with the rows (MARGIN = 1), the output is
> > > transposed!
> > >
> > >   test.m1 <- apply(test, 1, sample)
> > >   test.m1
> > >        [,1] [,2] [,3] [,4] [,5] [,6] [,7]
> > >   [1,]    1    2    3    4    5   13   21
> > >   [2,]   22   30   17   18   19   20   35
> > >   [3,]   15   23   24   32   26   27   14
> > >   [4,]   29   16   31   25   33   34   28
> > >   [5,]    8    9   10   11   12    6    7
> > >
> > > In other words, I wanted to permute the content of the rows of test,
> > > and I expected to see in the output, well, the shuffled rows as rows,
> > > not as columns!
> > >
> > > I would respectfully suggest making this behavior more explicit in
> > > the documentation.
> >
> > It is already very explicit: "If each call to FUN returns a vector of
> > length n, then apply returns an array of dimension
> > c(n, dim(X)[MARGIN]) if n > 1."
> >
> > In your first case, sample is applied to columns, and returns length 7
> > results, so the shape of the final result is c(7, 5). In the second
> > case it is applied to rows, and returns length 5 results, so the shape
> > is c(5, 7).
> >
> > Duncan Murdoch
>
> And the (quite simple) practical implication of what Duncan points out is:
>
>   test <- array(1:35, dim=c(7, 5))
>   test
>   #      [,1] [,2] [,3] [,4] [,5]
>   # [1,]    1    8   15   22   29
>   # [2,]    2    9   16   23   30
>   # [3,]    3   10   17   24   31
>   # [4,]    4   11   18   25   32
>   # [5,]    5   12   19   26   33
>   # [6,]    6   13   20   27   34
>   # [7,]    7   14   21   28   35
>
>   # To permute the rows:
>   t(apply(t(test), 2, sample))
>   #      [,1] [,2] [,3] [,4] [,5]
>   # [1,]   22   29    8   15    1
>   # [2,]   30   16   23    2    9
>   # [3,]   10   31   24    3   17
>   # [4,]   11    4   25   32   18
>   # [5,]   26    5   12   33   19
>   # [6,]   27   34   20   13    6
>   # [7,]   35   28   14    7   21
>
> which looks right!
> Ted.

--
Tsjerk A. Wassenaar, Ph.D.
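The dimension rule Duncan quotes from the documentation is not specific to sample. A short sketch with two other base functions, assuming nothing beyond base R (the names r and s are illustrative):

```r
test <- array(1:35, dim = c(7, 5))

# range() returns n = 2 values per row, and there are 7 rows,
# so the result has dimension c(n, dim(test)[1]) = c(2, 7)
r <- apply(test, 1, range)
print(dim(r))

# sum() returns n = 1 value per row, so the result is a plain
# vector of length 7 with no dim attribute
s <- apply(test, 1, sum)
print(is.null(dim(s)))

# Transposing gives one output row per input row, as Tsjerk suggests
print(dim(t(r)))
```

In each case the per-call results are laid out as columns, which is why a row-wise call needs t() to bring the rows back as rows.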
Re: [R] Unexpected behavior of apply when FUN=sample
This is Circle 8.1.47 of 'The R Inferno'.

http://www.burns-stat.com/documents/books/the-r-inferno/

Pat

On 14/05/2013 09:52, Luca Nanetti wrote:
> Dear experts,
>
> I wanted to signal a peculiar, unexpected behaviour of 'apply'. It is not
> a bug, it is per spec, but it is so counterintuitive that I thought it
> could be interesting.
>
> I have an array, let's say test, dim=c(7,5).
>
>   test <- array(1:35, dim=c(7, 5))
>   test
>        [,1] [,2] [,3] [,4] [,5]
>   [1,]    1    8   15   22   29
>   [2,]    2    9   16   23   30
>   [3,]    3   10   17   24   31
>   [4,]    4   11   18   25   32
>   [5,]    5   12   19   26   33
>   [6,]    6   13   20   27   34
>   [7,]    7   14   21   28   35
>
> I want a new array where the content of the rows (columns) are permuted,
> differently per row (per column).
>
> Let's start with the columns, i.e. the second MARGIN of the array:
>
>   test.m2 <- apply(test, 2, sample)
>   test.m2
>        [,1] [,2] [,3] [,4] [,5]
>   [1,]    1   10   18   23   32
>   [2,]    7    9   16   25   30
>   [3,]    6   14   17   22   33
>   [4,]    4   11   15   24   34
>   [5,]    2   12   21   28   31
>   [6,]    5    8   20   26   29
>   [7,]    3   13   19   27   35
>
> Perfect. That was exactly what I wanted: the content of each column is
> shuffled, and differently for each column.
> However, if I use the same with the rows (MARGIN = 1), the output is
> transposed!
>
>   test.m1 <- apply(test, 1, sample)
>   test.m1
>        [,1] [,2] [,3] [,4] [,5] [,6] [,7]
>   [1,]    1    2    3    4    5   13   21
>   [2,]   22   30   17   18   19   20   35
>   [3,]   15   23   24   32   26   27   14
>   [4,]   29   16   31   25   33   34   28
>   [5,]    8    9   10   11   12    6    7
>
> In other words, I wanted to permute the content of the rows of test, and
> I expected to see in the output, well, the shuffled rows as rows, not as
> columns!
>
> I would respectfully suggest making this behavior more explicit in the
> documentation.
>
> Kind regards,
> Luca Nanetti

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of: 'Impatient R' 'The R Inferno' 'Tao Te Programming')