Re: [R] data frame vs. matrix
On 2014-03-16 23:56, Duncan Murdoch wrote:
> On 14-03-16 2:57 PM, Göran Broström wrote:
>> I have always known that matrices are faster than data frames, for
>> instance this function:
>>
>>     dumkoll <- function(n = 1000, df = TRUE){
>>         dfr <- data.frame(x = rnorm(n), y = rnorm(n))
>>         if (df){
>>             for (i in 2:NROW(dfr)){
>>                 if (!(i %% 100)) cat("i = ", i, "\n")
>>                 dfr$x[i] <- dfr$x[i-1]
>>             }
>>         }else{
>>             dm <- as.matrix(dfr)
>>             for (i in 2:NROW(dm)){
>>                 if (!(i %% 100)) cat("i = ", i, "\n")
>>                 dm[i, 1] <- dm[i-1, 1]
>>             }
>>             dfr$x <- dm[, 1]
>>         }
>>     }
>>
>>     system.time(dumkoll())
>>        user  system elapsed
>>       0.046   0.000   0.045
>>
>>     system.time(dumkoll(df = FALSE))
>>        user  system elapsed
>>       0.007   0.000   0.008
>>
>> OK, no big deal, but I stumbled over a data frame with one million
>> records. Then, with df = TRUE,
>>
>>          user    system   elapsed
>>     44677.141  1271.544 46016.754
>>
>> This is around 12 hours. With df = FALSE, it took only six seconds!
>> About 7500 times faster.
>>
>> I was really surprised by the huge difference, and I wonder if this is
>> to be expected, or if it is some peculiarity with my installation: I'm
>> running Ubuntu 13.10 on a MacBook Pro with 8 GB memory, R-3.0.3.
>
> I don't find it surprising. The line
>
>     dfr$x[i] <- dfr$x[i-1]
>
> will be executed about a million times. It does the following:
> [...]

Thanks for the explanation; I got the idea that dfr[i, 1] <- might be
faster than dfr$x[i] <- , but it is in fact significantly slower.
Helpful experience.

Göran

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] data frame vs. matrix
On 2014-03-17 01:31, Jeff Newmiller wrote:
> Did you really intend to make all of the x values the same?

Not at all; the code in the loop was in fact just nonsense. The point
was to illustrate the huge difference in execution time, and that the
relative difference seems to increase fast with the number of
observations.

> If so, try one line instead of the for loop:
>
>     dfr$x[ 2:n ] <- dfr$x[ 1 ]
>
> If that was merely an error in your example, then you could use a
> different one-liner:
>
>     dfr$x[ 2:n ] <- dfr$x[ seq.int( n-1 ) ]
>
> In either case, the speedup is considerable.

I know about all this, but sometimes you have situations where you
cannot avoid an explicit loop.

> I use data frames far more than matrices and don't feel I am suffering
> for it, but then I also use creative indexing way more than for loops.

I think that this example shows that you need both tools in your
toolbox.

Göran

> On March 16, 2014 11:57:33 AM PDT, Göran Broström goran.brost...@umu.se wrote:
>> I have always known that matrices are faster than data frames [...]
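For the case where an explicit loop really is unavoidable, the sketch below shows a recurrence in which each value depends on the previous result, run on vectors pulled out of the data frame and written back once at the end. The recurrence, the column `e`, and the coefficient 0.5 are made up for illustration; they do not come from the thread.

```r
# Hypothetical data: x holds a start value, e holds per-row inputs
n <- 6
dfr <- data.frame(x = rep(1, n), e = as.numeric(seq_len(n)))

# Work on plain vectors inside the loop, not on the data frame
x <- dfr$x
e <- dfr$e
for (i in 2:n) {
    # each step needs the previous *result*, so it cannot be a one-liner
    x[i] <- 0.5 * x[i - 1] + e[i]
}

# Touch the data frame only once, after the loop
dfr$x <- x
dfr
```

The loop body then only indexes atomic vectors, which is the cheap operation in all the timings discussed in this thread.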
Re: [R] data frame vs. matrix
On 2014-03-17 00:36, William Dunlap wrote:
> Duncan's analysis suggests another way to do this: extract the 'x'
> vector, operate on that vector in a loop, then insert the result into
> the data.frame.
> [...]

Thanks Bill, that is a good improvement.

Göran
Re: [R] data frame vs. matrix
Hello,

This is to be expected. Matrices can hold only one type of data, so the
problem is solved once and for all; data frames can have many types of
data, so the code to handle them must determine which type to handle on
every access.

Hope this helps,

Rui Barradas

On 16-03-2014 18:57, Göran Broström wrote:
> I have always known that matrices are faster than data frames [...]
Re: [R] data frame vs. matrix
On 14-03-16 2:57 PM, Göran Broström wrote:
> I have always known that matrices are faster than data frames [...]
>
> I was really surprised by the huge difference, and I wonder if this is
> to be expected, or if it is some peculiarity with my installation: I'm
> running Ubuntu 13.10 on a MacBook Pro with 8 Gb memory, R-3.0.3.

I don't find it surprising. The line

    dfr$x[i] <- dfr$x[i-1]

will be executed about a million times. It does the following:

1. Get a pointer to the x element of dfr. This requires R to look
   through all the names of dfr to figure out which one is x.

2. Extract the i-1 element from it. Not particularly slow.

3. Get a pointer to the x element of dfr again. (R doesn't cache these
   things.)

4. Set the i element of it to a new value. This could require the
   entire column or even the entire dataframe to be copied, if R hasn't
   kept track of the fact that it is really being changed in place. In
   a complex assignment like that, I wouldn't be surprised if that took
   place. (In the matrix equivalent, it would be easier to recognize
   that it is safe to change the existing value.)

Luke Tierney is making some changes in R-devel that might help a lot in
cases like this, but I expect the matrix code will always be faster.

Duncan Murdoch
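The four steps above can be made visible by writing the nested assignment out by hand. The following is a minimal sketch on a made-up three-row data frame (the names `a`, `b` and the data are illustrative only): `dfr$x[i] <- dfr$x[i-1]` behaves like a call to `[<-` on the extracted column, followed by a call to `$<-` that puts the whole column back into the data frame.

```r
# Toy data (hypothetical; not the data from the thread)
dfr <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
i <- 2

# The nested replacement as written in the loop
a <- dfr
a$x[i] <- a$x[i - 1]

# Roughly what gets evaluated: extract the column, replace one element
# in that vector, then assign the whole column back with `$<-`
b <- dfr
b <- `$<-`(b, "x", value = `[<-`(b$x, i, value = b$x[i - 1]))

identical(a, b)
```

Because the outer step is a replacement on the whole data frame, every iteration goes through the data frame method for `$<-`, whereas the matrix version is a single element replacement.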
Re: [R] data frame vs. matrix
Duncan's analysis suggests another way to do this: extract the 'x'
vector, operate on that vector in a loop, then insert the result into
the data.frame. I added a df="quicker" option to your df argument and
made the test dataset deterministic so we could verify that the
algorithms do the same thing:

    dumkoll <- function(n = 1000, df = TRUE){
        dfr <- data.frame(x = log(seq_len(n)), y = sqrt(seq_len(n)))
        if (identical(df, "quicker")) {
            x <- dfr$x
            for(i in 2:length(x)) {
                x[i] <- x[i-1]
            }
            dfr$x <- x
        } else if (df){
            for (i in 2:NROW(dfr)){
                # if (!(i %% 100)) cat("i = ", i, "\n")
                dfr$x[i] <- dfr$x[i-1]
            }
        }else{
            dm <- as.matrix(dfr)
            for (i in 2:NROW(dm)){
                # if (!(i %% 100)) cat("i = ", i, "\n")
                dm[i, 1] <- dm[i-1, 1]
            }
            dfr$x <- dm[, 1]
        }
        dfr
    }

Timings for 10^4, 2*10^4, and 4*10^4 show that the time is quadratic in
n for the df=TRUE case and close to linear in the other cases, with the
new method taking about 60% the time of the matrix method:

    > n <- c("10k"=1e4, "20k"=2e4, "40k"=4e4)
    > sapply(n, function(n)system.time(dumkoll(n, df=FALSE))[1:3])
               10k  20k  40k
    user.self 0.11 0.22 0.43
    sys.self  0.02 0.00 0.00
    elapsed   0.12 0.22 0.44
    > sapply(n, function(n)system.time(dumkoll(n, df=TRUE))[1:3])
               10k   20k   40k
    user.self 3.59 14.74 78.37
    sys.self  0.00  0.11  0.16
    elapsed   3.59 14.91 78.81
    > sapply(n, function(n)system.time(dumkoll(n, df="quicker"))[1:3])
               10k  20k  40k
    user.self 0.06 0.12 0.26
    sys.self  0.00 0.00 0.00
    elapsed   0.07 0.13 0.27

I also timed the 2 faster cases for n=10^6 and the time still looks
linear in n, with vector approach still taking about 60% the time of
the matrix approach.

    > system.time(dumkoll(n=10^6, df=FALSE))
       user  system elapsed
      11.65    0.12   11.82
    > system.time(dumkoll(n=10^6, df="quicker"))
       user  system elapsed
       6.79    0.08    6.91

The results from each method are identical:

    > identical(dumkoll(100,df=FALSE), dumkoll(100,df=TRUE))
    [1] TRUE
    > identical(dumkoll(100,df=FALSE), dumkoll(100,df="quicker"))
    [1] TRUE

If your data.frame has columns of various types, then as.matrix will
coerce them all to a common type (often character), so it may give you
the wrong result in addition to being unnecessarily slow.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
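The as.matrix caveat can be checked on a small mixed-type data frame. This is a minimal sketch; the data frame `mixed` and its column names are invented for the illustration.

```r
# Hypothetical mixed-type data frame: one numeric and one character column
mixed <- data.frame(x = c(1.5, 2.5), label = c("a", "b"))

m <- as.matrix(mixed)

typeof(mixed$x)   # the column is numeric inside the data frame
typeof(m)         # the whole matrix has a single common type
is.character(m[, "x"])

# Pulling out just the column of interest keeps its type
x <- mixed$x
is.numeric(x)
```

Once the matrix is character, arithmetic such as `m[, "x"] + 1` fails, so the extract-the-vector approach is the safer of the two fast variants when column types differ.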
Re: [R] data frame vs. matrix
Did you really intend to make all of the x values the same? If so, try
one line instead of the for loop:

    dfr$x[ 2:n ] <- dfr$x[ 1 ]

If that was merely an error in your example, then you could use a
different one-liner:

    dfr$x[ 2:n ] <- dfr$x[ seq.int( n-1 ) ]

In either case, the speedup is considerable.

I use data frames far more than matrices and don't feel I am suffering
for it, but then I also use creative indexing way more than for loops.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On March 16, 2014 11:57:33 AM PDT, Göran Broström goran.brost...@umu.se wrote:
> I have always known that matrices are faster than data frames [...]
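The two one-liners do different things, which a five-row example makes concrete. This is a small sketch with made-up values; `same` and `shifted` are names introduced here only to keep the two results apart.

```r
n <- 5
dfr <- data.frame(x = c(10, 20, 30, 40, 50))

# One-liner 1: every later element gets the first value,
# which is what the original sequential loop ends up producing
same <- dfr
same$x[2:n] <- same$x[1]

# One-liner 2: shift by one; the right-hand side is evaluated in full
# before the assignment, so it uses the *original* values
shifted <- dfr
shifted$x[2:n] <- shifted$x[seq.int(n - 1)]

same$x
shifted$x
```

Only the first variant reproduces the loop `dfr$x[i] <- dfr$x[i-1]`, because in the loop each step reads a value that an earlier step has already overwritten.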
Re: [R] Data frame vs matrix quirk: Hinky error message?
Hi Bert,

The failure itself is the documented behavior: ?'[.data.frame' says

    Matrix indexing ('x[i]' with a logical or a 2-column integer
    matrix 'i') using '[' is not recommended, and barely supported.
    For extraction, 'x' is first coerced to a matrix. For replacement,
    a logical matrix (only) can be used to select the elements to be
    replaced in the same way as for a matrix.

The error message may be a bit hinky, as obviously data.frames can be
indexed by things other than logical matrices. Or is there another
reason this strikes you as odd?

Best,
Ista

On Tue, May 1, 2012 at 1:33 PM, Bert Gunter gunter.ber...@gene.com wrote:
> AdvisoRs:
>
> Is the following a bug, feature, hinky error message, or dumb Bert?
>
>     mtest <- matrix(1:12,nr=4)
>     dftest <- data.frame(mtest)
>     ix <- cbind(1:2,2:3)
>     mtest[ix] <- NA
>     mtest
>          [,1] [,2] [,3]
>     [1,]    1   NA    9
>     [2,]    2    6   NA
>     [3,]    3    7   11
>     [4,]    4    8   12
>     ## But ...
>     dftest[ix] <- NA
>     Error in `[<-.data.frame`(`*tmp*`, ix, value = NA) :
>       only logical matrix subscripts are allowed in replacement
>
> Obviously, I was expecting matrix indexing for replacement to work
> similarly in both cases; however, I can see why it would be
> problematic for data frames (mixed types), but was a bit nonplussed by
> the error message, which seems hinky to me.
>
> Cheers,
> Bert
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
> Internal Contact Info: Phone: 467-7374
> Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
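If the goal is simply to blank out the cells named by a 2-column integer index in a data frame, one way that avoids matrix subscripts altogether is to replace one (row, column) pair at a time. This is a sketch under the assumption that ordinary `df[i, j] <- value` replacement is acceptable for the task; whether `dftest[ix] <- NA` itself is accepted may depend on the R version.

```r
mtest  <- matrix(1:12, nrow = 4)
dftest <- data.frame(mtest)
ix <- cbind(1:2, 2:3)          # (row, column) pairs: (1, 2) and (2, 3)

# Matrix: 2-column integer indexing works directly
mtest[ix] <- NA

# Data frame: replace each (row, column) pair with ordinary [i, j] indexing
for (k in seq_len(nrow(ix))) {
    dftest[ix[k, 1], ix[k, 2]] <- NA
}

dftest
```

Each assignment stays within a single column, so no column of the data frame is coerced to another column's type.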
Re: [R] Data frame vs matrix quirk: Hinky error message?
On 01-May-2012 17:33:23 Bert Gunter wrote:
> AdvisoRs:
>
> Is the following a bug, feature, hinky error message, or dumb Bert?
>
>     mtest <- matrix(1:12,nr=4)
>     dftest <- data.frame(mtest)
>     ix <- cbind(1:2,2:3)
>     mtest[ix] <- NA
>     [...]
>     dftest[ix] <- NA
>     Error in `[<-.data.frame`(`*tmp*`, ix, value = NA) :
>       only logical matrix subscripts are allowed in replacement
> [...]

Also interesting is that, prior to the substitution commands,

    mtest[ix]
    # [1]  5 10

    dftest[ix]
    # [1]  5 10

both as one would expect on Bert's naive grounds (which, I confess, I
also share[d]).

Ted.

-------------------------------------------------
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 01-May-2012  Time: 19:03:14
This message was sent by XFMail
-------------------------------------------------
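The extraction case works because, as the help page quoted earlier in the thread says, the data frame is first coerced to a matrix. A minimal sketch of that equivalence, using the same objects as the original post:

```r
mtest  <- matrix(1:12, nrow = 4)
dftest <- data.frame(mtest)
ix <- cbind(1:2, 2:3)   # picks element (1, 2) and element (2, 3)

from_matrix <- mtest[ix]
from_df     <- dftest[ix]
via_coerce  <- as.matrix(dftest)[ix]   # the documented route for extraction

from_matrix
from_df
via_coerce
```

For an all-integer data frame like this one the coercion is harmless; with mixed column types the extracted values would come back in the common type of the coerced matrix.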
Re: [R] Data frame vs matrix quirk: Hinky error message?
Many thanks, Ista:

I only looked in ].default so the answer is: Alternative 4: dumb Bert.
Rap knuckles with ruler.

Actually, indexing by a logical matrix doesn't make much sense to me in
either case, as it does not have the effect of selecting individual
elements, which is what numeric matrix indices do. But that's a matter
of usage, neither bug nor feature.

If I had gotten something like the error message: "Matrix indices not
allowed for replacement in data frames", I would not have been
surprised. But as you said, the behavior **IS** documented.

Best,
Bert

On Tue, May 1, 2012 at 10:49 AM, Ista Zahn istaz...@gmail.com wrote:
> Hi Bert,
>
> The failure itself is the documented behavior: ?'[.data.frame' says
> [...]
Re: [R] Data frame vs matrix quirk: Hinky error message?
On 01/05/2012 2:12 PM, Bert Gunter wrote:
> [...]
> If I had gotten something like the error message: "Matrix indices not
> allowed for replacement in data frames", I would not have been
> surprised. But as you said, the behavior **IS** documented.

Your version is not correct: matrix indices *are* allowed for
replacement, but only logical matrix indices, not two column numerical
ones.

The message might be clearer if instead of saying "only logical matrix
subscripts are allowed in replacement" it said "matrix subscripts must
be logical matrices in replacement", but I think the basic problem is
the limitation. I'll fix that.

Duncan Murdoch
Re: [R] Data frame vs matrix quirk: Hinky error message?
Duncan:

Maybe there **is** a bug, then.

    zmat <- matrix(1:12,nr=4)
    zdf <- data.frame(zmat)
    ix <- cbind(c(FALSE,TRUE),c(TRUE,TRUE))

    zmat[ix]
    [1]  2  3  4  6  7  8 10 11 12
    zdf[ix]
    [1]  2  3  4  6  7  8 10 11 12

    zmat[ix] <- NA
    zmat
         [,1] [,2] [,3]
    [1,]    1    5    9
    [2,]   NA   NA   NA
    [3,]   NA   NA   NA
    [4,]   NA   NA   NA

    ## ??
    zdf[ix] <- NA
    Error in `[<-.data.frame`(`*tmp*`, ix, value = NA) :
      only logical matrix subscripts are allowed in replacement

That matrix replacement should not work with (in general mixed type)
data frames seems reasonable, actually. Trying to fix things may not
be. But I leave this to you and your fellow expeRts,

Cheers,
Bert

On Tue, May 1, 2012 at 11:30 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
> Your version is not correct: matrix indices *are* allowed for
> replacement, but only logical matrix indices, not two column numerical
> ones.
> [...]
Re: [R] Data frame vs matrix quirk: Hinky error message?
The difference is in recycling. If the logical matrix has the same dimensions, it seems to work:

    jx <- cbind(c(FALSE, TRUE, FALSE, TRUE),
                c(TRUE, FALSE, TRUE, FALSE),
                c(FALSE, TRUE, FALSE, TRUE))
    zmat[jx] <- NA
    zmat
         [,1] [,2] [,3]
    [1,]    1   NA    9
    [2,]   NA    6   NA
    [3,]    3   NA   11
    [4,]   NA    8   NA

    zdf[jx] <- NA
    zdf
      X1 X2 X3
    1  1 NA  9
    2 NA  6 NA
    3  3 NA 11
    4 NA  8 NA

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
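The recycling mentioned in this reply can be made explicit. The following is a minimal sketch, assuming only base R; the names `recycled` and `full` are introduced here for illustration. It expands the 2 x 2 logical subscript from the thread to the full 4 x 3 shape by hand, which is what the matrix method does implicitly, and then uses the full-size mask on the data frame.

```r
zmat <- matrix(1:12, nrow = 4)
ix <- cbind(c(FALSE, TRUE), c(TRUE, TRUE))   # 2 x 2, smaller than zmat

# On a matrix, a logical subscript of another shape is treated as a plain
# logical vector and recycled to length(zmat).
recycled <- rep_len(as.vector(ix), length(zmat))
full <- matrix(recycled, nrow = nrow(zmat), ncol = ncol(zmat))

print(full)
print(identical(zmat[ix], zmat[full]))

# A mask with the same dimensions as the data frame is accepted for replacement.
zdf <- data.frame(zmat)
zdf[full] <- NA
print(zdf)
```

Building the full-size mask first avoids relying on how the data frame method treats a subscript of a different shape.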
Re: [R] Data frame vs matrix quirk: Hinky error message?
On 01/05/2012 2:45 PM, Bert Gunter wrote:
> Duncan:
>
> Maybe there **is** a bug, then.
>
>     zmat <- matrix(1:12, nr=4)
>     zdf <- data.frame(zmat)
>     ix <- cbind(c(FALSE,TRUE), c(TRUE,TRUE))
>
>     zmat[ix]
>     [1]  2  3  4  6  7  8 10 11 12
>     zdf[ix]
>     [1]  2  3  4  6  7  8 10 11 12
>
>     zmat[ix] <- NA
>     zmat
>          [,1] [,2] [,3]
>     [1,]    1    5    9
>     [2,]   NA   NA   NA
>     [3,]   NA   NA   NA
>     [4,]   NA   NA   NA
>
>     ## ??
>     zdf[ix] <- NA
>     Error in `[<-.data.frame`(`*tmp*`, ix, value = NA) :
>       only logical matrix subscripts are allowed in replacement
>
> That matrix replacement should not work with (in general mixed type) data frames seems reasonable, actually. Trying to fix things may not be. But I leave this to you and your fellow expeRts,

My intention is to allow two column numeric indices, not to change the behaviour for logical matrix indices. I'm planning to leave the "not recommended" note in the help page, because there will still be surprises as above, but the error message will just say "illegal matrix index in replacement".

The rule will remain that a logical matrix needs to be of the same dimension as the original dataframe. I'm not sure if this is documented currently, but it will be.

Duncan Murdoch
Re: [R] Data frame vs matrix quirk: Hinky error message?
Bert,

I think this is what is needed for the data frame:

    ix <- cbind(1:2, 2:3)
    ixm <- matrix(FALSE, 4, 3)
    ixm[ix] <- TRUE
    zdf[ixm] <- NA

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
Re: [R] Data frame vs matrix quirk: Hinky error message?
Many thanks to all. I appreciate your kindness and patience.

The point is, of course, that matrix subscripting by logicals requires different semantics than by numeric indices, as it must. I'd still say this is a case of option 4, dumb Bert: I should have figured this out.

Duncan's proposed changes to both behavior and documentation would certainly address all my points of confusion. However, I agree that numeric replacement indices for data frames may be a can of worms: presumably silent type conversion would be required when replacing values in mixed type columns. Keeping the warnings in -- and maybe issuing some more when the type conversion occurs -- is certainly a good idea.

Best,
Bert
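The silent type conversion raised in this message can be illustrated with the logical-mask form of replacement, which the help page documents. This is a minimal sketch; the data frame `df`, its columns, and the value "x" are invented for the example.

```r
# Hypothetical mixed-type data frame (not from the thread).
df <- data.frame(n = 1:3, s = c("a", "b", "c"), stringsAsFactors = FALSE)

# Logical mask with the same dimensions as df, selecting one cell
# in the integer column.
mask <- matrix(FALSE, nrow = nrow(df), ncol = ncol(df))
mask[2, 1] <- TRUE

before <- class(df$n)
df[mask] <- "x"           # character value assigned into an integer column
after <- class(df$n)

print(c(before = before, after = after))
print(df)
```

The whole column changes type without any message, which is the kind of surprise the suggested warnings would be meant to flag.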
Re: [R] Data frame vs matrix quirk: Hinky error message?
On 12-05-01 3:57 PM, Nordlund, Dan (DSHS/RDA) wrote:
> Bert,
>
> I think this is what is needed for the data frame:
>
>     ix <- cbind(1:2, 2:3)
>     ixm <- matrix(FALSE, 4, 3)
>     ixm[ix] <- TRUE
>     zdf[ixm] <- NA
>
> Hope this is helpful,

That's essentially what I did in adding the numeric indexing. The only complication was handling the case where ix contains out of bound values; users don't want to hear that ixm[ix] <- TRUE failed.

Duncan Murdoch
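The approach discussed in this exchange (turn the two-column numeric index into a logical mask, and report out-of-bound pairs in terms of the index rather than the mask) might be sketched as follows. The helper `index_to_mask` and its error message are hypothetical; this is not the code that went into R, only an illustration of the idea under those assumptions.

```r
# Hypothetical helper (not R's actual implementation): turn a two-column
# (row, column) index into a logical mask with the dimensions of x.
index_to_mask <- function(x, ix) {
  stopifnot(is.matrix(ix), is.numeric(ix), ncol(ix) == 2, !anyNA(ix))
  out_of_range <- ix[, 1] < 1 | ix[, 1] > nrow(x) |
                  ix[, 2] < 1 | ix[, 2] > ncol(x)
  if (any(out_of_range)) {
    stop("index matrix has (row, column) pairs outside the data frame")
  }
  mask <- matrix(FALSE, nrow = nrow(x), ncol = ncol(x))
  mask[ix] <- TRUE
  mask
}

zdf <- data.frame(matrix(1:12, nrow = 4))
zdf[index_to_mask(zdf, cbind(1:2, 2:3))] <- NA
print(zdf)

# An out-of-range pair gives an error about the index, not about the mask.
res <- tryCatch(index_to_mask(zdf, cbind(5, 1)), error = conditionMessage)
print(res)
```

Checking the bounds before touching the mask keeps the error in the user's own terms, which is the concern raised above.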
Re: [R] Data frame vs matrix quirk: Hinky error message?
P.S. The way the logical matrix is constructed is NOT general purpose. Quoting myself quoting Bert,

> Actually, it works, as long as the logical index matrix has the same dimensions as the data frame.
>
>     zmat <- matrix(1:12, nr=4)
>     zdf <- data.frame(zmat)
>
>     # Numeric index matrix.
>     ix <- cbind(1:2, 2:3)
>     # Logical index matrix.
>     ix2 <- row(zdf) == ix[, 1] & col(zdf) == ix[, 2]

Here the number of rows in zdf is a multiple of the lengths of the vectors ix[, 1] and ix[, 2]. The recycling rules make it work. But if the numeric index matrix has, say, 3 rows, another way of constructing the logical one would be needed.

    jx <- cbind(1:3, c(2:3, 3))
    row(zdf) == jx[, 1] & col(zdf) == jx[, 2]
          [,1]  [,2]  [,3]
    [1,] FALSE FALSE FALSE
    [2,] FALSE FALSE FALSE
    [3,] FALSE FALSE FALSE
    [4,] FALSE FALSE FALSE

(Anyway, I don't believe that was the point.)

R.B.
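To make the failure in this P.S. concrete, here is a minimal sketch contrasting the comparison-based construction with an assignment-based one that does not depend on recycling. The names `by_compare` and `by_assign` are introduced here for illustration, and a plain matrix is used in place of the data frame.

```r
zmat <- matrix(1:12, nrow = 4)
jx <- cbind(1:3, c(2, 3, 3))    # three (row, column) pairs

# Comparison-based construction: jx[, 1] and jx[, 2] are recycled down the
# 12 cells, so the pairs no longer line up with the cells they refer to.
by_compare <- row(zmat) == jx[, 1] & col(zmat) == jx[, 2]

# Assignment-based construction: matrix indexing places each pair exactly.
by_assign <- matrix(FALSE, nrow = nrow(zmat), ncol = ncol(zmat))
by_assign[jx] <- TRUE

print(sum(by_compare))
print(which(by_assign, arr.ind = TRUE))
```

The assignment-based mask works for any number of index rows, because each (row, column) pair is placed directly rather than matched against recycled vectors.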
Re: [R] Data frame vs matrix quirk: Hinky error message?
Hello,

Bert Gunter wrote:
> Duncan:
>
> Maybe there **is** a bug, then.
>
> [...]
>
>     ## ??
>     zdf[ix] <- NA
>     Error in `[<-.data.frame`(`*tmp*`, ix, value = NA) :
>       only logical matrix subscripts are allowed in replacement
>
> That matrix replacement should not work with (in general mixed type) data frames seems reasonable, actually. Trying to fix things may not be. But I leave this to you and your fellow expeRts,

Actually, it works, as long as the logical index matrix has the same dimensions as the data frame.

    zmat <- matrix(1:12, nr=4)
    zdf <- data.frame(zmat)

    # Numeric index matrix.
    ix <- cbind(1:2, 2:3)
    # Logical index matrix.
    ix2 <- row(zdf) == ix[, 1] & col(zdf) == ix[, 2]

    zmat[ix]
    zmat[ix2]

    zdf[ix]
    zdf[ix2]

    zmat[ix] <- NA
    zmat

    # So far so good,
    # But now, as already seen, error
    zdf[ix] <- NA

    # Works
    zdf[ix2] <- NA
    zdf

It even makes sense...

Rui Barradas
Re: [R] Data frame vs matrix quirk: Hinky error message?
On May 1, 2012, at 1:33 PM, Bert Gunter wrote:
> AdvisoRs:
>
> Is the following a bug, feature, hinky error message, or dumb Bert?
>
>     mtest <- matrix(1:12, nr=4)
>     dftest <- data.frame(mtest)
>     ix <- cbind(1:2, 2:3)
>     mtest[ix] <- NA
>     mtest
>          [,1] [,2] [,3]
>     [1,]    1   NA    9
>     [2,]    2    6   NA
>     [3,]    3    7   11
>     [4,]    4    8   12
>
>     ## But ...
>     dftest[ix] <- NA
>     Error in `[<-.data.frame`(`*tmp*`, ix, value = NA) :
>       only logical matrix subscripts are allowed in replacement

I'm not sure _I_ would have expected '[<-.data.frame' to recognize that a matrix was being offered, because the [.] formalism without a comma (called i-indexing on the help page) would generally be referencing only columns (i.e. list elements). I had not realized the possibility of offering a logical matrix to a data frame, but it does succeed, as predicted by ?"[.data.frame":

"For replacement, a logical matrix (only) can be used to select the elements to be replaced in the same way as for a matrix."

So how you want to characterize documented behavior is your call. I would never choose the label you offered.

    mtest <- matrix(FALSE, 4, 4)
    ix <- cbind(1:2, 2:3)
    dftest <- data.frame(mtest)
    mtest[ix] <- TRUE
    dftest[mtest] <- "a"
    dftest
         X1    X2    X3    X4
    1 FALSE     a FALSE FALSE
    2 FALSE FALSE     a FALSE
    3 FALSE FALSE FALSE FALSE
    4 FALSE FALSE FALSE FALSE

The nonassignment operation still succeeds:

    dftest[ix]
    [1] "a" "a"

> Obviously, I was expecting matrix indexing for replacement to work similarly in both cases; however, I can see why it would be problematic for data frames (mixed types), but was a bit nonplussed by the error message, which seems hinky to me.
>
> Cheers,
> Bert

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
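The remark about i-indexing (a single subscript without a comma) can be illustrated with a short sketch, assuming only base R; the object names are chosen here for the example. On a matrix a single subscript addresses elements of the underlying vector, while on a data frame it addresses columns, as for a list.

```r
m  <- matrix(1:12, nrow = 4)
df <- data.frame(m)

elem <- m[2]       # second element of the underlying vector
cols <- df[2]      # a one-column data frame holding column X2
vec  <- df[[2]]    # the column itself, as a plain vector

print(elem)
print(cols)
print(vec)
```

Seen this way, a matrix-shaped subscript on a data frame is a special case layered on top of list-style column selection, which may explain why its support is described as limited.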