On Tue, Dec 6, 2016 at 5:10 PM, Chris Evans <chrish...@psyctc.org> wrote:
> {{SIGH}}
>
> You are absolutely right.
>
> I wonder if I am losing some cognitive capacities that are needed to be part
> of the evolving R community. It seems to me that if a tibble is designed to
> be an enhanced replacement for a dataframe then it shouldn't quite so
> radically change things.

Well, there are some things about data frames that are darn annoying,
and tibbles exist partly as an attempt to eliminate some of the
inconsistencies with data.frames. That necessarily means changing
things.

>
> I notice that the documentation on tibble says "[ Never simplifies (drops),
> so always returns data.frame"
> That is much less explicit than I would have liked and actually doesn't seem
> to be true. In fact, as you rightly say, it generally, but not quite always,
> returns a tibble. In fact it can be fooled into a vector of length 1.

Really? How?

>
>> tmpTibble[[1,]]
> Error in `[[.data.frame`(tmpTibble, 1, ) :
>   argument "..2" is missing, with no default

That doesn't have anything to do with tibbles:
as.data.frame(tmpTibble)[[1, ]] gives the same thing.

>
>> tmpTibble[1]
> # A tibble: 26 × 1
>       ID
>    <chr>
> 1      a
> 2      b
> 3      c
> 4      d
> 5      e
> 6      f
> 7      g
> 8      h
> 9      i
> 10     j
> # ... with 16 more rows

Again, just what you expect from a data.frame (except for the print
method).

>> tmpTibble[,1]
> # A tibble: 26 × 1
>       ID
>    <chr>
> 1      a
> 2      b
> 3      c
> 4      d
> 5      e
> 6      f
> 7      g
> 8      h
> 9      i
> 10     j
> # ... with 16 more rows

That is different, and by design as you noted. It is different from
data.frame indexing, but the data.frame behavior is needlessly
complicated. Sometimes you get a vector, sometimes a data.frame. That
hardly seems worth it given that we already have $ or [[ if you really
wanted a vector.

>> tmpTibble[1,]
> Error in `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", :
>   replacement element 3 is a matrix/data frame of 26 rows, need 1
> In addition: Warning messages:
> 1: In `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", :
>   replacement element 1 has 26 rows to replace 1 rows
> 2: In `[<-.data.frame`(`*tmp*`, , value = list(ID = c("a", "a", "a", :
>   replacement element 2 has 26 rows to replace 1 rows

That's not what I get.

> tmpTibble[1,]
# A tibble: 1 × 2
     ID   num
  <chr> <int>
1     a     1

works just as I would expect here.

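For anyone following along, the indexing difference described above can be checked with a short sketch. It assumes the tibble package is installed and reuses the tmpTibble definition from the original post; the name df is introduced here only for the comparison and is not part of the thread.

```r
library(tibble)

tmpTibble <- tibble(ID = letters, num = 1:26)
df <- as.data.frame(tmpTibble)   # hypothetical name: plain data.frame copy

# `[` with a single column index: the data.frame drops to a vector,
# while the tibble keeps its data-frame shape.
is.data.frame(df[, 2])           # FALSE, an integer vector
is.data.frame(tmpTibble[, 2])    # TRUE, a one-column tibble

# `[[` and `$` extract the column as a vector from either class.
identical(tmpTibble[[2]], df[[2]])
identical(tmpTibble$num, df$num)
```

If the data.frame behaviour is wanted for a tibble, `[[` or `$` is the explicit way to ask for it.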
>> tmpTibble[1,1:26]
> Error: Invalid column indexes: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
> 17, 18, 19, 20, 21, 22, 23, 24, 25, 26

Other than providing more information about what went wrong, this is
the same as data.frame:

> as.data.frame(tmpTibble)[1,1:26]
Error in `[.data.frame`(as.data.frame(tmpTibble), 1, 1:26) :
  undefined columns selected

>> tmpTibble[[1,2]]
> [1] 1

Same as data.frame (and not at odds with the documentation, which says
that [ (not [[ ) always returns a data.frame).

>> str(tmpTibble[[1,2]])
> int 1
>> str(tmpTibble[[1:2,2]])
> Error in col[[i, exact = exact]] :
>   attempt to select more than one element in vectorIndex

Same behavior as data.frame.

>>
>> tmpTibble[[1,1:2]]
> [1] "b"
>>

Same behavior as data.frame.

>
> So [[a,b]] works if a and b are legal with the dimensions of the tibble and
> if a is of length 1 but returns NOT a tibble but a vector of length 1 (I
> think), I can see that's logical but not what it says in the documentation.

In what documentation? The documentation that says [ always returns a
data.frame? Note that [ and [[ are not the same, and only [ is
documented to always return a data.frame.

>
> [[a]] and [[,a]] return the same result, that seems excessively tolerant to
> me.

Not for me:

> tmpTibble[[1]]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> tmpTibble[[, 1]]
Error in `[[.data.frame`(tmpTibble, , 1) :
  argument "..1" is missing, with no default

(this is the same thing that happens with a data.frame)

>
> [[a,b:c]] actually returns [[a,c]] and again as a single value, NOT a tibble.

That is weird, but not different from data.frame. See above regarding
"NOT a tibble".

>
> And row subsetting/indexing has gone.

Whatever do you mean?

> tmpTibble[tmpTibble$ID == "d", ]
# A tibble: 1 × 2
     ID   num
  <chr> <int>
1     d     4

>
> Why create replacement for a dataframe that has no row indexing and so
> radically redefines column indexing, in fact redefines the whole of indexing
> and subsetting?

It has row indexing, and apart from [, x] not dropping dimensions it
works pretty much the same.

>
> OK. I will go to sleep now and hope to feel less dumb(ed) when I wake.
> Perhaps Prof. Wickham or someone can spell out a bit less tersely, and I
> think incompletely, than the tibble documentation does, why all this is good.

Most of the things you identify here are issues inherited from
data.frame, and not due to differences between tibbles and
data.frames.

Best,
Ista

>
> Thanks anyway Ista, you certainly hit the issue!
>
> Very best all,
>
> Chris
>
>> From: "Ista Zahn" <istaz...@gmail.com>
>> To: "Chris Evans" <chrish...@psyctc.org>
>> Cc: "r-helpr-project.org" <r-help@r-project.org>
>> Sent: Tuesday, 6 December, 2016 21:40:41
>> Subject: Re: [R] Odd behaviour of mean() with a numeric column in a tibble
>
>> Not at a computer to check right now, but I believe single bracket indexing a
>> tibble always returns a tibble. To extract a vector use [[
>
>> On Dec 6, 2016 4:28 PM, "Chris Evans" < chrish...@psyctc.org > wrote:
>
>>> I hope I am obeying the list rules here. I am using a raw R IDE for this and
>> > running 3.3.2 (2016-10-31) on x86_64-w64-mingw32/x64 (64-bit)
>
>> > Here is a reproducible example. Code only first
>
>> > require(tibble)
>> > tmpTibble <- tibble(ID=letters,num=1:26)
>> > min(tmpTibble[,2]) # fine
>> > max(tmpTibble[,2]) # fine
>> > median(tmpTibble[,2]) # not fine
>> > mean(tmpTibble[,2]) # not fine
>
>> I think you want
>
>> mean(tmpTibble[[2]])
>
>> > newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
>> > newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't be
>> > necessary?!
>> > newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
>> > newMedianFun(tmpTibble[,2]) # ditto
>> > str(tmpTibble[,2])
>
>> > ### then I tried this to make sure it wasn't about having fed in integers
>
>> > tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
>> > tmpTibble2
>> > mean(tmpTibble2[,3]) # not fine, not about integers!
>
>
>>> ### before I just created tmpTibble2 I found myself trying to add a column
>>> to tmpTibble
>> > tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
>> > tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
>> > ### and oddly enough ...
>> > add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!
>
>> > Now here it is with the output:
>
>> > > require(tibble)
>> > Loading required package: tibble
>> > > tmpTibble <- tibble(ID=letters,num=1:26)
>> > > min(tmpTibble[,2]) # fine
>> > [1] 1
>> > > max(tmpTibble[,2]) # fine
>> > [1] 26
>> > > median(tmpTibble[,2]) # not fine
>> > Error in median.default(tmpTibble[, 2]) : need numeric data
>> > > mean(tmpTibble[,2]) # not fine
>> > [1] NA
>> > Warning message:
>> > In mean.default(tmpTibble[, 2]) :
>> >   argument is not numeric or logical: returning NA
>> > > newMeanFun <- function(x) {mean(as.numeric(unlist(x)))}
>> > > newMeanFun(tmpTibble[,2]) # solved problem but surely shouldn't be
>> > > necessary?!
>> > [1] 13.5
>> > > newMedianFun <- function(x) {median(as.numeric(unlist(x)))}
>> > > newMedianFun(tmpTibble[,2]) # ditto
>> > [1] 13.5
>> > > str(tmpTibble[,2])
>> > Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 26 obs. of 1 variable:
>> >  $ num: int 1 2 3 4 5 6 7 8 9 10 ...
>
>> > > ### then I tried this to make sure it wasn't about having fed in integers
>
>> > > tmpTibble2 <- tibble(ID=letters,num=1:26,num2=(1:26)/10)
>> > > tmpTibble2
>> > # A tibble: 26 × 3
>> >       ID   num  num2
>> >    <chr> <int> <dbl>
>> > 1      a     1   0.1
>> > 2      b     2   0.2
>> > 3      c     3   0.3
>> > 4      d     4   0.4
>> > 5      e     5   0.5
>> > 6      f     6   0.6
>> > 7      g     7   0.7
>> > 8      h     8   0.8
>> > 9      i     9   0.9
>> > 10     j    10   1.0
>> > # ... with 16 more rows
>> > > mean(tmpTibble2[,3]) # not fine, not about integers!
>> > [1] NA
>> > Warning message:
>> > In mean.default(tmpTibble2[, 3]) :
>> >   argument is not numeric or logical: returning NA
>
>
>>> > ### before I just created tmpTibble2 I found myself trying to add a
>>> > column to tmpTibble
>> > > tmpTibble$newNum <- tmpTibble[,2]/10 # NO!
>> > > tmpTibble[["newNum"]] <- tmpTibble[,2]/10 # NO!
>> > > ### and oddly enough ...
>> > > add_column(tmpTibble,newNum = tmpTibble[,2]/10) # NO!
>> > Error: Each variable must be a 1d atomic vector or list.
>> > Problem variables: 'newNum'
>
>
>
>>> I discovered this when I hit odd behaviour after using read_spss() from the
>>> haven package for the first time as it seemed to be offering a step forward
>>> over good old read.spss() from the excellent foreign package. I am
>>> reporting it here not directly to Prof. Wickham as the issues seem rather
>>> general though I'm guessing that it needs to be fixed with a fix to tibble.
>>> Or perhaps I've completely missed something.
>
>> > TIA,
>
>> > Chris
>
>> > ______________________________________________
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
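
Returning to the summary and column-creation attempts in the original post, here is a minimal sketch of how they might be written with [[ or $ so that a plain vector is passed along. It assumes the tibble package is installed; withNew and newNum2 are hypothetical names introduced only for this illustration.

```r
library(tibble)

tmpTibble <- tibble(ID = letters, num = 1:26)

# `[[` and `$` return the column as a plain integer vector, so the
# summary functions receive ordinary numeric input.
median(tmpTibble[[2]])
mean(tmpTibble$num)

# Build the derived column from the extracted vector rather than from
# a one-column tibble.
tmpTibble$newNum <- tmpTibble[[2]] / 10

# add_column() accepts the same kind of vector (hypothetical names).
withNew <- add_column(tmpTibble, newNum2 = tmpTibble$num / 10)
str(withNew$newNum2)
```

The only change from the original attempts is that the column is extracted with [[ or $ instead of [, 2], so no unlist() wrapper is needed.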