>>>>> Heinz Tuechler <tuech...@gmx.at>
>>>>>     on Sat, 07 Aug 2010 01:01:24 +0100 writes:

> Also Surv objects are matrices and they share the same problem when
> rbind-ing data.frames.
> If contained in a data.frame, Surv objects lose their class after
> rbind and therefore no longer represent Surv objects afterwards.
> Using rbind with Surv objects outside of data.frames shows a similar
> problem, but not the same column names.
> In conclusion, yes, matrices are common in data.frames, but not
> without problems.

My understanding (> 20 yr long S and R experience) has been that a
dataframe definitely can have matrix-like "components", and as
Bill Dunlap (with equal S & R experience) has just explained,
that's actually more common than you have thought.

To have *data frame*s instead of simple matrices should be much
less common; I'm not sure if it's a good idea.

But getting back to 'matrices', I think they should work
"without problems", at least for basic R operations such as rbind().
I don't have time to analyze the Surv example below, but at the
moment I think that we'd be interested in "fixing" the problems.

Martin Maechler, ETH Zurich

> Heinz

> ## example
> library(survival)
> ## create example data
> starttime <- rep(0,5)
> stoptime <- 1:5
> event <- c(1,0,1,1,1)
> group <- c(1,1,1,2,2)
> ## build Surv object
> survobj <- Surv(starttime, stoptime, event)
> ## build data.frame with Surv object
> df.test <- data.frame(survobj, group)
> df.test
> ## rbind data.frames
> rbind(df.test, df.test)
> ## rbind Surv objects
> rbind(survobj, survobj)

> At 06.08.2010 09:34 -0700, William Dunlap wrote:
>> > -----Original Message-----
>> > From: r-devel-boun...@r-project.org
>> > [mailto:r-devel-boun...@r-project.org] On Behalf Of Nicholas
>> > L Crookston
>> > Sent: Friday, August 06, 2010 8:35 AM
>> > To: Michael Lachmann
>> > Cc: r-devel-boun...@r-project.org; r-devel@r-project.org
>> > Subject: Re: [Rd] rbind on data.frame that contains a column
>> > that is also a data.frame
>> >
>> > OK...I'll put in my 2 cents worth.
>> >
>> > It seems to me that the problem is with this line:
>> >
>> > b$a=a , where "a" is something other than a vector with
>> > length equal to nrow(b).
>> >
>> > I had no idea that a dataframe could hold a dataframe. It is not just
>> > rbind(b,b) that fails, apply(b,1,sum) fails and so does plot(b). I'll
>> > bet other R commands fail as well.
>> >
>> > My point of view is that a dataframe is a list of vectors
>> > of equal length and various types (this is not exactly what the help
>> > page says, but it is what it suggests to me).
>> >
>> > Hum, I wonder how much code is based on the idea that a
>> > dataframe can hold a dataframe.
>>
>> I used to think that non-vectors in data.frames were
>> pretty rare things but when I started looking into
>> the details of the modelling code I discovered that
>> matrices in data.frames are common. E.g.,
>> > library(splines)
>> > sapply(model.frame(data=mtcars, mpg~ns(hp)+poly(disp,2)), class)
>> $mpg
>> [1] "numeric"
>>
>> $`ns(hp)`
>> [1] "ns" "basis" "matrix"
>>
>> $`poly(disp, 2)`
>> [1] "poly" "matrix"
>>
>> You may not see these things because you don't call model.frame()
>> directly, but most modelling functions (e.g., lm() and glm())
>> do call it and use the grouping provided by the matrices to encode
>> how the columns of the design matrix are related to one another.
>>
>> If matrices are allowed, shouldn't data.frames be allowed as well?
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>> > 15 years of using R just isn't enough! But, I can say that not one
>> > line of code I've written expects a dataframe to hold a dataframe.
>> >
>> > > Hi,
>> >
>> > > The following was already a topic on r-help, but after understanding
>> > > what is going on, I think it fits better in r-devel.
>> >
>> > > The problem is this:
>> > > When a data.frame has another data.frame in it, rbind doesn't work well.
>> > > Here is an example:
>> > > --
>> > > > a=data.frame(x=1:10,y=1:10)
>> > > > b=data.frame(z=1:10)
>> > > > b$a=a
>> > > > b
>> > >     z a.x a.y
>> > > 1   1   1   1
>> > > 2   2   2   2
>> > > 3   3   3   3
>> > > 4   4   4   4
>> > > 5   5   5   5
>> > > 6   6   6   6
>> > > 7   7   7   7
>> > > 8   8   8   8
>> > > 9   9   9   9
>> > > 10 10  10  10
>> > > > rbind(b,b)
>> > > Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4", :
>> > >   duplicate 'row.names' are not allowed
>> > > In addition: Warning message:
>> > > non-unique values when setting 'row.names': ‘1’, ‘10’, ‘2’, ‘3’, ‘4’, ‘5’,
>> > > ‘6’, ‘7’, ‘8’, ‘9’
>> > > --
>> >
>> > >
>> > > Looking at the code of rbind.data.frame, the error comes from the
>> > > lines:
>> > > --
>> > > xij <- xi[[j]]
>> > > if (has.dim[jj]) {
>> > >     value[[jj]][ri, ] <- xij
>> > >     rownames(value[[jj]])[ri] <- rownames(xij) # <-- problem is here
>> > > }
>> > > --
>> > > if the rownames() line is dropped, all works well. What this line
>> > > tries to do is to join the rownames of internal elements of the
>> > > data.frames I try to rbind. So the result, in my case should have a
>> > > column 'a', whose rownames are the rownames of the original column 'a'.
>> > > It isn't totally clear to me why this is needed. When would a data.frame
>> > > have different rownames on the inside vs. the outside?
>> >
>> > > Notice also that rbind takes into account whether the rownames of the
>> > > data.frames to be joined are simply 1:n, or they are something else.
>> > > If they are 1:n, then the result will have rownames 1:(n+m). If not,
>> > > then the rownames might be kept.
>> >
>> > > I think, more consistent would be to replace the lines above with
>> > > something like:
>> > > if (has.dim[jj]) {
>> > >     value[[jj]][ri, ] <- xij
>> > >     rnj = rownames(value[[jj]])
>> > >     rnj[ri] = rownames(xij)
>> > >     rnj = make.unique(as.character(unlist(rnj)), sep = "")
>> > >     rownames(value[[jj]]) <- rnj
>> > > }
>> >
>> > > In this case, the rownames of inside elements will also be joined, but
>> > > in case they overlap, they will be made unique - just as they are for
>> > > the overall result of rbind. A side effect here would be that the
>> > > rownames of matrices will also be made unique, which till now didn't
>> > > happen, and which also doesn't happen when one rbinds matrices that
>> > > have rownames. So it would be better to test above if we are dealing
>> > > with a matrix or a data.frame.
>> >
>> > > But most people don't have different rownames inside and outside.
>> > > Maybe it would be best to add a flag as to whether you care or don't
>> > > care about the rownames of internal data.frames...
>> >
>> > > But maybe data.frames aren't meant to contain other data.frames?
>> >
>> > > If instead I do
>> > > b=data.frame( z=1:10, a=a)
>> > > then rbind(b,b) works well. In this case the data.frame was converted
>> > > to its columns. Maybe
>> > > b$a = a
>> > > should do the same?
>> >
>> > > Michael
>> > > --
>> > > View this message in context: http://r.789695.n4.nabble.com/rbind-
>> > > on-data-frame-that-contains-a-column-that-is-also-a-data-frame-
>> > > tp2315682p2315682.html
>> > > Sent from the R devel mailing list archive at Nabble.com.
>> >
>> > > ______________________________________________
>> > > R-devel@r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-devel
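
The two constructions contrasted in this thread (b$a = a versus data.frame(z, a = a)) can be put side by side in a few lines. What follows is only a minimal sketch and is not taken from any of the messages above: the names `nested`, `flat`, `flat2`, `nested2` and `rn` are hypothetical, and the data are shortened to three rows. Whether rbind() on the nested form still raises an error depends on the R version, so that call is guarded with tryCatch() and no particular outcome is assumed. The last line only shows what make.unique(..., sep = "") from the proposed patch does to overlapping row names.

```r
# Illustrative sketch only; all object names here are hypothetical.
a <- data.frame(x = 1:3, y = 1:3)

# Assigning with `$<-` stores the whole data.frame as one column.
nested <- data.frame(z = 1:3)
nested$a <- a
is.data.frame(nested$a)
ncol(nested)

# Passing it to data.frame() spreads it into the columns a.x and a.y.
flat <- data.frame(z = 1:3, a = a)
names(flat)

# rbind() on the flattened form works, as reported in the thread.
flat2 <- rbind(flat, flat)
nrow(flat2)

# rbind() on the nested form failed in the R version discussed in the
# thread; other versions may behave differently, so the call is guarded
# and the result is either a data.frame or the error message.
nested2 <- tryCatch(rbind(nested, nested),
                    error = function(e) conditionMessage(e))

# make.unique() with sep = "", as in the proposed patch, appends a
# counter to repeated names so that row names no longer collide.
rn <- make.unique(c("1", "2", "1", "2"), sep = "")
rn
```

Inspecting `nested2` shows directly how a given R installation handles the nested case, without relying on the behaviour reported in 2010.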