Re: [Rd] Quiz: How to get a named column from a data frame
On 12-08-18 12:33 PM, Martin Maechler wrote:
> Joshua Ulrich josh.m.ulr...@gmail.com on Sat, 18 Aug 2012 10:16:09 -0500 writes:
>
>> I don't know if this is better, but it's the most obvious/shortest I
>> could come up with. Transpose the data.frame column to a 'row' vector
>> and drop the dimensions.
>>
>> R> identical(nv, drop(t(df)))
>> [1] TRUE
>
> Yes, that's definitely shorter, congratulations!
>
> One gotcha is that I'd want a solution that also works when the df has
> more columns than just one...
>
> Your idea to use t(.) is nice and perfect insofar as it coerces the
> data frame to a matrix, and that's really the clue:
>
> Whereas df[,1] is losing the names, the matrix indexing is not.
>
> So your solution can be changed into
>
>     t(df)[1,]
>
> which is even shorter... and slightly less efficient, at least
> conceptually, than mine, which has been
>
>     as.matrix(df)[,1]
>
> Now, the remaining question is: Shouldn't there be something more
> natural to achieve that? (There is not, currently, AFAIK).

I've been offline, so I'm a bit late to this game, but the examples above
fail when df contains a character column as well as the desired one,
because everything gets coerced to a character matrix. You need to select
the column first, then convert to a matrix, e.g.

    drop(t(df[,1,drop=FALSE]))

Duncan Murdoch

> Martin
>
>> Best,
>> --
>> Joshua Ulrich | about.me/joshuaulrich
>> FOSS Trading | www.fosstrading.com
>>
>> On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler maech...@stat.math.ethz.ch wrote:
>>> Today, I was looking for an elegant (and efficient) way to get a named
>>> (atomic) vector by selecting one column of a data frame. Of course,
>>> the vector names must be the rownames of the data frame.
>>>
>>> Ok, here is the quiz, I know one quite cute/slick answer, but was
>>> wondering if there are obvious better ones, and also if this should
>>> not become more idiomatic (hence R-devel):
>>>
>>> Consider this toy example, where the dataframe already has only one
>>> column :
>>>
>>> > nv <- c(a=1, d=17, e=101); nv
>>>   a   d   e
>>>   1  17 101
>>>
>>> > df <- as.data.frame(cbind(VAR = nv)); df
>>>   VAR
>>> a   1
>>> d  17
>>> e 101
>>>
>>> Now, how can I get 'nv' back from 'df' ? I.e., how to get
>>>
>>> > identical(nv, ...)
>>> [1] TRUE
>>>
>>> where '...' only uses 'df' (and no non-standard R packages)?
>>>
>>> As said, I know a simple solution (*), but I'm sure it is not obvious
>>> to most R users and probably not even to the majority of R-devel
>>> readers... OTOH, people like Bill Dunlap will not take long to provide
>>> it or a better one.
>>>
>>> (*) In my solution, the above '...' consists of 17 letters. I'll post
>>> it later today (CEST time) ... or confirm that someone else has done
>>> so.
>>>
>>> Martin

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
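Duncan's point about coercion can be checked with a small sketch. The two-column data frame `df2` and the variable names `whole_first` and `col_first` are assumptions made for illustration; they do not come from his message.

```r
nv <- c(a = 1, d = 17, e = 101)

# Hypothetical two-column data frame: one numeric and one character column.
df2 <- data.frame(VAR = nv, CHAR = c("x", "y", "z"), stringsAsFactors = FALSE)

# Converting the whole data frame to a matrix first coerces every column
# to character, so the extracted column is no longer numeric.
whole_first <- as.matrix(df2)[, 1]
is.character(whole_first)

# Selecting the column first keeps it numeric, and the row names survive
# the trip through the one-row matrix.
col_first <- drop(t(df2[, 1, drop = FALSE]))
identical(nv, col_first)
```

The only difference between the two expressions is the order of "select" and "convert to matrix", which is exactly what the message above recommends paying attention to.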
Re: [Rd] Quiz: How to get a named column from a data frame
On Tue, Aug 21, 2012 at 2:34 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
> I've been offline, so I'm a bit late to this game, but the examples above
> fail when df contains a character column as well as the desired one,
> because everything gets coerced to a character matrix. You need to select
> the column first, then convert to a matrix, e.g.
>
>     drop(t(df[,1,drop=FALSE]))

That's true, but I was assuming a one-column data.frame, which can be
achieved via:

    df <- data.frame(VAR=nv,CHAR=letters[1:3],stringsAsFactors=FALSE)
    drop(t(df[1]))

That said, I prefer the setNames() solution for its efficiency.

Best,
Josh
Re: [Rd] Quiz: How to get a named column from a data frame
Yes, but either

    drop(t(df[,1,drop=TRUE]))

or

    t(df[,1,drop=TRUE])[1,]

does work. My minimal effort to check timings found that the first version
was a hair faster.

-- Bert

On Sat, Aug 18, 2012 at 9:01 AM, Rui Barradas ruipbarra...@sapo.pt wrote:
> Hello,
>
> A bit more general
>
> nv <- c(a=1, d=17, e=101); nv
> nv2 <- c(a="a", d="d", e="e")
> df2 <- data.frame(VAR = nv, CHAR = nv2); df2
>
> identical( nv, drop(t( df2[1] )) )    # TRUE
> identical( nv, drop(t( df2[[1]] )) )  # FALSE
>
> Rui Barradas

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Re: [Rd] Quiz: How to get a named column from a data frame
Or to expand just a hair on Joshua's suggestion, is the following what you
want:

> x <- 1:10
> names(x) <- letters[1:10]
> x
 a  b  c  d  e  f  g  h  i  j
 1  2  3  4  5  6  7  8  9 10
> df <- data.frame(x=x,y=LETTERS[1:10],row.names=names(x))
> df
   x y
a  1 A
b  2 B
c  3 C
d  4 D
e  5 E
f  6 F
g  7 G
h  8 H
i  9 I
j 10 J
> y <- t(df[,1,drop=FALSE])[1,]
> y
 a  b  c  d  e  f  g  h  i  j
 1  2  3  4  5  6  7  8  9 10
> identical(x,y)
[1] TRUE

Cheers,
Bert

On Sat, Aug 18, 2012 at 8:16 AM, Joshua Ulrich josh.m.ulr...@gmail.com wrote:
> I don't know if this is better, but it's the most obvious/shortest I
> could come up with. Transpose the data.frame column to a 'row' vector
> and drop the dimensions.
>
> R> identical(nv, drop(t(df)))
> [1] TRUE

--
Bert Gunter
Genentech Nonclinical Biostatistics
Re: [Rd] Quiz: How to get a named column from a data frame
Sorry! -- Change that to drop = FALSE !

    drop(t(df[,1,drop=FALSE]))
    t(df[,1,drop=FALSE])[1,]

-- Bert

On Sat, Aug 18, 2012 at 9:37 AM, Bert Gunter bgun...@gene.com wrote:
> Yes, but either
>
>     drop(t(df[,1,drop=TRUE]))
>
> or
>
>     t(df[,1,drop=TRUE])[1,]
>
> does work. My minimal effort to check timings found that the first
> version was a hair faster.
>
> -- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
Re: [Rd] Quiz: How to get a named column from a data frame
On 2012-08-18 11:03, Martin Maechler wrote:
> Today, I was looking for an elegant (and efficient) way to get a named
> (atomic) vector by selecting one column of a data frame. Of course, the
> vector names must be the rownames of the data frame.

For this purpose my private function library has a function withnames():

withnames(): Extract from data frame as a named vector

Description:

    Extracts data from a data frame; if the result is a vector (i.e. we
    extracted a single column and did not specify 'drop=FALSE') it is
    assigned names derived from the row names of the data frame.

Usage:

    withnames(expr)

Arguments:

    expr: R expression.

Details:

    'expr' is evaluated in an environment in which the extractor
    functions '$.data.frame', '[.data.frame', and '[[.data.frame' are
    replaced by versions that attach the data frame's row names to an
    extracted vector.

Value:

    'expr', evaluated as described above.

## Code

withnames <- function(expr) {
  eval(substitute(expr), list(
    `[.data.frame` = function(x, i, ...) {
      out <- x[i, ...]
      if (is.null(dim(out))) names(out) <- row.names(x)[i]
      return(out)
    },
    `[[.data.frame` = function(x, ...) {
      out <- x[[...]]
      if (is.null(dim(out))) names(out) <- row.names(x)
      return(out)
    },
    `$.data.frame` = function(x, name) {
      out <- x[[name, exact=FALSE]]
      if (is.null(dim(out))) names(out) <- row.names(x)
      return(out)
    }
  ), enclos=parent.frame())
}

## Examples

dd <- data.frame(aa=1:6, bb=letters[c(1,3,2,3,3,1)], row.names=LETTERS[1:6])
dd
dd$aa                        # Unnamed vector
withnames(dd$aa)             # Named vector
withnames(dd[["aa"]])        # Named vector
withnames(dd[2:4,"aa"])      # Named vector
withnames(dd$bb)             # Factor with names
withnames(outer(dd$a,dd$a))  # Both dimensions have names

## But now I am looking for a version that will play nicely with with():

withnames(with(dd, aa))      # No names!
with(dd, withnames(aa))      # No names!
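A note on why the two with() calls lose the names: with() evaluates its expression using the data frame itself as the evaluation environment, so `aa` is found by ordinary variable lookup and none of the three replaced extractor functions is ever called. The sketch below illustrates this; the explicit setNames() workaround is an assumption of this note, not the poster's solution.

```r
dd <- data.frame(aa = 1:6, row.names = LETTERS[1:6])

# `aa` is resolved by variable lookup inside the data frame,
# not through `$`, `[` or `[[`, so no row names are attached.
plain <- with(dd, aa)
is.null(names(plain))

# Explicit workaround (assumed here for illustration): attach the
# row names by hand inside the with() call.
named <- with(dd, setNames(aa, row.names(dd)))
named
```

This keeps the row names but requires naming the data frame twice, which is the kind of redundancy withnames() was meant to avoid.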
Re: [Rd] Quiz: How to get a named column from a data frame
On Sat, Aug 18, 2012 at 02:13:20PM -0400, Christian Brechbühler wrote:
> On 8/18/12, Martin Maechler maech...@stat.math.ethz.ch wrote:
>> On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler wrote:
>>> On Sat, Aug 18, 2012 at 11:03 AM, Martin Maechler maech...@stat.math.ethz.ch wrote:
>>>> Consider this toy example, where the dataframe already has only
>>>> one column :
>>>>
>>>> > nv <- c(a=1, d=17, e=101); nv
>>>>   a   d   e
>>>>   1  17 101
>>>>
>>>> > df <- as.data.frame(cbind(VAR = nv)); df
>>>>   VAR
>>>> a   1
>>>> d  17
>>>> e 101
>>>>
>>>> Now how, can I get 'nv' back from 'df' ? I.e., how to get
>>>
>>> > identical(nv, df[,1])
>>> [1] TRUE
>>
>> But it is not a solution in a current version of R!
>> though it's still interesting that df[,1] worked in some incantation
>> of R.
>
> My mistake! We disliked some quirks of indexing, so we've long had
> our own patch for [.data.frame in place, which I used inadvertently.

As I understand it, when doing 'df[,1]' on a data frame, Bell Labs S and
all versions of S-Plus prior to 3.4 always retained the data frame's row
names as the names on the result vector. 'df[,1]' gave you a named vector
identical to your 'nv' above. Then in 1996 with S-Plus 3.4, Insightful
broke that behavior, after which 'df[,1]' returned a vector without any
names. I believe R copied that late-1990s S-Plus behavior, but I don't
know why exactly.

When subscripting objects, R sometimes retains the object's dimnames as
names in the result, and sometimes not, which I find frustrating.
Personally, I think it would make much more sense if subscripting ALWAYS
retained any names it could, and worked as similarly as possible across
data frames, matrices, arrays, vectors, etc. After all, explicitly
dropping names afterwards is trivial, while adding them back on is not.

Back on 2005-10-19 with R 2.2.0, I gave a simple test of 15 cases; 4 of
them dropped names during subscripting, the other 11 preserved them.
That's towards the end of the discussion here:

    https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=8192

Contrary to the initial tone of my old 2005 bug report, current R
subscripting behavior is of course NOT a bug, as AFAIK it's working as
the R Core Team intended. However, I definitely consider the current
behavior a design infelicity.

Just now on stock R 2.15.1 (with --vanilla), I ran an updated version of
those same simple tests. Of 22 subscripting test cases, 7 lose names and
15 preserve them. (If anyone's interested in the specific tests, I can
send them, or try to append them to that old 8192 feature request.)

For what it's worth, at work, for years we ran various versions of
pre-namespace R using some ugly patches of "[" and "[.data.frame" to
force name retention during subscripting. Since we were not using
namespaces at all, those "keep names" subscripting hacks were affecting
ALL R code we ran, not just our own custom code which needed and expected
the names to be retained.

Yet perhaps surprisingly, I don't think I ever ran into a single case
where the forced retention of names broke any code. We of course ran
only a tiny sample of the huge amount of code on CRAN, but that
experience suggests that most R code which expects un-named objects
doesn't mind at all if names are present.

If anyone would genuinely like to add an option for name-preserving
subscripting to R, I'm willing to work on it, so please do let me know
your thoughts. So far though, I've never dug into the guts of the
.Primitive("[") and "[.data.frame" functions to see how/why they
sometimes keep and sometimes discard names during subscripting.

--
Andrew Piskorski a...@piskorski.com
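The inconsistency described above can be seen on the thread's own toy data. This is only a minimal sketch covering five cases; it is not the 15-case or 22-case test set mentioned in the message, and the variable names are chosen here for illustration.

```r
nv <- c(a = 1, d = 17, e = 101)
m  <- cbind(VAR = nv)        # 3 x 1 matrix with row names a, d, e
df <- as.data.frame(m)       # data frame with the same row names

# Subscripting that keeps names:
vec_names <- names(nv[2:3])  # subsetting a named vector
mat_names <- names(m[, 1])   # extracting a matrix column

# Subscripting that drops them (each of these is NULL):
df_names1 <- names(df[, 1])
df_names2 <- names(df[[1]])
df_names3 <- names(df$VAR)

list(vec_names, mat_names, df_names1, df_names2, df_names3)
```

The matrix and the data frame hold the same values and the same row names, yet only the matrix hands the row names back as element names.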
Re: [Rd] Quiz: How to get a named column from a data frame
I don't know if this is better, but it's the most obvious/shortest I
could come up with. Transpose the data.frame column to a 'row' vector
and drop the dimensions.

R> identical(nv, drop(t(df)))
[1] TRUE

Best,
--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com

On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler maech...@stat.math.ethz.ch wrote:
> Consider this toy example, where the dataframe already has only one
> column :
>
> > nv <- c(a=1, d=17, e=101); nv
>   a   d   e
>   1  17 101
>
> > df <- as.data.frame(cbind(VAR = nv)); df
>   VAR
> a   1
> d  17
> e 101
>
> Now how, can I get 'nv' back from 'df' ? I.e., how to get
>
> > identical(nv, ...)
> [1] TRUE
>
> where .. only uses 'df' (and no non-standard R packages)?
Re: [Rd] Quiz: How to get a named column from a data frame
Hello,

A bit more general

nv <- c(a=1, d=17, e=101); nv
nv2 <- c(a="a", d="d", e="e")
df2 <- data.frame(VAR = nv, CHAR = nv2); df2

identical( nv, drop(t( df2[1] )) )    # TRUE
identical( nv, drop(t( df2[[1]] )) )  # FALSE

Rui Barradas

Em 18-08-2012 16:16, Joshua Ulrich escreveu:
> I don't know if this is better, but it's the most obvious/shortest I
> could come up with. Transpose the data.frame column to a 'row' vector
> and drop the dimensions.
>
> R> identical(nv, drop(t(df)))
> [1] TRUE
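The TRUE/FALSE difference in Rui's example comes down to what each kind of bracket returns. The following sketch makes that visible; the variable names `one_col_df` and `bare_vec` are assumptions introduced here.

```r
nv  <- c(a = 1, d = 17, e = 101)
nv2 <- c(a = "a", d = "d", e = "e")
df2 <- data.frame(VAR = nv, CHAR = nv2)

one_col_df <- df2[1]     # still a data frame, so it carries the row names
bare_vec   <- df2[[1]]   # a plain vector; the row names are not attached

class(one_col_df)
class(bare_vec)

# t() of the one-column data frame has the row names as column names,
# while t() of the bare vector has no dimnames at all.
dimnames(t(one_col_df))
dimnames(t(bare_vec))
```

Because only the single-bracket form keeps the data frame wrapper, only that form gives drop() any names to preserve.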
Re: [Rd] Quiz: How to get a named column from a data frame
On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler wrote:
> On Sat, Aug 18, 2012 at 11:03 AM, Martin Maechler maech...@stat.math.ethz.ch wrote:
>> Now how, can I get 'nv' back from 'df' ? I.e., how to get
>>
>> > identical(nv, ...)
>> [1] TRUE
>>
>> where .. only uses 'df' (and no non-standard R packages)?
>
> > identical(nv, df[,1])
> [1] TRUE
>
>> In my solution, the above '...' consists of 17 letters.
>
> I count 6 in mine

But it is not a solution in a current version of R!
though it's still interesting that df[,1] worked in some incantation
of R.

What's your sessionInfo()?

Martin

> /Christian
Re: [Rd] Quiz: How to get a named column from a data frame
Joshua Ulrich josh.m.ulr...@gmail.com on Sat, 18 Aug 2012 10:16:09 -0500 writes:

> I don't know if this is better, but it's the most obvious/shortest I
> could come up with. Transpose the data.frame column to a 'row' vector
> and drop the dimensions.
>
> R> identical(nv, drop(t(df)))
> [1] TRUE

Yes, that's definitely shorter, congratulations!

One gotcha is that I'd want a solution that also works when the df has
more columns than just one...

Your idea to use t(.) is nice and perfect insofar as it coerces the data
frame to a matrix, and that's really the clue:

Whereas df[,1] is losing the names, the matrix indexing is not.

So your solution can be changed into

    t(df)[1,]

which is even shorter... and slightly less efficient, at least
conceptually, than mine, which has been

    as.matrix(df)[,1]

Now, the remaining question is: Shouldn't there be something more natural
to achieve that? (There is not, currently, AFAIK).

Martin
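For reference, the three matrix-based answers discussed so far can be compared side by side on the one-column toy example. This is a minimal sketch; the list name `candidates` and its element names are assumptions made here.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

# Each expression goes through a matrix, which is what preserves the names.
candidates <- list(
  drop_t    = drop(t(df)),        # Joshua's answer
  t_index   = t(df)[1, ],         # the shortened variant
  as_matrix = as.matrix(df)[, 1]  # Martin's 17-character answer
)

# One logical per candidate: does it reproduce `nv` exactly?
ok <- vapply(candidates, function(v) identical(nv, v), logical(1))
ok
```

All three rely on the data frame having a single column or only numeric columns; with mixed column types the matrix conversion changes the storage type.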
Re: [Rd] Quiz: How to get a named column from a data frame
On Sat, Aug 18, 2012 at 9:33 AM, Martin Maechler maech...@stat.math.ethz.ch wrote:
> So your solution can be changed into
>
>     t(df)[1,]
>
> which is even shorter... and slightly less efficient, at least
> conceptually, than mine, which has been
>
>     as.matrix(df)[,1]
>
> Now, the remaining question is: Shouldn't there be something more
> natural to achieve that? (There is not, currently, AFAIK).

Perhaps a data frame method for as.vector?

    as.vector.data.frame <- function(x, ...) as.matrix(x)[,1]
    as.vector(df[1])

or an additional argument to `[.data.frame` like keep.names, which
defaults to FALSE to maintain current behavior but can optionally be
TRUE.

Cheers,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
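To make the keep.names idea concrete without touching `[.data.frame` itself, here is a minimal sketch of a stand-alone helper. The function `get_col()` and its `keep.names` argument are hypothetical and exist nowhere in base R; they only imitate the proposed behaviour.

```r
# Hypothetical helper: neither get_col() nor keep.names is part of base R.
get_col <- function(df, j, keep.names = FALSE) {
  out <- df[[j]]                         # plain column, names dropped as usual
  if (keep.names) names(out) <- row.names(df)
  out
}

nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

default_res <- get_col(df, "VAR")                     # current behaviour
named_res   <- get_col(df, "VAR", keep.names = TRUE)  # proposed behaviour

is.null(names(default_res))
identical(nv, named_res)
```

Defaulting `keep.names` to FALSE mirrors the suggestion above: existing code sees no change, and the named result is strictly opt-in.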
Re: [Rd] Quiz: How to get a named column from a data frame
On 8/18/12, Martin Maechler maech...@stat.math.ethz.ch wrote:
> On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler wrote:
>> > identical(nv, df[,1])
>> [1] TRUE
>
> But it is not a solution in a current version of R!
> though it's still interesting that df[,1] worked in some incantation
> of R.

My mistake! We disliked some quirks of indexing, so we've long had our
own patch for [.data.frame in place, which I used inadvertently. In
essence, it does this:

    result <- base::`[.data.frame`(df, , 1, drop=FALSE)
    if (drop && length(ncol(result)) > 0 && ncol(result)==1) {
        save.names <- dimnames(result)[[1]]
        result <- result[[1]]
        names(result) <- save.names
    }

That obviously violated your constraint "no non-standard R packages". I
apologize.

Still, maybe the behavior of getting the named column would be desirable
in general?

/Christian
Re: [Rd] Quiz: How to get a named column from a data frame
This isn't super-concise, but has the virtue of being clear:

    nv <- c(a=1, d=17, e=101)
    df <- as.data.frame(cbind(VAR = nv))

    identical(nv, setNames(df$VAR, rownames(df)))
    # TRUE

It seems to be more efficient than the other methods as well:

    library(microbenchmark)  # provides microbenchmark()

    f1 <- function() setNames(df$VAR, rownames(df))
    f2 <- function() t(df)[1,]
    f3 <- function() as.matrix(df)[,1]

    r <- microbenchmark(f1(), f2(), f3(), times=1000)
    r
    # Unit: microseconds
    #   expr    min      lq median      uq      max
    # 1 f1() 14.589 17.0315 18.608 19.3220   89.388
    # 2 f2() 68.057 70.8735 72.240 75.8065 3707.012
    # 3 f3() 58.153 61.2600 62.521 65.0380  238.483

-Winston
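Besides speed, the three candidates differ once the data frame holds columns of mixed type. The matrix-based versions coerce the whole frame to one common type before indexing, while setNames() touches only the selected column. A small sketch, assuming a hypothetical two-column data frame df2:

```r
# df2 is a made-up example with one numeric and one character column.
df2 <- data.frame(VAR = c(1, 17, 101),
                  LAB = c("x", "y", "z"),
                  row.names = c("a", "d", "e"),
                  stringsAsFactors = FALSE)

v_set <- setNames(df2$VAR, rownames(df2))  # column extracted first, stays numeric
v_mat <- as.matrix(df2)[, "VAR"]           # frame becomes a character matrix first

is.numeric(v_set)     # TRUE
is.character(v_mat)   # TRUE
names(v_set)          # "a" "d" "e"
```

Selecting the column before any conversion avoids the coercion, which is the same idea as the drop(t(df[,1,drop=FALSE])) variant mentioned earlier in the thread.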
Re: [Rd] Quiz: How to get a named column from a data frame
On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler maech...@stat.math.ethz.ch wrote:

Today, I was looking for an elegant (and efficient) way to get a named (atomic) vector by selecting one column of a data frame. Of course, the vector names must be the rownames of the data frame.

Ok, here is the quiz. I know one quite cute/slick answer, but was wondering if there are obvious better ones, and also if this should not become more idiomatic (hence R-devel):

Consider this toy example, where the data frame already has only one column:

    nv <- c(a=1, d=17, e=101); nv
      a   d   e
      1  17 101

    df <- as.data.frame(cbind(VAR = nv)); df
      VAR
    a   1
    d  17
    e 101

Now, how can I get 'nv' back from 'df'? I.e., how to get

    identical(nv, ...)
    [1] TRUE

where '...' only uses 'df' (and no non-standard R packages)?

As said, I know a simple solution (*), but I'm sure it is not obvious to most R users and probably not even to the majority of R-devel readers... OTOH, people like Bill Dunlap will not take long to provide it or a better one.

But aren't you making life difficult for yourself by not using I()?

    df <- data.frame(VAR = I(nv))
    str(df[[1]])

(which isn't quite identical because it now has the AsIs class)

Hadley

--
Assistant Professor
Department of Statistics / Rice University
http://had.co.nz/
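A short sketch of the I() suggestion, assuming current base R behaviour. It shows that the names survive inside the data frame and that the extra class is the only thing standing between the column and a plain named vector. The use of unclass() to strip it is an illustration added here, not something proposed in the message.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = I(nv))   # I() protects the vector, names included

col <- df[[1]]
names(col)               # "a" "d" "e": the names survive
inherits(col, "AsIs")    # TRUE: this is why identical(nv, col) is FALSE

# Dropping the class attribute is one way to get a plain named vector back.
plain <- unclass(col)
inherits(plain, "AsIs")  # FALSE
```

This only helps when the data frame is built with I() in the first place; it does not recover names from a data frame that was created without it.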
Re: [Rd] Quiz: How to get a named column from a data frame
That would have been essentially my suggestion as well. I prefer its clarity (and speed).

I didn't know if you wanted your solution to also apply to matrices embedded in data.frames. In S+, rownames<-() works on vectors (because it calls the generic rowId<-()), so the following works:

    f4 <- function(df, column) { tmp <- df[[column]] ; rownames(tmp) <- rownames(df) ; tmp }
    nv <- c(a=1, d=17, e=101)
    df <- data.frame(VAR=nv, Two=3^(1:3))
    f4(df, 2)
     a  d  e
     3  9 27
    df$Matrix <- matrix(1001:1006, ncol=2, nrow=3)
    f4(df, "Matrix")
      [,1] [,2]
    a 1001 1004
    d 1002 1005
    e 1003 1006

I forget if R has something like rowIds() (it is to names and rownames as NROW is to length and nrow).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Winston Chang
Sent: Saturday, August 18, 2012 11:54 AM
To: Martin Maechler
Cc: R. Devel List
Subject: Re: [Rd] Quiz: How to get a named column from a data frame

This isn't super-concise, but has the virtue of being clear:

    nv <- c(a=1, d=17, e=101)
    df <- as.data.frame(cbind(VAR = nv))

    identical(nv, setNames(df$VAR, rownames(df)))
    # TRUE

It seems to be more efficient than the other methods as well:

    library(microbenchmark)  # provides microbenchmark()

    f1 <- function() setNames(df$VAR, rownames(df))
    f2 <- function() t(df)[1,]
    f3 <- function() as.matrix(df)[,1]

    r <- microbenchmark(f1(), f2(), f3(), times=1000)
    r
    # Unit: microseconds
    #   expr    min      lq median      uq      max
    # 1 f1() 14.589 17.0315 18.608 19.3220   89.388
    # 2 f2() 68.057 70.8735 72.240 75.8065 3707.012
    # 3 f3() 58.153 61.2600 62.521 65.0380  238.483

-Winston
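In R, `rownames<-` requires an object with dimensions, so f4() as written for S+ would be expected to stop with an error on a plain vector column. A hedged sketch of an R analogue follows, using a hypothetical helper f4_r() that switches between `names<-` and `rownames<-` depending on whether the column has a dim attribute:

```r
# Hypothetical R counterpart of f4(): label a column with the data frame's
# row names, whether the column is a plain vector or an embedded matrix.
f4_r <- function(df, column) {
  tmp <- df[[column]]
  if (is.null(dim(tmp))) {
    names(tmp) <- rownames(df)      # plain vector column
  } else {
    rownames(tmp) <- rownames(df)   # matrix column
  }
  tmp
}

nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = nv, Two = 3^(1:3))
df$Matrix <- matrix(1001:1006, ncol = 2, nrow = 3)

f4_r(df, 2)          # vector with names a, d, e
f4_r(df, "Matrix")   # matrix with row names a, d, e
```

The explicit branch on dim() stands in for the single generic that S+ provides; whether base R offers a closer equivalent to rowIds() is left open here, as in the message above.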