Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-22 Thread Duncan Murdoch

On 12-08-18 12:33 PM, Martin Maechler wrote:

Joshua Ulrich josh.m.ulr...@gmail.com
 on Sat, 18 Aug 2012 10:16:09 -0500 writes:


  I don't know if this is better, but it's the most obvious/shortest I
  could come up with.  Transpose the data.frame column to a 'row' vector
  and drop the dimensions.

 R> identical(nv, drop(t(df)))
  [1] TRUE

Yes, that's definitely shorter,
congratulations!

One gotcha is that I'd want a solution that also works when the
df has more columns than just one...

Your idea to use  t(.) is nice and perfect insofar as it
coerces the data frame to a matrix, and that's really the clue:

Whereas  df[,1]  is losing the names,
the matrix indexing is not.
So your solution can be changed into

  t(df)[1,]

which is even shorter...
and slightly less efficient, at least conceptually, than mine, which has
been

as.matrix(df)[,1]

Now, the remaining question is:  Shouldn't there be something
more natural to achieve that?
(There is not, currently, AFAIK).


I've been offline, so I'm a bit late to this game, but the examples 
above fail when df contains a character column as well as the desired 
one, because everything gets coerced to a character matrix.  You need to 
select the column first, then convert to a matrix, e.g.


drop(t(df[,1,drop=FALSE]))
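
The coercion problem described above can be reproduced with a small sketch. The second column CHAR and the object names df2, bad and good are assumptions made for illustration; they are not taken from the original message.

```r
# Toy data: the numeric column of interest plus a character column.
nv <- c(a = 1, d = 17, e = 101)
df2 <- data.frame(VAR = nv, CHAR = c("x", "y", "z"),
                  stringsAsFactors = FALSE)

# Converting the whole data frame first gives a character matrix,
# so the extracted column is character rather than numeric.
bad <- as.matrix(df2)[, 1]
is.character(bad)

# Selecting the column first keeps it numeric; the row names survive
# the transpose and become the names of the result.
good <- drop(t(df2[, 1, drop = FALSE]))
identical(nv, good)
```

The order of operations is what matters here: subsetting with drop=FALSE keeps a one-column data frame, so the later matrix conversion only ever sees the numeric column.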

Duncan Murdoch



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-22 Thread Joshua Ulrich
On Tue, Aug 21, 2012 at 2:34 PM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 I've been offline, so I'm a bit late to this game, but the examples above
 fail when df contains a character column as well as the desired one, because
 everything gets coerced to a character matrix.  You need to select the
 column first, then convert to a matrix, e.g.

 drop(t(df[,1,drop=FALSE]))


That's true, but I was assuming a one-column data.frame, which can be
achieved via:
df <- data.frame(VAR=nv,CHAR=letters[1:3],stringsAsFactors=FALSE)
drop(t(df[1]))

That said, I prefer the setNames() solution for its efficiency.
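
The setNames() solution itself is not shown in this part of the thread. A minimal sketch of what such a solution might look like, assuming it simply attaches the row names to the extracted column (the name res is made up for the example):

```r
nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = nv, CHAR = letters[1:3], stringsAsFactors = FALSE)

# Pull the column out as a plain vector, then attach the row names.
# No intermediate matrix is built, so other columns are never coerced.
res <- setNames(df[[1]], rownames(df))
identical(nv, res)
```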

Best,
Josh



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-19 Thread Bert Gunter
Yes, but either

drop(t(df[,1,drop=TRUE]))

or

t(df[,1,drop=TRUE])[1,]

does work. My minimal effort to check timings found that the first
version was a hair faster.

-- Bert

On Sat, Aug 18, 2012 at 9:01 AM, Rui Barradas ruipbarra...@sapo.pt wrote:
 Hello,

 A bit more general

 nv <- c(a=1, d=17, e=101); nv
 nv2 <- c(a="a", d="d", e="e")
 df2 <- data.frame(VAR = nv, CHAR = nv2); df2

 identical( nv, drop(t( df2[1] )) )   # TRUE
 identical( nv, drop(t( df2[[1]] )) ) # FALSE

 Rui Barradas




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-19 Thread Bert Gunter
Or to expand just a hair on Joshua's suggestion, is the following what you want:

> x <- 1:10
> names(x) <- letters[1:10]
> x
 a  b  c  d  e  f  g  h  i  j
 1  2  3  4  5  6  7  8  9 10
> df <- data.frame(x=x,y=LETTERS[1:10],row.names=names(x))
> df
   x y
a  1 A
b  2 B
c  3 C
d  4 D
e  5 E
f  6 F
g  7 G
h  8 H
i  9 I
j 10 J
> y <- t(df[,1,drop=FALSE])[1,]
> y
 a  b  c  d  e  f  g  h  i  j
 1  2  3  4  5  6  7  8  9 10
> identical(x,y)
[1] TRUE

Cheers,
Bert


On Sat, Aug 18, 2012 at 8:16 AM, Joshua Ulrich josh.m.ulr...@gmail.com wrote:
 I don't know if this is better, but it's the most obvious/shortest I
 could come up with.  Transpose the data.frame column to a 'row' vector
 and drop the dimensions.

 R> identical(nv, drop(t(df)))
 [1] TRUE



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-19 Thread Bert Gunter
Sorry! -- Change that to drop = FALSE  !

 drop(t(df[,1,drop=FALSE]))
 t(df[,1,drop=FALSE])[1,]
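
A short sketch of why the correction matters, using the toy data from the original quiz (the name lost is made up for the example): with drop=TRUE the column comes back as a bare vector, so the row names are gone before t() is ever applied.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

# drop = TRUE: a plain unnamed vector goes into t(), so no names come out.
lost <- drop(t(df[, 1, drop = TRUE]))
is.null(names(lost))

# drop = FALSE: a one-column data frame goes into t(), which keeps the
# row names as column names of the 1 x 3 matrix.
identical(nv, drop(t(df[, 1, drop = FALSE])))
identical(nv, t(df[, 1, drop = FALSE])[1, ])
```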

-- Bert

On Sat, Aug 18, 2012 at 9:37 AM, Bert Gunter bgun...@gene.com wrote:
 Yes, but either

 drop(t(df[,1,drop=TRUE]))

 or

 t(df[,1,drop=TRUE])[1,]

 does work. My minimal effort to check timings found that the first
 version was a hair faster.

 -- Bert



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-19 Thread J. R. M. Hosking

On 2012-08-18 11:03, Martin Maechler wrote:

Today, I was looking for an elegant (and efficient) way
to get a named (atomic) vector by selecting one column of a data frame.
Of course, the vector names must be the rownames of the data frame.



For this purpose my private function library has a function withnames():

withnames(): Extract from data frame as a named vector

Description: Extracts data from a data frame; if the result is a vector
(i.e. we extracted a single column and did not specify 'drop=FALSE')
it is assigned names derived from the row names of the data frame.

Usage: withnames(expr)

Arguments: expr: R expression.

Details: 'expr' is evaluated in an environment in which the extractor
functions '$.data.frame', '[.data.frame', and '[[.data.frame' are
replaced by versions that attach the data frame's row names to an
extracted vector.

Value: 'expr', evaluated as described above.

## Code

withnames <- function(expr) {
  eval(substitute(expr),
       list(
         `[.data.frame` = function(x, i, ...) {
           out <- x[i, ...]
           if (is.null(dim(out))) names(out) <- row.names(x)[i]
           return(out)
         },
         `[[.data.frame` = function(x, ...) {
           out <- x[[...]]
           if (is.null(dim(out))) names(out) <- row.names(x)
           return(out)
         },
         `$.data.frame` = function(x, name) {
           out <- x[[name, exact=FALSE]]
           if (is.null(dim(out))) names(out) <- row.names(x)
           return(out)
         }
       ),
       enclos = parent.frame())
}

## Examples

dd <- data.frame(aa=1:6, bb=letters[c(1,3,2,3,3,1)],
                 row.names=LETTERS[1:6])
dd
dd$aa                        # Unnamed vector
withnames(dd$aa)             # Named vector
withnames(dd[["aa"]])        # Named vector
withnames(dd[2:4,"aa"])      # Named vector
withnames(dd$bb)             # Factor with names
withnames(outer(dd$a,dd$a))  # Both dimensions have names

## But now I am looking for a version that will play nicely with with():

withnames(with(dd, aa))      # No names!
with(dd, withnames(aa))      # No names!
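
A likely reason the two with() calls return no names: with() evaluates aa by looking it up directly among the columns of dd, so none of the three replaced extractor functions is ever called. The following is only a workaround sketch, not a fix to withnames() itself, and it assumes that attaching the names explicitly is acceptable (the name res is made up for the example):

```r
dd <- data.frame(aa = 1:6, bb = letters[c(1, 3, 2, 3, 3, 1)],
                 row.names = LETTERS[1:6])

# 'aa' is found among the columns of dd; 'dd' itself is found in the
# calling environment, so its row names are still reachable.
res <- with(dd, setNames(aa, row.names(dd)))
res
```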



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-19 Thread Andrew Piskorski
On Sat, Aug 18, 2012 at 02:13:20PM -0400, Christian Brechbühler wrote:
 On 8/18/12, Martin Maechler maech...@stat.math.ethz.ch wrote:
  On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler  wrote:
  On Sat, Aug 18, 2012 at 11:03 AM, Martin Maechler
  maech...@stat.math.ethz.ch wrote:
 
  Consider this toy example, where the dataframe already has only
  one column :
 
   nv <- c(a=1, d=17, e=101); nv
a   d   e
1  17 101
 
   df <- as.data.frame(cbind(VAR = nv)); df
VAR
  a   1
  d  17
  e 101
 
  Now how, can I get 'nv' back from 'df' ?   I.e., how to get

  identical(nv, df[,1])
  [1] TRUE
 
  But it is not a solution in a current version of R!
  though it's still interesting that   df[,1]  worked in some incantation of
  R.
 
 My mistake!  We disliked some quirks of indexing, so we've long had
 our own patch for [.data.frame in place, which I used inadvertently.

As I understand it, when doing 'df[,1]' on a data frame, Bell
Labs S and all versions of S-Plus prior to 3.4 always retained the
data frame's row names as the names on the result vector.  'df[,1]'
gave you a named vector identical to your 'nv' above.  Then in 1996
with S-Plus 3.4, Insightful broke that behavior, after which 'df[,1]'
returned a vector without any names.  I believe R copied that
late-1990s S-Plus behavior, but I don't know why exactly.

When subscripting objects, R sometimes retains the object's dimnames
as names in the result, and sometimes not, which I find frustrating.
Personally, I think it would make much more sense if subscripting
ALWAYS retained any names it could, and worked as similarly as
possible across data frames, matrices, arrays, vectors, etc.  After
all, explicitly dropping names afterwards is trivial, while adding
them back on is not.
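
A small illustration of the inconsistency described above, as a sketch only; it is not one of the 15 or 22 test cases mentioned in this message, and the objects m and d are made up for the example.

```r
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("x", "y")))
d <- as.data.frame(m)

names(m[, "x"])    # matrix column: row names are kept as names
names(d[, "x"])    # data frame column: NULL
names(d[["x"]])    # NULL
names(d$x)         # NULL

# Dropping names afterwards is a single call ...
unname(m[, "x"])
# ... while restoring them needs the original object again.
setNames(d[, "x"], row.names(d))
```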

Back on 2005-10-19 with R 2.2.0, I gave a simple test of 15 cases; 4
of them dropped names during subscripting, the other 11 preserved them.
That's towards the end of the discussion here:

  https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=8192

Contrary to the initial tone of my old 2005 bug report, current R
subscripting behavior is of course NOT a bug, as AFAIK it's working as
the R Core Team intended.  However, I definitely consider the current
behavior a design infelicity.

Just now on stock R 2.15.1 (with --vanilla), I ran an updated version
of those same simple tests.  Of 22 subscripting test cases, 7 lose
names and 15 preserve them.  (If anyone's interested in the specific
tests, I can send them, or try to append them to that old 8192 feature
request.)

For what it's worth, at work, for years we ran various versions of
pre-namespace R using some ugly patches of [ and [.data.frame to
force name retention during subscripting.  Since we were not using
namespaces at all, those "keep names" subscripting hacks were
affecting ALL R code we ran, not just our own custom code which needed
and expected the names to be retained.  Yet perhaps surprisingly, I
don't think I ever ran into a single case where the forced retention
of names broke any code.  We of course ran only a tiny sample of the
huge amount of code on CRAN, but that experience suggests that most R
code which expects un-named objects doesn't mind at all if names are
present.

If anyone would genuinely like to add an option for name-preserving
subscripting to R, I'm willing to work on it, so please do let me know
your thoughts.  So far though, I've never dug into the guts of the
.Primitive("[") and [.data.frame functions to see how/why they
sometimes keep and sometimes discard names during subscripting.

-- 
Andrew Piskorski a...@piskorski.com



[Rd] Quiz: How to get a named column from a data frame

2012-08-18 Thread Martin Maechler
Today, I was looking for an elegant (and efficient) way
to get a named (atomic) vector by selecting one column of a data frame.
Of course, the vector names must be the rownames of the data frame.

Ok, here is the quiz, I know one quite cute/slick answer, but was
wondering if there are obvious better ones, and
also if this should not become more idiomatic (hence R-devel):

Consider this toy example, where the data frame already has only
one column:

> nv <- c(a=1, d=17, e=101); nv
  a   d   e
  1  17 101

> df <- as.data.frame(cbind(VAR = nv)); df
  VAR
a   1
d  17
e 101

Now, how can I get 'nv' back from 'df'?  I.e., how to get

> identical(nv, ...)
[1] TRUE

where '...' only uses 'df' (and no non-standard R packages)?

As said, I know a simple solution (*), but I'm sure it is not
obvious to most R users and probably not even to the majority of
R-devel readers... OTOH, people like Bill Dunlap will not take
long to provide it or a better one.

(*) In my solution, the above '...' consists of 17 letters.
I'll post it later today (CEST time) ... or confirm
that someone else has done so.

Martin



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-18 Thread Joshua Ulrich
I don't know if this is better, but it's the most obvious/shortest I
could come up with.  Transpose the data.frame column to a 'row' vector
and drop the dimensions.

R> identical(nv, drop(t(df)))
[1] TRUE

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com




Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-18 Thread Rui Barradas

Hello,

A bit more general

nv <- c(a=1, d=17, e=101); nv
nv2 <- c(a="a", d="d", e="e")
df2 <- data.frame(VAR = nv, CHAR = nv2); df2

identical( nv, drop(t( df2[1] )) )   # TRUE
identical( nv, drop(t( df2[[1]] )) ) # FALSE

Rui Barradas

On 18-08-2012 16:16, Joshua Ulrich wrote:

I don't know if this is better, but it's the most obvious/shortest I
could come up with.  Transpose the data.frame column to a 'row' vector
and drop the dimensions.

R> identical(nv, drop(t(df)))
[1] TRUE



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-18 Thread Martin Maechler
On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler  wrote:
 On Sat, Aug 18, 2012 at 11:03 AM, Martin Maechler
 maech...@stat.math.ethz.ch wrote:

 Today, I was looking for an elegant (and efficient) way
 to get a named (atomic) vector by selecting one column of a data frame.
 Of course, the vector names must be the rownames of the data frame.

 Ok, here is the quiz, I know one quite cute/slick answer, but was
 wondering if there are obvious better ones, and
 also if this should not become more idiomatic (hence R-devel):

 Consider this toy example, where the dataframe already has only
 one column :

  nv <- c(a=1, d=17, e=101); nv
   a   d   e
   1  17 101

  df <- as.data.frame(cbind(VAR = nv)); df
   VAR
 a   1
 d  17
 e 101

 Now how, can I get 'nv' back from 'df' ?   I.e., how to get

  identical(nv, ...)
 [1] TRUE

 where .. only uses 'df' (and no non-standard R packages)?


 identical(nv, df[,1])
 [1] TRUE

 In my solution, the above '...' consists of 17 letters.


 I count 6 in mine

But it is not a solution in a current version of R!
though it's still interesting that   df[,1]  worked in some incantation of R.

What's your sessionInfo()?
Martin


 /Christian



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-18 Thread Martin Maechler
 Joshua Ulrich josh.m.ulr...@gmail.com
 on Sat, 18 Aug 2012 10:16:09 -0500 writes:

 I don't know if this is better, but it's the most obvious/shortest I
 could come up with.  Transpose the data.frame column to a 'row' vector
 and drop the dimensions.

R> identical(nv, drop(t(df)))
 [1] TRUE

Yes, that's definitely shorter,
congratulations!

One gotcha is that I'd want a solution that also works when the
df has more columns than just one...

Your idea to use  t(.) is nice and perfect insofar as it
coerces the data frame to a matrix, and that's really the clue:

Whereas  df[,1]  is losing the names,
the matrix indexing is not.
So your solution can be changed into

 t(df)[1,]

which is even shorter...
and slightly less efficient, at least conceptually, than mine, which has
been

   as.matrix(df)[,1]

Now, the remaining question is:  Shouldn't there be something
more natural to achieve that?
(There is not, currently, AFAIK).
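
To make the comparison concrete, here is a small sketch on a two-column variant of the toy example. The second column OTHER is an assumption added for illustration and is not part of the original quiz.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv, OTHER = 2 * nv))

# Both matrix-based variants still recover 'nv' when all columns are numeric.
identical(nv, t(df)[1, ])
identical(nv, as.matrix(df)[, 1])

# drop(t(df)) has nothing to drop once there are two columns:
# the result stays a 2 x 3 matrix.
is.matrix(drop(t(df)))
```

Both variants convert the whole data frame to a matrix, so they rely on every column having a compatible type.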

Martin




Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-18 Thread Joshua Wiley
On Sat, Aug 18, 2012 at 9:33 AM, Martin Maechler
maech...@stat.math.ethz.ch wrote:
 Joshua Ulrich josh.m.ulr...@gmail.com
 on Sat, 18 Aug 2012 10:16:09 -0500 writes:

  I don't know if this is better, but it's the most obvious/shortest I
  could come up with.  Transpose the data.frame column to a 'row' vector
  and drop the dimensions.

 R> identical(nv, drop(t(df)))
  [1] TRUE

 Yes, that's definitely shorter,
 congratulations!

 One gotcha is that I'd want a solution that also works when the
 df has more columns than just one...

 Your idea to use  t(.) is nice and perfect insofar as it
 coerces the data frame to a matrix, and that's really the clue:

 Whereas  df[,1]  is losing the names,
 the matrix indexing is not.
 So your solution can be changed into

  t(df)[1,]

 which is even shorter...
 and slightly less efficient, at least conceptually, than mine, which has
 been

as.matrix(df)[,1]

 Now, the remaining question is:  Shouldn't there be something
 more natural to achieve that?
 (There is not, currently, AFAIK).

Perhaps a data frame method for as.vector?

as.vector.data.frame <- function(x, ...) as.matrix(x)[,1]
as.vector(df[1])

or an additional argument to `[.data.frame` like keep.names, which
defaults to FALSE to maintain current behavior but can optionally be
TRUE.
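
The keep.names idea can be sketched without touching `[.data.frame` at all. The helper get_col() below is hypothetical and not part of base R; it only illustrates how an opt-in argument with a FALSE default would behave.

```r
# Hypothetical helper, not part of base R: extract column j of a data
# frame, optionally carrying the row names along as vector names.
get_col <- function(x, j, keep.names = FALSE) {
  out <- x[[j]]
  if (keep.names && is.null(dim(out))) names(out) <- row.names(x)
  out
}

nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

identical(nv, get_col(df, "VAR", keep.names = TRUE))
is.null(names(get_col(df, 1)))   # default matches current behaviour
```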

Cheers,

Josh



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-18 Thread Christian Brechbühler
On 8/18/12, Martin Maechler maech...@stat.math.ethz.ch wrote:
 On Sat, Aug 18, 2012 at 5:14 PM, Christian Brechbühler  wrote:
 On Sat, Aug 18, 2012 at 11:03 AM, Martin Maechler
 maech...@stat.math.ethz.ch wrote:

 Consider this toy example, where the dataframe already has only
 one column :

  nv <- c(a=1, d=17, e=101); nv
   a   d   e
   1  17 101

  df <- as.data.frame(cbind(VAR = nv)); df
   VAR
 a   1
 d  17
 e 101

 Now how, can I get 'nv' back from 'df' ?   I.e., how to get

  identical(nv, ...)
 [1] TRUE

 where .. only uses 'df' (and no non-standard R packages)?


 identical(nv, df[,1])
 [1] TRUE

 But it is not a solution in a current version of R!
 though it's still interesting that   df[,1]  worked in some incantation of
 R.

My mistake!  We disliked some quirks of indexing, so we've long had
our own patch for [.data.frame in place, which I used inadvertently.
 In essence, it does this:

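# 'drop' is the argument of the patched method
result <- base::`[.data.frame`(df, , 1, drop = FALSE)
if (drop && length(ncol(result) > 0) && ncol(result) == 1) {
  # a single column remains: return it as a vector named by the row names
  save.names <- dimnames(result)[[1]]
  result <- result[[1]]
  names(result) <- save.names
}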

That obviously violated your constraint "no non-standard R packages".
I apologize.

Still, maybe the behavior of getting the named column would be
desirable in general?

/Christian



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-18 Thread Winston Chang
This isn't super-concise, but has the virtue of being clear:

nv <- c(a=1, d=17, e=101)
df <- as.data.frame(cbind(VAR = nv))

identical(nv, setNames(df$VAR, rownames(df)))
# TRUE
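
Selecting the column first also sidesteps the coercion problem raised earlier in the thread for data frames that contain a character column. A minimal sketch, where `df2` and its `LAB` column are made up for illustration:

```r
nv  <- c(a = 1, d = 17, e = 101)
df2 <- data.frame(VAR = nv, LAB = c("x", "y", "z"))  # hypothetical second column

# Going through a matrix coerces every column to character.
is.character(as.matrix(df2)[, "VAR"])

# Extracting the column first keeps it numeric.
identical(nv, setNames(df2$VAR, rownames(df2)))
```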


It seems to be more efficient than the other methods as well:

library(microbenchmark)  # CRAN package, not part of base R

f1 <- function() setNames(df$VAR, rownames(df))
f2 <- function() t(df)[1,]
f3 <- function() as.matrix(df)[,1]

r <- microbenchmark(f1(), f2(), f3(), times=1000)
r
# Unit: microseconds
#   expr    min      lq median      uq      max
# 1 f1() 14.589 17.0315 18.608 19.3220   89.388
# 2 f2() 68.057 70.8735 72.240 75.8065 3707.012
# 3 f3() 58.153 61.2600 62.521 65.0380  238.483
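
For readers without the microbenchmark package, a rough comparison is possible with base R alone. This is a sketch, not a replacement for the numbers above: the repetition count `n` is arbitrary and `system.time()` is much coarser than microbenchmark.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- as.data.frame(cbind(VAR = nv))

f1 <- function() setNames(df$VAR, rownames(df))
f2 <- function() t(df)[1, ]
f3 <- function() as.matrix(df)[, 1]

# All three must agree before the timings mean anything.
stopifnot(identical(f1(), f2()), identical(f1(), f3()))

n <- 10000  # arbitrary repetition count
# Elapsed seconds for n calls of each function.
sapply(list(f1 = f1, f2 = f2, f3 = f3),
       function(f) system.time(for (i in seq_len(n)) f())[["elapsed"]])
```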

-Winston





Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-18 Thread Hadley Wickham
On Sat, Aug 18, 2012 at 10:03 AM, Martin Maechler
maech...@stat.math.ethz.ch wrote:
 Today, I was looking for an elegant (and efficient) way
 to get a named (atomic) vector by selecting one column of a data frame.
 Of course, the vector names must be the rownames of the data frame.

 Ok, here is the quiz, I know one quite cute/slick answer, but was
 wondering if there are obvious better ones, and
 also if this should not become more idiomatic (hence R-devel):

 Consider this toy example, where the dataframe already has only
 one column :

 nv <- c(a=1, d=17, e=101); nv
   a   d   e
   1  17 101

 df <- as.data.frame(cbind(VAR = nv)); df
   VAR
 a   1
 d  17
 e 101

 Now how, can I get 'nv' back from 'df' ?   I.e., how to get

 identical(nv, ...)
 [1] TRUE

 where .. only uses 'df' (and no non-standard R packages)?

 As said, I know a simple solution (*), but I'm sure it is not
 obvious to most R users and probably not even to the majority of
 R-devel readers... OTOH, people like Bill Dunlap will not take
 long to provide it or a better one.

But aren't you making life difficult for yourself by not using I()?

df <- data.frame(VAR = I(nv))
str(df[[1]])

(which isn't quite identical because it now has the AsIs class)
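
A short sketch of what that looks like in practice. It assumes, as described above, that the names survive inside the data frame when the column is wrapped in `I()`; `unclass()` is one way to drop the extra class before comparing.

```r
nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = I(nv))

x <- df[[1]]
class(x)   # includes "AsIs"
names(x)   # names carried along with the column

# Compare against the original after dropping the AsIs class.
identical(nv, unclass(x))
```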

Hadley

-- 
Assistant Professor
Department of Statistics / Rice University
http://had.co.nz/



Re: [Rd] Quiz: How to get a named column from a data frame

2012-08-18 Thread William Dunlap
That would have been essentially my suggestion as well.  I prefer its clarity
(and speed).  I didn't know if you wanted your solution to also apply
to matrices embedded in data.frames.  In S+ rownames<-() works on vectors
(because it calls the generic rowId<-()) so the following works:
   f4 <- function(df, column) { tmp <- df[[column]] ; rownames(tmp) <- rownames(df) ; tmp}
   nv <- c(a=1,d=17,e=101)
   df <- data.frame(VAR=nv, Two=3^(1:3))
   f4(df, 2)
   a d  e
   3 9 27
   df$Matrix <- matrix(1001:1006, ncol=2, nrow=3)
   f4(df, "Matrix")
     [,1] [,2]
   a 1001 1004
   d 1002 1005
   e 1003 1006

I forget if R has something like rowIds() (it is to names and rownames as
NROW is to length and nrow).
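
In R, as far as I know, `rownames<-` stops on an object without dimensions, so f4 as written above would not work on a plain column there. A minimal sketch of an R equivalent, where `set_row_ids` is a hypothetical stand-in for the S+ behavior and not a base R function:

```r
# Hypothetical helper (not in base R): names for plain vectors,
# rownames for anything that has dimensions.
set_row_ids <- function(x, ids) {
  if (is.null(dim(x))) names(x) <- ids else rownames(x) <- ids
  x
}

f4 <- function(df, column) set_row_ids(df[[column]], rownames(df))

nv <- c(a = 1, d = 17, e = 101)
df <- data.frame(VAR = nv, Two = 3^(1:3))
df$Matrix <- matrix(1001:1006, ncol = 2, nrow = 3)

f4(df, 2)         # named vector
f4(df, "Matrix")  # matrix with row names taken from df
```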

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
 Behalf
 Of Winston Chang
 Sent: Saturday, August 18, 2012 11:54 AM
 To: Martin Maechler
 Cc: R. Devel List
 Subject: Re: [Rd] Quiz: How to get a named column from a data frame
 
 This isn't super-concise, but has the virtue of being clear:
 
 nv <- c(a=1, d=17, e=101)
 df <- as.data.frame(cbind(VAR = nv))
 
 identical(nv, setNames(df$VAR, rownames(df)))
 # TRUE
 
 
 It seems to be more efficient than the other methods as well:
 
 f1 <- function() setNames(df$VAR, rownames(df))
 f2 <- function() t(df)[1,]
 f3 <- function() as.matrix(df)[,1]
 
 r <- microbenchmark(f1(), f2(), f3(), times=1000)
 r
 # Unit: microseconds
 #   expr    min      lq median      uq      max
 # 1 f1() 14.589 17.0315 18.608 19.3220   89.388
 # 2 f2() 68.057 70.8735 72.240 75.8065 3707.012
 # 3 f3() 58.153 61.2600 62.521 65.0380  238.483
 
 -Winston
 
 
 