[R] factorise variables in a data.frame

2009-04-03 Thread baptiste auguie

Dear list,

I often need to convert several variables from numeric or integer into  
factors (before plotting, for instance), as in the following example,



d <- data.frame(
  x = seq(1, 10),
  y = seq(1, 10),
  z = rnorm(10),
  a = letters[1:10])


d2 <- within(d, {
  x = factor(x)
  y = factor(y)
})

str(d)
str(d2)


I'd like to write a function factorise() which takes a data.frame and  
a vector of variable names, and returns the original data.frame with  
the desired variables converted to factor,


factorise <- function(d, f)
  ***ply(d, f, factor) # some apply function

also, perhaps a defactorise() function doing the reverse operation  
with as.numeric.


I played with the plyr package and the base apply family for a while  
but can't find any concise construct.


Best regards,

baptiste



_

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] factorise variables in a data.frame

2009-04-03 Thread Wacek Kusnierczyk
baptiste auguie wrote:
> Dear list,
>
> I often need to convert several variables from numeric or integer into
> factors (before plotting, for instance), as in the following example,
>
> d <- data.frame(
>   x = seq(1, 10),
>   y = seq(1, 10),
>   z = rnorm(10),
>   a = letters[1:10])
>
> d2 <- within(d, {
>   x = factor(x)
>   y = factor(y)
> })
>
> str(d)
> str(d2)
>
> I'd like to write a function factorise() which takes a data.frame and
> a vector of variable names, and returns the original data.frame with
> the desired variables converted to factor,


would this not be good enough:

# dummy data
data = data.frame(x=1:10, y=1:10)

# a factorizer
factorize = function(data, columns=names(data)) {
   data[columns] = lapply(data[columns], as.factor)
   data }

sapply(factorize(data, 'x'), is)
# $x factor ...
# $y integer ...
lapply(factorize(data), is)
# $x factor ...
# $y factor ...
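
For the data frame in the original question, the same helper can be given a vector of column names. The following is only a sketch: factorize() is restated from above so the snippet runs on its own, and the choice of c("x", "y") is an assumption about which columns are wanted.

```r
# helper restated from the reply above, so the snippet is self-contained
factorize <- function(data, columns = names(data)) {
  data[columns] <- lapply(data[columns], as.factor)
  data
}

# data frame from the original question
d <- data.frame(
  x = seq(1, 10),
  y = seq(1, 10),
  z = rnorm(10),
  a = letters[1:10])

# convert only the named columns; z and a are left untouched
d2 <- factorize(d, c("x", "y"))

# inspect the class of each column after the conversion
sapply(d2, class)
```

Because only data[columns] is assigned to, the remaining columns keep their type and position, which is what the within() version in the question does by hand.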
  

> factorise <- function(d, f)
>   ***ply(d, f, factor) # some apply function
>
> also, perhaps a defactorise() function doing the reverse operation
> with as.numeric.

then, perhaps,

# an izer
ize = function(data, columns=names(data), izer=as.factor) {
   data[columns] = lapply(data[columns], izer)
   data }

ize(data, 'x', as.logical)
   
or even

ize = function(izer)
   function(data, columns=names(data)) {
      data[columns] = lapply(data[columns], izer)
      data }

logicalize = ize(as.logical)
characterize = ize(as.character)
   
lapply(logicalize(data), is)
# $x logical ...
# $y logical ...
lapply(characterize(data, 'x'), is)
# $x character ...
# $y integer ...

etc.
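
One caveat for the defactorise() direction: as.numeric() applied directly to a factor returns the internal integer codes, not the values shown as level labels. A possible sketch, assuming the closure form of ize() above (restated here so the snippet runs on its own); the name defactorize and the small example data frame are hypothetical:

```r
# closure-style converter factory, restated from above
ize <- function(izer)
  function(data, columns = names(data)) {
    data[columns] <- lapply(data[columns], izer)
    data
  }

factorize <- ize(as.factor)

# hypothetical helper: convert via the level labels, not the integer codes
defactorize <- ize(function(f) as.numeric(as.character(f)))

# small example where codes and values differ
data <- data.frame(x = c(10, 20, 30), y = 1:3)
f <- factorize(data, "x")

as.numeric(f$x)         # 1 2 3: the internal codes
defactorize(f, "x")$x   # 10 20 30: the original values
```

Going through as.character() first is what recovers the original numbers; passing as.numeric itself as the izer would silently return the codes.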
   
vQ



Re: [R] factorise variables in a data.frame

2009-04-03 Thread baptiste auguie

Excellent!

I felt it was fairly trivial but I can be quite dense on Friday
mornings.


I really like the generalisation.

Many thanks,

baptiste


On 3 Apr 2009, at 12:11, Wacek Kusnierczyk wrote:


[...]

