[R] factorise variables in a data.frame
Dear list,

I often need to convert several variables from numeric or integer into factors (before plotting, for instance), as in the following example,

d <- data.frame(x = seq(1, 10),
                y = seq(1, 10),
                z = rnorm(10),
                a = letters[1:10])

d2 <- within(d, {
    x = factor(x)
    y = factor(y)
})

str(d)
str(d2)

I'd like to write a function factorise() which takes a data.frame and a vector of variable names, and returns the original data.frame with the desired variables converted to factor,

factorise <- function(d, f) ***ply(d, f, factor) # some apply function

Also, perhaps a defactorise() function doing the reverse operation with as.numeric.

I played with the plyr package and the base apply family for a while but can't find any concise construct.

Best regards,

baptiste

_
Baptiste Auguié
School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK
Phone: +44 1392 264187
http://newton.ex.ac.uk/research/emag

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
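One caveat applies to the defactorise() idea. Calling as.numeric() directly on a factor returns the factor's internal level codes, not the numbers the levels were made from. The short sketch below shows the difference. The vector `f` is purely illustrative, and the two conversions are the ones recommended in R FAQ 7.10.

```r
# A factor built from numbers that are not 1, 2, 3, ...
f <- factor(c(10, 20, 10, 30))

# as.numeric() on a factor returns the integer level codes
codes <- as.numeric(f)                 # 1 2 1 3

# Going through the level labels recovers the original values
values <- as.numeric(as.character(f))  # 10 20 10 30

# Equivalent and cheaper for long vectors: convert the levels once,
# then index them by the factor (indexing uses the integer codes)
values2 <- as.numeric(levels(f))[f]
```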
Re: [R] factorise variables in a data.frame
baptiste auguie wrote:
> I'd like to write a function factorise() which takes a data.frame and a
> vector of variable names, and returns the original data.frame with the
> desired variables converted to factor,

would this not be good enough:

# dummy data
data = data.frame(x=1:10, y=1:10)

# a factorizer
factorize = function(data, columns=names(data)) {
    data[columns] = lapply(data[columns], as.factor)
    data
}

sapply(factorize(data, 'x'), is)
# $x factor ...
# $y integer ...

lapply(factorize(data), is)
# $x factor ...
# $y factor ...

> factorise <- function(d, f) ***ply(d, f, factor) # some apply function
>
> also, perhaps a defactorise() function doing the reverse operation with
> as.numeric.

then, perhaps,

# an izer
ize = function(data, columns=names(data), izer=as.factor) {
    data[columns] = lapply(data[columns], izer)
    data
}

ize(data, 'x', as.logical)

or even

ize = function(izer)
    function(data, columns=names(data)) {
        data[columns] = lapply(data[columns], izer)
        data
    }

logicalize = ize(as.logical)
characterize = ize(as.character)

lapply(logicalize(data), is)
# $x logical ...
# $y logical ...

lapply(characterize(data, 'x'), is)
# $x character ...
# $y integer ...

etc.

vQ
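The reply above covers factorise() but not the defactorise() half of the question. The sketch below assumes the closure form of ize() from the reply. `unfactor` is a hypothetical helper, not part of base R. It goes through the level labels because as.numeric() alone returns the level codes. The data frame `d` is illustrative only.

```r
# Closure form of ize(), as in the reply: returns a function that
# applies `izer` to the selected columns of a data frame
ize <- function(izer)
    function(data, columns = names(data)) {
        data[columns] <- lapply(data[columns], izer)
        data
    }

# Hypothetical helper: assumes x is a factor whose labels are numbers,
# and maps each element to the numeric value of its level label
unfactor <- function(x) as.numeric(levels(x))[x]

factorise   <- ize(as.factor)
defactorise <- ize(unfactor)

d  <- data.frame(x = c(10, 20, 10), y = c(5, 6, 7), a = c("p", "q", "r"))
d2 <- factorise(d, c("x", "y"))     # x and y become factors
d3 <- defactorise(d2, c("x", "y"))  # and back to numeric
```

The labels are character strings, so the round trip is exact only when as.character() represents the numbers exactly. That holds for small values like these.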
Re: [R] factorise variables in a data.frame
Excellent! I felt it was fairly trivial, but I can be quite dense on Friday mornings. I really like the generalisation.

Many thanks,

baptiste

On 3 Apr 2009, at 12:11, Wacek Kusnierczyk wrote:
> would this not be good enough: