You ask about generic methods for introducing alternate values for factors, and some of the other responses address this quite efficiently.

However, a factor has meaning only within one vector at a time, since
another vector may have additional values or missing values relative to
the first vector. For example, you used the "sample" function which
is not guaranteed to select at least one of each of the four letters in L4. Or, what if the data has values the mapping doesn't address?
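
As a small sketch of that point (the vectors "a" and "b" below are made up for illustration and are not from your post), two factors built from different data end up with different level sets, so the same integer code means different things in each:

```r
L4 <- LETTERS[1:4]

# Two hypothetical samples: "a" happens to miss C and D,
# "b" contains a value (E) that is not in L4 at all.
a <- factor(c("A", "B", "B"))
b <- factor(c("B", "D", "E"))

levels(a)        # "A" "B"
levels(b)        # "B" "D" "E"

# The underlying integer codes are only meaningful per vector:
as.integer(a)    # 1 2 2  (1 means "A" here)
as.integer(b)    # 1 2 3  (1 means "B" here)

# Values a mapping over L4 would never see, and values it would not cover:
missing_from_a <- setdiff(L4, levels(a))   # "C" "D"
extra_in_b     <- setdiff(levels(b), L4)   # "E"
```

Comparing or combining "a" and "b" by their codes would silently mix up categories, which is why I hold off on factors until everything is in one vector.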

For any work in which I am dealing with categorical data in multiple
places (e.g. your "d" data frame and whatever data structure you use
to define your mapping) I prefer NOT to work with factors until all of
my categories of data are moved into one vector (typically a column
in a data frame). Rather, I work with character vectors during the
data manipulation phase and only convert to factor when I start
analyzing or displaying the data.

With this in mind, I use a general flow something like:

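d <- data.frame( x = 1, y = 1:10, fac = fac, stringsAsFactors=FALSE )
# keep the lookup key as character too, so merge compares like with like
mp <- data.frame( fac=LETTERS[1:4], value=c(8,11,3,2), stringsAsFactors=FALSE )
# all.x=TRUE keeps rows of d that have no match in mp (value becomes NA)
d2 <- merge( d, mp, by="fac", all.x=TRUE )
d2$fac <- factor( d2$fac ) # optional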
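
One thing the merge approach makes easy is checking for values the mapping does not address. The following is only a sketch: the five-row data frame and the stray "E" are invented for illustration, and the name "unmapped" is my own.

```r
# Hypothetical data containing one category ("E") that the mapping lacks
d  <- data.frame(x = 1, y = 1:5, fac = c("B", "D", "A", "E", "C"),
                 stringsAsFactors = FALSE)
mp <- data.frame(fac = LETTERS[1:4], value = c(8, 11, 3, 2),
                 stringsAsFactors = FALSE)

# Left join: every row of d is kept; unmatched rows get NA in 'value'
d2 <- merge(d, mp, by = "fac", all.x = TRUE)

# Categories present in the data but absent from the mapping
unmapped <- unique(d2$fac[is.na(d2$value)])
unmapped   # "E"

# merge() does not promise to keep the original row order,
# so restore it explicitly if the order matters
d2 <- d2[order(d2$y), ]
```

Whether an unmapped category should stop the script or simply be reported depends on the job; the NA at least makes it visible instead of silently dropping or miscoding the row.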

If you are actually in the analysis phase and are not pulling data from multiple external sources, then you may have already confirmed the completeness and range of the values you have to work with, in which case one of the other, more efficient methods may still be a better choice for this specific task.
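
For comparison, here is a sketch of what one such direct method might look like. This is my assumption about the general idea (a named-vector lookup), not a quote from any of the other replies, and the four-row data frame is invented for illustration.

```r
# Mapping as a named numeric vector: names are categories, values are codes
mapping <- c(A = 8, B = 11, C = 3, D = 2)

d <- data.frame(x = 1, y = 1:4, fac = c("B", "D", "A", "C"),
                stringsAsFactors = FALSE)

# Confirm completeness first; this lookup gives NA for unknown names
stopifnot(all(d$fac %in% names(mapping)))

# Index by character so the lookup is by name. If d$fac were a factor,
# mapping[d$fac] would index by integer code instead, so convert with
# as.character() in that case.
d$var <- unname(mapping[as.character(d$fac)])
d$var   # 11 2 8 3
```

This is shorter than the merge, but it relies on the check in the stopifnot() line to catch the cases discussed above.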

Hadley Wickham's "tidy data" [1] principles address this concern more thoroughly than I have.

[1] Google this phrase... paper seems to be a work in progress.

On Thu, 17 Jul 2014, Gang Chen wrote:

Suppose I have the following dataframe:

L4 <- LETTERS[1:4]
fac <- sample(L4, 10, replace = TRUE)
(d <- data.frame(x = 1, y = 1:10, fac = fac))

   x  y fac
1  1  1   B
2  1  2   B
3  1  3   D
4  1  4   A
5  1  5   C
6  1  6   D
7  1  7   C
8  1  8   B
9  1  9   B
10 1 10   B

I'd like to add another column 'var' that is defined based on the
following mapping of column 'fac':

A -> 8
B -> 11
C -> 3
D -> 2

How can I achieve this in an elegant way (with a generic approach for
any length)?

Thanks,
Gang

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
