Jeff,

Even though the solutions from the previous responders are good enough
for my current situation, the principle you just raised will
definitely be beneficial to my future work. Thanks a lot for sharing
your insights!

Gang

On Thu, Jul 17, 2014 at 12:06 PM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> You ask about generic methods for introducing alternate values for factors,
> and some of the other responses address this quite efficiently.
>
> However, a factor has meaning only within one vector at a time, since
> another vector may have additional values or missing values relative to
> the first vector. For example, you used the "sample" function which
> is not guaranteed to select at least one of each of the four letters in L4.
> Or, what if the data has values the mapping doesn't address?
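
The two failure modes described above can be sketched as follows. This is only an illustration: the short vector stands in for an unlucky draw from sample(), and the letter "E" is a made-up category that is not in the original post.

```r
# Hypothetical illustration; the values below are made up for the example.
L4 <- LETTERS[1:4]
fac_small <- c("B", "B", "D", "B")   # a draw that happens to miss "A" and "C"

# Levels are inferred per vector, so this factor knows only two categories.
f1 <- factor(fac_small)
levels(f1)

# Supplying the full level set explicitly keeps the unobserved categories.
f2 <- factor(fac_small, levels = L4)
levels(f2)
table(f2)                            # zero counts for the unobserved levels

# A value outside the declared levels silently becomes NA.
f3 <- factor(c("A", "E"), levels = L4)
is.na(f3)
```

Because levels belong to each vector separately, f1 and f2 hold the same data but disagree about which categories exist.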
>
> For any work in which I am dealing with categorical data in multiple
> places (e.g. your "d" data frame and whatever data structure you use
> to define your mapping) I prefer NOT to work with factors until all of
> my categories of data are moved into one vector (typically a column
> in a data frame). Rather, I work with character vectors during the
> data manipulation phase and only convert to factor when I start
> analyzing or displaying the data.
>
> With this in mind, I use a general flow something like:
>
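> # keep categories as character vectors while manipulating the data
> d <- data.frame( x = 1, y = 1:10, fac = fac, stringsAsFactors=FALSE )
> mp <- data.frame( fac=LETTERS[1:4], value=c(8,11,3,2), stringsAsFactors=FALSE )
> # left join on fac: unmapped categories get NA; merge() sorts rows by fac
> d2 <- merge( d, mp, by="fac", all.x=TRUE )
> d2$fac <- factor( d2$fac ) # optional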
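
A minimal sketch of how the merge-based flow above behaves when the data contains a category the mapping does not cover. The category "E", the shortened five-row data frame, and the name `unmapped` are assumptions for illustration; restoring the row order via `y` works only because `y` happens to be a row counter in this example.

```r
# Sketch only: "E" is a hypothetical category that the mapping does not cover.
d  <- data.frame(x = 1, y = 1:5, fac = c("B", "D", "A", "E", "C"),
                 stringsAsFactors = FALSE)
mp <- data.frame(fac = LETTERS[1:4], value = c(8, 11, 3, 2),
                 stringsAsFactors = FALSE)

# Categories present in the data but absent from the mapping.
unmapped <- setdiff(unique(d$fac), mp$fac)
if (length(unmapped) > 0) {
  message("No mapping for: ", paste(unmapped, collapse = ", "))
}

# Left join keeps every row of d; unmapped categories get NA in 'value'.
d2 <- merge(d, mp, by = "fac", all.x = TRUE)

# merge() sorts by the key column, so restore the original row order via y.
d2 <- d2[order(d2$y), ]
d2
```

Checking with setdiff() before the merge makes the gap visible up front, so the NA does not have to be discovered later during analysis.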
>
> If you are actually in the analysis phase and are not pulling data from
> multiple external sources, you may have already confirmed the
> completeness and range of values you have to work with, in which case one
> of the other more efficient methods may still be a better choice for this
> specific task.
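
The other replies are not quoted in this message, so the following is only an assumed example of what a more direct method might look like: a lookup through a named vector. The name `map` and the five-row data are illustrative.

```r
# Assumed example of a direct lookup; not quoted from the other replies.
map <- c(A = 8, B = 11, C = 3, D = 2)    # names are the categories
fac <- c("B", "B", "D", "A", "C")        # character, not factor

d <- data.frame(x = 1, y = 1:5, fac = fac, stringsAsFactors = FALSE)

# Index the mapping by name; as.character() guards against fac being a
# factor, which would otherwise be used as integer codes, not names.
d$var <- unname(map[as.character(d$fac)])
d
```

This keeps the original row order and returns NA for any category missing from `map`, but it assumes the mapping fits comfortably in a single named vector.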
>
> Hadley Wickham's "tidy data" [1] principles address this concern more
> thoroughly than I have.
>
> [1] Google this phrase... paper seems to be a work in progress.
>
>
> On Thu, 17 Jul 2014, Gang Chen wrote:
>
>> Suppose I have the following dataframe:
>>
>> L4 <- LETTERS[1:4]
>> fac <- sample(L4, 10, replace = TRUE)
>> (d <- data.frame(x = 1, y = 1:10, fac = fac))
>>
>>    x  y fac
>> 1  1  1   B
>> 2  1  2   B
>> 3  1  3   D
>> 4  1  4   A
>> 5  1  5   C
>> 6  1  6   D
>> 7  1  7   C
>> 8  1  8   B
>> 9  1  9   B
>> 10 1 10   B
>>
>> I'd like to add another column 'var' that is defined based on the
>> following mapping of column 'fac':
>>
>> A -> 8
>> B -> 11
>> C -> 3
>> D -> 2
>>
>> How can I achieve this in an elegant way (with a generic approach for
>> any length)?
>>
>> Thanks,
>> Gang
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------

