On Apr 22, 2010, at 5:16 AM, Michael Haenlein wrote:

Dear all,

I have several character vectors, each with a large number of distinct values:
length(unique(x)) is in the range of 100-200.
This creates problems, as I would like to use them as predictors in a coxph
model.

I therefore would like to convert each of these strings to a new string
(x_new).
x_new should be equal to x for the top n categories (i.e. the top n levels
with the highest occurrence) and NAN elsewhere.
For example, for n=3 x_new would have three levels: The three most common
levels of x + NAN.

Is there some convenient way of doing this?

 # sample() rescales prob, so the weights need not sum to 1
 x <- sample(c("top", "three", "levels", "other", "strings"), 30,
                  replace=TRUE, prob=c(.3,.3,.3,.1,.1))
 y <- c("top", "three", "levels")   # the levels to keep
 xnew <- x
 xnew[ !xnew %in% y ] <- "NAN"  # the string "NAN", not the same as NaN
 table(xnew)

#--------
xnew
levels    NAN  three    top
     5      5      9     11
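
The example above hard-codes the levels to keep in y. If the top n levels are
not known ahead of time, they could probably be taken from a sorted table().
What follows is only a sketch: top_n_levels() and its "other" argument are
hypothetical names (not part of base R), x is assumed to be a character vector
rather than a factor, and ties at the cutoff are broken in whatever order
sort() returns them.

```r
# Sketch: keep the n most frequent values of x, recode everything else.
# top_n_levels() is a hypothetical helper, not a base R function.
# Assumes x is a character vector (a factor would need as.character() first).
top_n_levels <- function(x, n, other = "NAN") {
  counts <- sort(table(x), decreasing = TRUE)             # most common first
  keep <- names(counts)[seq_len(min(n, length(counts)))]  # top n level names
  xnew <- x
  xnew[!(xnew %in% keep)] <- other                        # recode the rest
  xnew
}

# Deterministic toy data so the counts are easy to check by hand
x <- c(rep("top", 11), rep("three", 9), rep("levels", 5),
       rep("other", 3), rep("strings", 2))
xnew <- top_n_levels(x, 3)
table(xnew)
```

With this toy data the three kept levels are "top", "three" and "levels", and
the remaining five values are all recoded to "NAN".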

--
David.


Thanks in advance,

Michael


Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
