Re: [R] Convert character string to top levels + NAN

David Winsemius Thu, 22 Apr 2010 06:23:26 -0700


On Apr 22, 2010, at 5:16 AM, Michael Haenlein wrote:

Dear all,
I have several character strings with a high number of different levels.
unique(x) gives me values in the range of 100-200.
This creates problems as I would like to use them as predictors in a coxph
model.
I therefore would like to convert each of these strings to a new string
(x_new).
x_new should be equal to x for the top n categories (i.e. the top n levels
with the highest occurrence) and NAN elsewhere.
For example, for n=3 x_new would have three levels: The three most common
levels of x + NAN.

Is there some convenient way of doing this?


 x <- sample(c("top", "three", "levels", "other", "strings"), 30,
                  replace=TRUE, prob=c(.3,.3,.3,.1,.1))  # weights need not sum to 1
 y <- c("top", "three", "levels")  # the levels to keep
 xnew <- x
 xnew[ !xnew %in% y ] <- "NAN"  # the string "NAN", not same as NaN
 table(xnew)

#--------
xnew
levels    NAN  three    top
     5      5      9     11
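
The vector y above is typed in by hand. If the top n levels should instead be taken from the data, something along the following lines might work. This is only a sketch: top_levels() and its arguments n and other are names made up for illustration, ties at the cut-off are broken by whatever order sort() returns, and the result is returned as a factor on the assumption that it will be used as a model predictor.

```r
# Hypothetical helper (not from the original post): keep the n most
# frequent values of x and recode all other non-missing values as "NAN".
top_levels <- function(x, n = 3, other = "NAN") {
  counts <- sort(table(x), decreasing = TRUE)      # frequencies, largest first
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  xnew <- as.character(x)
  # leave genuine missing values (NA) alone; recode only the rare levels
  xnew[!is.na(xnew) & !xnew %in% keep] <- other
  factor(xnew, levels = unique(c(keep, other)))
}

x <- sample(c("top", "three", "levels", "other", "strings"), 30,
            replace = TRUE, prob = c(.3, .3, .3, .1, .1))
table(top_levels(x, n = 3))
```

Because the levels are ordered by frequency, the most common value becomes the reference level when the factor is used in a model formula; relevel() can change that if another baseline is wanted.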

--
David.


Thanks in advance,

Michael


Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT
