This is a function I use for these kinds of situations. Assuming the delimiter within the column and the spelling of each value are consistent, it is pretty useful.
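If the delimiter or spelling isn't perfectly consistent (for example "HIST, BIOL" with a space after the comma, or "hist" in lower case), a small clean-up pass on the column before calling split_levels() may help. This is only a sketch: normalize_levels() is a hypothetical helper, not part of the original post or of any package, and it assumes that letter case and stray spaces around values carry no meaning.

```r
# Hypothetical helper (not from the original post): tidy a multi-valued
# text column so exact matching on each value is less brittle.
# Assumes case and surrounding spaces are not meaningful; sep is matched
# literally (fixed = TRUE).
normalize_levels <- function(x, sep = ",") {
  x <- as.character(x)                         # factors -> character
  parts <- strsplit(toupper(x), sep, fixed = TRUE)
  cleaned <- vapply(parts, function(p) {
    p <- trimws(p)                             # drop spaces around each value
    paste(p[p != ""], collapse = sep)          # drop empty pieces, rejoin
  }, character(1))
  cleaned[is.na(x)] <- NA                      # keep missing values missing
  cleaned
}

normalize_levels(c("anth", "Hist, biol", NA, "HIST,,BIOL "))
# returns "ANTH" "HIST,BIOL" NA "HIST,BIOL"
```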
split_levels() returns a vector of 0/1 values: 1 if the text in level is found, 0 otherwise.

  var   = the variable
  level = the value of interest in var
  sep   = the delimiter separating values within var (default ",")

split_levels <- function(var, level, sep = ","){
  #*** identify level in var
  f <- function(v){
    v <- unlist(strsplit(v, sep))
    as.integer(level %in% v)
  }
  #*** split the variable; as.character() lets factor columns work too
  new.var <- vapply(as.character(var), f, integer(1), USE.NAMES = FALSE)
  #*** assign NA's where they were in the original variable
  new.var[is.na(var)] <- NA
  return(new.var)
}

Benjamin Nutter | Biostatistician | Quantitative Health Sciences
Cleveland Clinic | 9500 Euclid Ave. | Cleveland, OH 44195 | (216) 445-1365

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mark Grimes
Sent: Friday, April 06, 2012 11:16 AM
To: John D. Muccigrosso
Cc: r-help@r-project.org
Subject: Re: [R] multiple values in one column

John,

I have to deal with this kind of thing too for my class.

# Some functions
# For ad$Full.name = "Mark Grimes"
get.first.name <- function(cell){
  x <- unlist(strsplit(as.character(cell), " "))
  return(x[1])
}
get.last.name <- function(cell){
  x <- unlist(strsplit(as.character(cell), " "))
  return(x[2])
}
# For roster$Name = "Grimes, Mark L"
get.first.namec <- function(cell){
  x <- unlist(strsplit(as.character(cell), ", "))
  y <- get.first.name(x[2])
  return(y)
}
get.last.namec <- function(cell){
  x <- unlist(strsplit(as.character(cell), ", "))
  return(x[1])
}

Use these functions with the apply family for processing class files.

Hope this helps,

Mark

On Apr 6, 2012, at 9:09 AM, John D. Muccigrosso wrote:

> I have some data files in which some fields have multiple values. For example
>
>   first  last   sex  major
>   John   Smith  M    ANTH
>   Jane   Doe    F    HIST,BIOL
>
> What's the best R-like way to handle these data (Jane's major in my example),
> so that I can do things like summarize the other fields by them (e.g., sex by
> major)?
>
> Right now I'm processing the files (in Excel, since they're spreadsheets) by
> duplicating lines with two values in the major field, eliminating one value
> per row. I suspect there's a nifty R way to do this.
>
> Thanks in advance!
>
> John Muccigrosso
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
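For John's question about duplicating rows, the same thing he is doing by hand in Excel can be done in a few lines of base R. Note this is a different technique from split_levels() (one row per value, "long" format, rather than 0/1 indicator columns). A minimal sketch, assuming values are separated by a plain comma; students, majors and long are illustrative names built from John's example, not from any real file.

```r
# John's example data; stringsAsFactors = FALSE keeps the text as character
students <- data.frame(
  first = c("John", "Jane"),
  last  = c("Smith", "Doe"),
  sex   = c("M", "F"),
  major = c("ANTH", "HIST,BIOL"),
  stringsAsFactors = FALSE
)

# Split each major field on the comma: one character vector per student
majors <- strsplit(as.character(students$major), ",", fixed = TRUE)

# Repeat each row once per major it lists, then overwrite the major
# column with the individual values (same order as the repeated rows)
long <- students[rep(seq_len(nrow(students)), lengths(majors)), ]
long$major <- unlist(majors)
rownames(long) <- NULL

# Cross-tabulate sex by major
tab <- table(sex = long$sex, major = long$major)
print(tab)
```

Each of Jane's majors becomes its own row, so table() counts her once under HIST and once under BIOL. That is usually what "sex by major" means, but it does mean the table total (3) exceeds the number of students (2).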