Sam:

On Fri, Jan 20, 2012 at 10:52 AM, Sam Steingold <s...@gnu.org> wrote:
> Hi,
> I have a data frame with one column containing strings of the form
> "ABC...|XYZ..."
> where ABC etc. are fields of 6 alphanumeric characters each
> and XYZ etc. are fields of 8 alphanumeric characters each;
> "|" is a mandatory separator;
> I do not know in advance how many fields of each kind each row will contain.
> I need to extract these fields from the string.
>
> === How do I do that?
>
> first I need to split the string in 2 on '|' - how?

?strsplit

strsplit(thecolumn, "|", fixed=TRUE)
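A minimal, self-contained run of that call on made-up strings. The two values of `thecolumn` are invented for illustration, since the real column was not posted. `fixed = TRUE` matters here: as a regular expression "|" is the alternation operator, so without it the split would not happen on the literal bar.

```r
# Invented sample values; the real data were not shown in the thread
thecolumn <- c("ABC123DEF456|XYZ12345", "GHI789|UVW67890QRS13579")

# fixed = TRUE treats "|" as a literal character, not regex alternation
parts <- strsplit(thecolumn, "|", fixed = TRUE)

# Each list element is c(left, right); pull the halves into two vectors
left  <- vapply(parts, `[`, character(1), 1)
right <- vapply(parts, `[`, character(1), 2)
```

One caveat: for a string ending in "|" (no 8-character fields), strsplit() drops the trailing empty piece, so `right` comes back as NA for that row.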
> then I need to split the two strings by 6/8 characters -- how?

This makes no sense to me. strsplit takes care of this.

> then I need to convert each 6/8 character string into an integer base 36
> or 64 (depending on the field) - how?

No clue. Depends on the encoding AFAICS.

-- Bert

>
> === What do I do with them once I extract them?
>
> First thing I want to do is to have a count table of them.
> Then I thought of adding an extra column for each field value and
> putting 0/1 there, e.g., frame
> 1,AB
> 2,BCD
> will turn into
> 1,1,1,0,0
> 2,0,1,1,1
> however this would work only if the number of different field values is
> manageable.
> What do people do?
> Can I have a column of "sets" in a data frame?
> Does R support the "set" data type?
>
> Thanks!
>
> PS. thanks to Sarah Goslee who answered my previous question in so much
> detail!
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X
> 11.0.11004000
> http://camera.org http://openvotingconsortium.org http://iris.org.il
> http://mideasttruth.com http://memri.org http://honestreporting.com
> Don't take life too seriously, you'll never get out of it alive!
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
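For the remaining steps, here is a base-R sketch under stated assumptions. The sample strings, the helper names `chunk()` and `decode()`, and the base-64 digit order are all hypothetical, not from the thread. strsplit() only separates the two halves; cutting each half into 6- or 8-character fields is done here with substring(). For decoding, base R's `strtoi(x, base = 36L)` accepts bases 2 to 36 only and returns NA for anything above `.Machine$integer.max`. A 6-character base-36 field can exceed that (36^6 - 1 = 2176782335), so the decoder accumulates in doubles, which stay exact for 8 base-64 digits (64^8 = 2^48). R has no dedicated set type: plain vectors plus union(), intersect(), setdiff() and %in% fill that role, and a data frame can hold a list column.

```r
# Invented sample values; the real data were not shown in the thread
thecolumn <- c("ABC123DEF456|XYZ12345", "GHI789|UVW67890QRS13579")
parts <- strsplit(thecolumn, "|", fixed = TRUE)
left  <- vapply(parts, `[`, character(1), 1)   # 6-character fields
right <- vapply(parts, `[`, character(1), 2)   # 8-character fields

# Hypothetical helper: cut one string into consecutive pieces of `width`
# characters (a trailing short piece, if any, is kept as-is)
chunk <- function(s, width) {
  if (is.na(s) || nchar(s) == 0) return(character(0))
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}
fields6 <- lapply(left, chunk, width = 6)
fields8 <- lapply(right, chunk, width = 8)

# Hypothetical decoder: read `s` as a number written with the digits of
# `alphabet`, most significant first; doubles are exact below 2^53
decode <- function(s, alphabet) {
  digits <- match(strsplit(s, "")[[1]], strsplit(alphabet, "")[[1]]) - 1
  if (anyNA(digits)) return(NA_real_)   # character not in the alphabet
  Reduce(function(acc, d) acc * nchar(alphabet) + d, digits, 0)
}
b36 <- "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# ASSUMED base-64 digit order (RFC 4648 style); the real encoding is unknown
b64 <- paste0(c(LETTERS, letters, 0:9, "+", "/"), collapse = "")
num6 <- lapply(fields6, function(f)
  vapply(f, decode, numeric(1), alphabet = b36, USE.NAMES = FALSE))
num8 <- lapply(fields8, function(f)
  vapply(f, decode, numeric(1), alphabet = b64, USE.NAMES = FALSE))

# Count table of field values over all rows
counts6 <- table(unlist(fields6))
counts8 <- table(unlist(fields8))

# 0/1 indicator matrix: one row per record, one column per distinct value
long <- data.frame(row = rep(seq_along(fields6), lengths(fields6)),
                   field = unlist(fields6))
tab <- table(factor(long$row, levels = seq_along(fields6)), long$field)
ind6 <- matrix(as.integer(tab > 0), nrow = nrow(tab),
               dimnames = dimnames(tab))

# A list column holds a variable-length vector of fields per row
df <- data.frame(id = seq_along(thecolumn))
df$f6 <- fields6
df$f8 <- fields8
shared <- intersect(df$f6[[1]], c("ABC123", "ZZZ999"))
```

If the number of distinct values is large, a dense indicator matrix gets wasteful; the Matrix package's sparseMatrix() is a common way to store the same 0/1 entries, using `long$row` and the matched column index as the i/j coordinates.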