Is one of the rows NULL? Do an 'str(x)' show? The example you sent seems to work with the code. Are you reading in a different set of data? I think I know what happened. I shortened the names on your example so it was easier to access. Here is the data I used:
ID|name|picnic|food|other 1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good stuff|"Softball;#Blanket;#I bring boo-boo, but he hides" 2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn Chairs;#Blanket;#my running shoes 3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs 4|Magilla Gorilla|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Volleyball;#Lawn Chairs;#Blanket 5|Foghorn Leghorn|Yes|"Hot Dogs;#Cole Slaw;#I say, I say, BBQ Chicken?"|Softball;#Blanket 6|Peter Potamus|Yes|"Hamburgers;#Hot Dogs;#anything, just a lot of it"|Softball;#Lawn Chairs;#hot air balloon 7|Jonny Quest|No|too busy getting into and out of trouble|Hadji and Bandit 8|"Fleegle, Bingo, Drooper and Snorky"|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#A banana split|a laugh track 9|George Jetson|No|Mr. Spacely is making me work|Lawn Chairs;#Blanket;#my flying car 10|Snagglepuss|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Heavens to Murgatroyd! Exit stage left! Here is the code with a few more comments. The basic structure was to loop through the three data columns since they had the same format. x <- read.table("/tempxx.txt", comment="", quote="", sep="|", header=TRUE, as.is=TRUE) # split out by name. the 'lapply' will cycle through for each row in the data and # the index of the row is passed to the '.row' parameter of the function z <- lapply(seq(nrow(x)), function(.row){ # this sets the result to NULL so that we can accumulate the data as it is processed .result <- NULL # construct the data output # the three columns were shorted to the following names. the 'for' loop will # iterate through each of the three columns, taking the data in the columns and # spliting them by your separator ';#' for (i in c('picnic', 'food', 'other')){ # this will access the specific column (given in the variable 'i') .split <- strsplit(x[.row,][[i]], ";#") # this appended on to '.result' the contents of this column after creating # the three columns of data which are the name, the column ID and the value # from that column .result <- rbind(.result, cbind(name=x[.row,][['name']], field=i, value=unlist(.split))) } .result # return the result }) z On Sun, Jul 13, 2008 at 7:25 PM, Hohm, Dale <[EMAIL PROTECTED]> wrote: > Thanks Jim, > > I wish I were comfortable enough with the language for the fix needed to the > syntax to be obvious, but it is not yet. With your example, I get: > > Error in strsplit(x[.row, ][[i]], ";#") : non-character argument > > x appears to be filled properly, but z is not due to the error. > > Also, if you were willing to provide some brief annotation or describe the > overall logic in the code you supplied it would help me immensely. > > Thanks, > > Dale > > -----Original Message----- > From: jim holtman [mailto:[EMAIL PROTECTED] > Sent: Sunday, July 13, 2008 6:35 AM > To: Hohm, Dale > Cc: r-help@r-project.org > Subject: Re: [R] Reading Multi-value data fields for descriptive analysis > > This may do what you want: > >> x <- read.table("/tempxx.txt", comment="", quote="", sep="|", header=TRUE, >> as.is=TRUE) >> # split out by name >> z <- lapply(seq(nrow(x)), function(.row){ > + .result <- NULL > + # construct the data output > + for (i in c('picnic', 'food', 'other')){ > + .split <- strsplit(x[.row,][[i]], ";#") > + .result <- rbind(.result, cbind(name=x[.row,][['name']], > field=i, value=unlist(.split))) > + } > + .result > + }) >> >> >> z > [[1]] > name field value > [1,] "Yogi Bear" "picnic" "Yes" > [2,] "Yogi Bear" "food" "Hamburgers" > [3,] "Yogi Bear" "food" "Hot Dogs" > [4,] "Yogi Bear" "food" "I rely on others to bring the good stuff" > [5,] "Yogi Bear" "other" "\"Softball" > [6,] "Yogi Bear" "other" "Blanket" > [7,] "Yogi Bear" "other" "I bring boo-boo, but he hides\"" > > [[2]] > name field value > [1,] "Boo-Boo" "picnic" "Yes" > [2,] "Boo-Boo" "food" "Potato Salad" > [3,] "Boo-Boo" "food" "Cole Slaw" > [4,] "Boo-Boo" "food" "whatever Yogi doesn't eat" > [5,] "Boo-Boo" "other" "Lawn Chairs" > [6,] "Boo-Boo" "other" "Blanket" > [7,] "Boo-Boo" "other" "my running shoes" > > [[3]] > name field value > [1,] "Ranger Rick" "picnic" "No" > [2,] "Ranger Rick" "food" "I told you I don't picnic" > [3,] "Ranger Rick" "other" "a big net and handcuffs" > > [[4]] > name field value > [1,] "Magilla Gorilla" "picnic" "Yes" > [2,] "Magilla Gorilla" "food" "Hamburgers" > [3,] "Magilla Gorilla" "food" "Hot Dogs" > [4,] "Magilla Gorilla" "food" "Potato Salad" > [5,] "Magilla Gorilla" "food" "Cole Slaw" > [6,] "Magilla Gorilla" "food" "BBQ Chicken" > [7,] "Magilla Gorilla" "other" "Softball" > [8,] "Magilla Gorilla" "other" "Volleyball" > [9,] "Magilla Gorilla" "other" "Lawn Chairs" > [10,] "Magilla Gorilla" "other" "Blanket" > > > > On Sun, Jul 13, 2008 at 12:56 AM, Hohm, Dale <[EMAIL PROTECTED]> wrote: >> Thanks for the reply Jim. >> >> Here is a representation of the data I want to analyze - 10 records as >> requested. Each line can easily include an ID number as below. >> >> So I want to determine a frequency or percentage of respondents that bring >> each of the 5 foods (Hamburgers, Hot Dogs, Potato Salad, Cole Slaw and BBQ >> Chicken) and how many "Other" write-ins there are. The same for what else >> is brought besides food (Softball, Volleyball, Lawn Chairs and Blanket) as >> well as a count of "Other" write-ins. I'll also need to be able to discern >> how many brought Hambergers AND a Blanket or how many brought a Softball AND >> a Vollyball etc. >> >> ID|Your Name|Do you picnic?|What is your favorite picnic food?|What do you >> bring besides food? >> 1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good >> stuff|"Softball;#Blanket;#I bring boo-boo, but he hides" >> 2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn >> Chairs;#Blanket;#my running shoes >> 3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs >> 4|Magilla Gorilla|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ >> Chicken|Softball;#Volleyball;#Lawn Chairs;#Blanket >> 5|Foghorn Leghorn|Yes|"Hot Dogs;#Cole Slaw;#I say, I say, BBQ >> Chicken?"|Softball;#Blanket >> 6|Peter Potamus|Yes|"Hamburgers;#Hot Dogs;#anything, just a lot of >> it"|Softball;#Lawn Chairs;#hot air balloon >> 7|Jonny Quest|No|too busy getting into and out of trouble|Hadji and Bandit >> 8|"Fleegle, Bingo, Drooper and Snorky"|Yes|Hamburgers;#Hot Dogs;#Potato >> Salad;#Cole Slaw;#A banana split|a laugh track >> 9|George Jetson|No|Mr. Spacely is making me work|Lawn Chairs;#Blanket;#my >> flying car >> 10|Snagglepuss|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ >> Chicken|Softball;#Heavens to Murgatroyd! Exit stage left! >> >> Thanks in advance, >> >> Dale >> >> -----Original Message----- >> From: jim holtman [mailto:[EMAIL PROTECTED] >> Sent: Saturday, July 12, 2008 11:32 AM >> To: Hohm, Dale >> Cc: r-help@r-project.org >> Subject: Re: [R] Reading Multi-value data fields for descriptive analysis >> >> Can you provide a more complete example (say 10 lines) of what the >> input is like. Does each line have a unique index that can be related >> to it? Do you want to summarize all the multi1-n values of Col2? Do >> you want to know the percentage of input lines that have a >> Col3/multi-value4 on them? You could read in the data as you have >> indicated below and add a column that is the record number and >> therefore you would have have to worry about trying to say if it >> existed or not. For example, you might have: >> >> Rec#|col#|value >> 1|1|single >> 1|2|multi1 >> 1|2|multi2 >> 1|3|multi1 >> 2|1|single >> 3|1|single >> 3|2|multi1 >> .... >> >> There are a number of potential ways of representing the data, but a >> lot depends on what you want to do with it, so a more extensive >> example of the input, along with the type of output you would like >> will help in providing an answer. >> >> On Sat, Jul 12, 2008 at 12:37 PM, Hohm, Dale <[EMAIL PROTECTED]> wrote: >>> Hello, >>> >>> I'm looking for help on the best approach to get "multi-value" data fields >>> into R for simple descriptive analysis. >>> >>> ------------------------------------- >>> >>> I am new to this list and new to R, but I really want to get over the hump >>> and get productive with it. Some help with how to best get the following >>> data into R would be greatly appreciated. I have programming experience >>> and stale experience with SPSS. >> >>> >>> I am trying to do some simple descriptive analysis (frequencies, >>> cross-tabs) of data stored in a Microsoft SharePoint list. The data can be >>> accessed with ODBC or it can readily be extracted into an Excel or CSV >>> format. One of the challenges with the data is that it uses several >>> "multi-value" fields (Microsoft Access provides the same data-type). >>> >>> By "multi-value" I mean that multiple responses are packed into a single >>> data column; the data input form presents a question with several >>> checkboxes and a free-format write-in response. The individual values >>> within the data field are separated with the two characters ";#". So, the >>> data would be of the following format (in CSV form with column headers and >>> a tilde as the field separator): >>> >>> Column1single~Column2multi~Column3multi >>> a sample value~C2 a multi one;#C2 a multi two~C3 a multi one;#C3 a multi >>> two;#C3 a free-form answer >>> >>> >>> The first approach that comes to mind is to explode the multi-value fields >>> into unique bi-variate data columns and then assign a 0 or 1 to these new >>> columns in each record based on whether that specific value was present. >>> This approach is complicated by the free-form answer as the unique columns >>> could grow very large in number - it might be better to figure out how to >>> indicate the presence of the free-form value in a data column called >>> "Other" (or "C2 Other") and then hold the free-form value in a separate >>> column. >>> >>> The data would then look like this... >>> >>> Column1single: a sample value >>> C2 a multi one: 1 >>> C2 a multi two: 1 >>> C2 a multi three: 0 >>> C3 a multi one: 1 >>> C3 a multi two: 1 >>> C3 a free-form answer: 1 >>> C3 another free-form answer: 0 >>> >>> >>> Or in the second scenario... >>> >>> Column1single: a sample value >>> C2 a multi one: 1 >>> C2 a multi two: 1 >>> C2 a multi three: 0 >>> C3 a multi one: 1 >>> C3 a multi two: 1 >>> C3 Other: 1 >>> C3 Other Text: a free-form answer >>> >>> >>> I am uncertain help to read this data into R in this format, so suggestions >>> and examples would help me greatly. >>> >>> This is a pretty common data packing scenario, so perhaps there are better >>> approaches to reading this data and better ways in R to analyze it than >>> what I have presented. Suggestions greatly appreciated. >>> >>> >>> Thanks, >>> >>> Dale Hohm >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> >> >> -- >> Jim Holtman >> Cincinnati, OH >> +1 513 646 9390 >> >> What is the problem you are trying to solve? >> > > > > -- > Jim Holtman > Cincinnati, OH > +1 513 646 9390 > > What is the problem you are trying to solve? > -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve? ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.