Re: [R] Reading Multi-value data fields for descriptive analysis

jim holtman Sun, 13 Jul 2008 17:51:38 -0700

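Is one of the rows NULL?  What does 'str(x)' show?  The example you sent
seems to work with the code.  Are you reading in a different set of
data?  I think I know what happened.  I shortened the column names in
your example so they were easier to access, so the code looks for
columns called 'name', 'picnic', 'food' and 'other'.  With your original
header those columns do not exist.  Here is the data I used: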


ID|name|picnic|food|other
1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good stuff|"Softball;#Blanket;#I bring boo-boo, but he hides"
2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn Chairs;#Blanket;#my running shoes
3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs
4|Magilla Gorilla|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Volleyball;#Lawn Chairs;#Blanket
5|Foghorn Leghorn|Yes|"Hot Dogs;#Cole Slaw;#I say, I say, BBQ Chicken?"|Softball;#Blanket
6|Peter Potamus|Yes|"Hamburgers;#Hot Dogs;#anything, just a lot of it"|Softball;#Lawn Chairs;#hot air balloon
7|Jonny Quest|No|too busy getting into and out of trouble|Hadji and Bandit
8|"Fleegle, Bingo, Drooper and Snorky"|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#A banana split|a laugh track
9|George Jetson|No|Mr. Spacely is making me work|Lawn Chairs;#Blanket;#my flying car
10|Snagglepuss|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ Chicken|Softball;#Heavens to Murgatroyd!  Exit stage left!
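
If you want to check whether the column names are the cause, something
like the following may help.  It is only a sketch: it reads one record
with the long header from your original message (passed inline via
'text=' instead of from a file), and 'x_long' and 'long_names' are names
I made up for illustration.  read.table() rewrites the long header into
syntactically valid names, so a lookup such as x_long[['picnic']]
returns NULL, and strsplit() on NULL gives the "non-character argument"
error.

```r
# Sketch only: one record under the original long header, read inline.
txt <- c(
  "ID|Your Name|Do you picnic?|What is your favorite picnic food?|What do you bring besides food?",
  "3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs"
)
x_long <- read.table(text = txt, comment.char = "", quote = "", sep = "|",
                     header = TRUE, as.is = TRUE)

# The names R actually assigned; none of them is 'picnic'
long_names <- names(x_long)
print(long_names)

# Looking up a column that does not exist returns NULL ...
missing_col <- is.null(x_long[["picnic"]])
print(missing_col)

# ... and renaming to the short names lets the code below run unchanged
names(x_long) <- c("ID", "name", "picnic", "food", "other")
str(x_long)
```

Renaming the columns after reading, as in the last step, is probably
the least intrusive fix if you would rather keep your file as it is.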


Here is the code with a few more comments.  The basic structure was to
loop through the three data columns since they had the same format.

x <- read.table("/tempxx.txt", comment="", quote="", sep="|",
                header=TRUE, as.is=TRUE)
# split out by name. the 'lapply' will cycle through each row in the
# data, and the index of the row is passed to the '.row' parameter of
# the function
z <- lapply(seq(nrow(x)), function(.row){
    # this sets the result to NULL so that we can accumulate the data
    # as it is processed
    .result <- NULL
    # construct the data output
    # the three columns were shortened to the following names.  the
    # 'for' loop will iterate through each of the three columns, taking
    # the data in the column and splitting it by your separator ';#'
    for (i in c('picnic', 'food', 'other')){
        # this will access the specific column (given in the variable 'i')
        .split <- strsplit(x[.row,][[i]], ";#")
        # this appends to '.result' the contents of this column after
        # creating the three columns of data, which are the name, the
        # column ID and the value from that column
        .result <- rbind(.result, cbind(name=x[.row,][['name']],
                                        field=i, value=unlist(.split)))
    }
    .result   # return the result
})


z
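
Since 'z' is a list with one character matrix per respondent, one
possible next step toward the frequencies and the "AND" questions you
asked about is to stack the matrices into a single long data frame and
tabulate from there.  The following is only a sketch under stated
assumptions: the two-respondent 'z' is a hand-made stand-in for the real
list, and 'long', 'food_counts', 'food_share', 'burgers', 'blankets'
and 'both' are names I made up for illustration.

```r
# Sketch only: a small stand-in for the list 'z' built by the code above.
z <- list(
  cbind(name = "Yogi Bear",
        field = c("picnic", "food", "food", "other"),
        value = c("Yes", "Hamburgers", "Hot Dogs", "Blanket")),
  cbind(name = "Magilla Gorilla",
        field = c("picnic", "food", "other", "other"),
        value = c("Yes", "Hamburgers", "Softball", "Blanket"))
)

# Stack the per-respondent matrices into one long data frame
long <- as.data.frame(do.call(rbind, z), stringsAsFactors = FALSE)

# How often each food was mentioned, and the share of respondents
food_counts <- table(long$value[long$field == "food"])
food_share  <- food_counts / length(unique(long$name))
print(food_counts)
print(food_share)

# Respondents who brought Hamburgers AND a Blanket
burgers  <- long$name[long$field == "food"  & long$value == "Hamburgers"]
blankets <- long$name[long$field == "other" & long$value == "Blanket"]
both <- intersect(burgers, blankets)
print(both)
```

Write-in answers show up as their own values in the table, so you would
still need to decide how to group them into an "Other" category.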



On Sun, Jul 13, 2008 at 7:25 PM, Hohm, Dale <[EMAIL PROTECTED]> wrote:
> Thanks Jim,
>
> I wish I were comfortable enough with the language for the fix needed to the 
> syntax to be obvious, but it is not yet.  With your example, I get:
>
>        Error in strsplit(x[.row, ][[i]], ";#") : non-character argument
>
> x appears to be filled properly, but z is not due to the error.
>
> Also, if you were willing to provide some brief annotation or describe the 
> overall logic in the code you supplied it would help me immensely.
>
> Thanks,
>
> Dale
>
> -----Original Message-----
> From: jim holtman [mailto:[EMAIL PROTECTED]
> Sent: Sunday, July 13, 2008 6:35 AM
> To: Hohm, Dale
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading Multi-value data fields for descriptive analysis
>
> This may do what you want:
>
>> x <- read.table("/tempxx.txt", comment="", quote="", sep="|", header=TRUE, 
>> as.is=TRUE)
>> # split out by name
>> z <- lapply(seq(nrow(x)), function(.row){
> +     .result <- NULL
> +     # construct the data output
> +     for (i in c('picnic', 'food', 'other')){
> +         .split <- strsplit(x[.row,][[i]], ";#")
> +         .result <- rbind(.result, cbind(name=x[.row,][['name']],
> field=i, value=unlist(.split)))
> +     }
> +     .result
> + })
>>
>>
>> z
> [[1]]
>     name        field    value
> [1,] "Yogi Bear" "picnic" "Yes"
> [2,] "Yogi Bear" "food"   "Hamburgers"
> [3,] "Yogi Bear" "food"   "Hot Dogs"
> [4,] "Yogi Bear" "food"   "I rely on others to bring the good stuff"
> [5,] "Yogi Bear" "other"  "\"Softball"
> [6,] "Yogi Bear" "other"  "Blanket"
> [7,] "Yogi Bear" "other"  "I bring boo-boo, but he hides\""
>
> [[2]]
>     name      field    value
> [1,] "Boo-Boo" "picnic" "Yes"
> [2,] "Boo-Boo" "food"   "Potato Salad"
> [3,] "Boo-Boo" "food"   "Cole Slaw"
> [4,] "Boo-Boo" "food"   "whatever Yogi doesn't eat"
> [5,] "Boo-Boo" "other"  "Lawn Chairs"
> [6,] "Boo-Boo" "other"  "Blanket"
> [7,] "Boo-Boo" "other"  "my running shoes"
>
> [[3]]
>     name          field    value
> [1,] "Ranger Rick" "picnic" "No"
> [2,] "Ranger Rick" "food"   "I told you I don't picnic"
> [3,] "Ranger Rick" "other"  "a big net and handcuffs"
>
> [[4]]
>      name              field    value
>  [1,] "Magilla Gorilla" "picnic" "Yes"
>  [2,] "Magilla Gorilla" "food"   "Hamburgers"
>  [3,] "Magilla Gorilla" "food"   "Hot Dogs"
>  [4,] "Magilla Gorilla" "food"   "Potato Salad"
>  [5,] "Magilla Gorilla" "food"   "Cole Slaw"
>  [6,] "Magilla Gorilla" "food"   "BBQ Chicken"
>  [7,] "Magilla Gorilla" "other"  "Softball"
>  [8,] "Magilla Gorilla" "other"  "Volleyball"
>  [9,] "Magilla Gorilla" "other"  "Lawn Chairs"
> [10,] "Magilla Gorilla" "other"  "Blanket"
>
>
>
> On Sun, Jul 13, 2008 at 12:56 AM, Hohm, Dale <[EMAIL PROTECTED]> wrote:
>> Thanks for the reply Jim.
>>
>> Here is a representation of the data I want to analyze - 10 records as 
>> requested.  Each line can easily include an ID number as below.
>>
>> So I want to determine a frequency or percentage of respondents that bring 
>> each of the 5 foods (Hamburgers, Hot Dogs, Potato Salad, Cole Slaw and BBQ 
>> Chicken) and how many "Other" write-ins there are.  The same for what else 
>> is brought besides food (Softball, Volleyball, Lawn Chairs and Blanket) as 
>> well as a count of "Other" write-ins.  I'll also need to be able to discern 
>> how many brought Hamburgers AND a Blanket or how many brought a Softball AND 
>> a Volleyball etc.
>>
>> ID|Your Name|Do you picnic?|What is your favorite picnic food?|What do you 
>> bring besides food?
>> 1|Yogi Bear|Yes|Hamburgers;#Hot Dogs;#I rely on others to bring the good 
>> stuff|"Softball;#Blanket;#I bring boo-boo, but he hides"
>> 2|Boo-Boo|Yes|Potato Salad;#Cole Slaw;#whatever Yogi doesn't eat|Lawn 
>> Chairs;#Blanket;#my running shoes
>> 3|Ranger Rick|No|I told you I don't picnic|a big net and handcuffs
>> 4|Magilla Gorilla|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ 
>> Chicken|Softball;#Volleyball;#Lawn Chairs;#Blanket
>> 5|Foghorn Leghorn|Yes|"Hot Dogs;#Cole Slaw;#I say, I say, BBQ 
>> Chicken?"|Softball;#Blanket
>> 6|Peter Potamus|Yes|"Hamburgers;#Hot Dogs;#anything, just a lot of 
>> it"|Softball;#Lawn Chairs;#hot air balloon
>> 7|Jonny Quest|No|too busy getting into and out of trouble|Hadji and Bandit
>> 8|"Fleegle, Bingo, Drooper and Snorky"|Yes|Hamburgers;#Hot Dogs;#Potato 
>> Salad;#Cole Slaw;#A banana split|a laugh track
>> 9|George Jetson|No|Mr. Spacely is making me work|Lawn Chairs;#Blanket;#my 
>> flying car
>> 10|Snagglepuss|Yes|Hamburgers;#Hot Dogs;#Potato Salad;#Cole Slaw;#BBQ 
>> Chicken|Softball;#Heavens to Murgatroyd!  Exit stage left!
>>
>> Thanks in advance,
>>
>> Dale
>>
>> -----Original Message-----
>> From: jim holtman [mailto:[EMAIL PROTECTED]
>> Sent: Saturday, July 12, 2008 11:32 AM
>> To: Hohm, Dale
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading Multi-value data fields for descriptive analysis
>>
>> Can you provide a more complete example (say 10 lines) of what the
>> input is like. Does each line have a unique index that can be related
>> to it?  Do you want to summarize all the multi1-n values of Col2?  Do
>> you want to know the percentage of input lines that have a
>> Col3/multi-value4 on them?  You could read in the data as you have
>> indicated below and add a column that is the record number and
>> therefore you would not have to worry about trying to say if it
>> existed or not.  For example, you might have:
>>
>> Rec#|col#|value
>> 1|1|single
>> 1|2|multi1
>> 1|2|multi2
>> 1|3|multi1
>> 2|1|single
>> 3|1|single
>> 3|2|multi1
>> ....
>>
>> There are a number of potential ways of representing the data, but a
>> lot depends on what you want to do with it, so a more extensive
>> example of the input, along with the type of output you would like
>> will help in providing an answer.
>>
>> On Sat, Jul 12, 2008 at 12:37 PM, Hohm, Dale <[EMAIL PROTECTED]> wrote:
>>> Hello,
>>>
>>> I'm looking for help on the best approach to get "multi-value" data fields 
>>> into R for simple descriptive analysis.
>>>
>>> -------------------------------------
>>>
>>> I am new to this list and new to R, but I really want to get over the hump 
>>> and get productive with it.  Some help with how to best get the following 
>>> data into R would be greatly appreciated.  I have programming experience 
>>> and stale experience with SPSS.
>>>
>>>
>>> I am trying to do some simple descriptive analysis (frequencies, 
>>> cross-tabs) of data stored in a Microsoft SharePoint list.  The data can be 
>>> accessed with ODBC or it can readily be extracted into an Excel or CSV 
>>> format.  One of the challenges with the data is that it uses several 
>>> "multi-value" fields (Microsoft Access provides the same data-type).
>>>
>>> By "multi-value" I mean that multiple responses are packed into a single 
>>> data column; the data input form presents a question with several 
>>> checkboxes and a free-format write-in response.  The individual values 
>>> within the data field are separated with the two characters ";#".  So, the 
>>> data would be of the following format (in CSV form with column headers and 
>>> a tilde as the field separator):
>>>
>>> Column1single~Column2multi~Column3multi
>>> a sample value~C2 a multi one;#C2 a multi two~C3 a multi one;#C3 a multi 
>>> two;#C3 a free-form answer
>>>
>>>
>>> The first approach that comes to mind is to explode the multi-value fields 
>>> into unique bi-variate data columns and then assign a 0 or 1 to these new 
>>> columns in each record based on whether that specific value was present.  
>>> This approach is complicated by the free-form answer as the unique columns 
>>> could grow very large in number - it might be better to figure out how to 
>>> indicate the presence of the free-form value in a data column called 
>>> "Other" (or "C2 Other") and then hold the free-form value in a separate 
>>> column.
>>>
>>> The data would then look like this...
>>>
>>> Column1single: a sample value
>>> C2 a multi one: 1
>>> C2 a multi two: 1
>>> C2 a multi three: 0
>>> C3 a multi one: 1
>>> C3 a multi two: 1
>>> C3 a free-form answer: 1
>>> C3 another free-form answer: 0
>>>
>>>
>>> Or in the second scenario...
>>>
>>> Column1single: a sample value
>>> C2 a multi one: 1
>>> C2 a multi two: 1
>>> C2 a multi three: 0
>>> C3 a multi one: 1
>>> C3 a multi two: 1
>>> C3 Other: 1
>>> C3 Other Text: a free-form answer
>>>
>>>
>>> I am uncertain how to read this data into R in this format, so suggestions 
>>> and examples would help me greatly.
>>>
>>> This is a pretty common data packing scenario, so perhaps there are better 
>>> approaches to reading this data and better ways in R to analyze it than 
>>> what I have presented.  Suggestions greatly appreciated.
>>>
>>>
>>> Thanks,
>>>
>>> Dale Hohm
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem you are trying to solve?
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

