(I checked the archive and it seems the entire text of my post was 'scrubbed' - 
trying again)

Hello,

I'm looking for help on the best approach to get "multi-value" data fields into 
R for simple descriptive analysis.

-------------------------------------

I am new to this list and new to R, but I really want to get over the hump and 
get productive with it.  Some help with how to best get the following data into 
R would be greatly appreciated.  I have programming experience and stale 
experience with SPSS.

I am trying to do some simple descriptive analysis (frequencies, cross-tabs) of 
data stored in a Microsoft SharePoint list.  The data can be accessed with ODBC 
or it can readily be extracted into an Excel or CSV format.  One of the 
challenges with the data is that it uses several "multi-value" fields 
(Microsoft Access provides the same data-type).

By "multi-value" I mean that multiple responses are packed into a single data 
column; the data input form presents a question with several checkboxes and a 
free-format write-in response.  The individual values within the data field are 
separated with the two characters ";#".  So, the data would be of the following 
format (in CSV form with column headers and a tilde as the field separator):

Column1single~Column2multi~Column3multi
a sample value~C2 a multi one;#C2 a multi two~C3 a multi one;#C3 a multi two;#C3 a free-form answer
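
A minimal sketch of how this format might be read in base R, assuming each 
record occupies a single line in the actual file.  The sample is supplied 
inline through text= here; a real file path would replace that argument.  The 
object names (txt, dat, c2, c3) are only illustrative.

```r
# Sample data in the tilde-separated layout described above.
txt <- "Column1single~Column2multi~Column3multi
a sample value~C2 a multi one;#C2 a multi two~C3 a multi one;#C3 a multi two;#C3 a free-form answer"

# comment.char = "" matters here: read.table() treats "#" as the start of a
# comment by default, which would cut off everything after the first ";#".
dat <- read.table(text = txt, sep = "~", header = TRUE, quote = "",
                  comment.char = "", stringsAsFactors = FALSE)

# Split each multi-value field into a list of character vectors,
# one list element per record.  fixed = TRUE treats ";#" literally.
c2 <- strsplit(dat$Column2multi, ";#", fixed = TRUE)
c3 <- strsplit(dat$Column3multi, ";#", fixed = TRUE)

print(c2[[1]])
print(c3[[1]])
```

The multi-value columns stay as plain character columns in dat; the split 
lists can then be reshaped in whichever way suits the analysis.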


The first approach that comes to mind is to explode the multi-value fields into 
separate binary (0/1) indicator columns, one per distinct value, and set each to 
0 or 1 in each record based on whether that value was present.  This approach is 
complicated by the free-form answer, as the number of columns could grow very 
large.  It might be better to indicate the presence of a free-form value in a 
column called "Other" (or "C2 Other") and hold the free-form text in a separate 
column.

The data would then look like this...

Column1single: a sample value
C2 a multi one: 1
C2 a multi two: 1
C2 a multi three: 0
C3 a multi one: 1
C3 a multi two: 1
C3 a free-form answer: 1
C3 another free-form answer: 0
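
As a rough sketch of this first layout in base R: the helper name 
explode_multi is hypothetical, and the second input record is made up so that 
some 0 entries appear.

```r
# Hypothetical helper: build one 0/1 column per distinct value found in a
# multi-value character vector whose values are separated by ";#".
explode_multi <- function(x, sep = ";#") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vals  <- sort(unique(unlist(parts)))
  # One row per record, one column per distinct value.
  ind <- do.call(rbind, lapply(parts, function(p) as.integer(vals %in% p)))
  colnames(ind) <- vals
  # check.names = FALSE keeps the spaces in the column names.
  data.frame(ind, check.names = FALSE)
}

x <- c("C2 a multi one;#C2 a multi two",   # from the sample record
       "C2 a multi three")                 # made-up second record
ind <- explode_multi(x)
print(ind)
```

Every distinct string becomes its own column, so free-form answers would each 
add a column, which is the growth problem mentioned above.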


Or in the second scenario...

Column1single: a sample value
C2 a multi one: 1
C2 a multi two: 1
C2 a multi three: 0
C3 a multi one: 1
C3 a multi two: 1
C3 Other: 1
C3 Other Text: a free-form answer
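
This second layout could be sketched as follows, assuming the fixed checkbox 
labels are known in advance.  The vector known is such an assumption, and the 
second input record is made up for illustration.

```r
# Assumed list of fixed checkbox labels for Column3multi.
known <- c("C3 a multi one", "C3 a multi two")

x <- c("C3 a multi one;#C3 a multi two;#C3 a free-form answer",
       "C3 a multi one")                   # made-up second record
parts <- strsplit(x, ";#", fixed = TRUE)

# One 0/1 column per known checkbox label.
ind <- do.call(rbind, lapply(parts, function(p) as.integer(known %in% p)))
colnames(ind) <- known
res <- data.frame(ind, check.names = FALSE)

# Any value that is not a known label is treated as the write-in response.
other <- lapply(parts, function(p) p[!(p %in% known)])
res[["C3 Other"]] <- as.integer(lengths(other) > 0)
res[["C3 Other Text"]] <- vapply(other, function(o) {
  if (length(o) > 0) paste(o, collapse = "; ") else NA_character_
}, character(1))

print(res)
```

Records without a write-in get NA in "C3 Other Text", so the column count 
stays fixed however many distinct free-form answers there are.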


I am uncertain how to read this data into R in this format, so suggestions and 
examples would help me greatly.

This is a pretty common data packing scenario, so perhaps there are better 
approaches to reading this data and better ways in R to analyze it than what I 
have presented.  Suggestions greatly appreciated.
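
One possible alternative, sketched here only as an assumption about what might 
work, is a "long" layout with one row per selected value, which table() can 
tabulate directly.  The second record below is made up, and the names long, 
freq and xtab are illustrative.

```r
dat <- data.frame(
  Column1single = c("a sample value", "another value"),   # second row made up
  Column2multi  = c("C2 a multi one;#C2 a multi two", "C2 a multi two"),
  stringsAsFactors = FALSE
)
parts <- strsplit(dat$Column2multi, ";#", fixed = TRUE)

# Long format: one row per (record, selected value) pair.
long <- data.frame(
  Column1single = rep(dat$Column1single, lengths(parts)),
  Column2value  = unlist(parts),
  stringsAsFactors = FALSE
)

freq <- table(long$Column2value)                       # frequencies
xtab <- table(long$Column1single, long$Column2value)   # cross-tab
print(freq)
print(xtab)
```

Counts in this layout are per selected value rather than per record, so any 
percentages would need the number of records as the denominator.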


Thanks,

Dale Hohm

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
