Re: [R] Reading Multi-value data fields for descriptive analysis

jim holtman Sat, 12 Jul 2008 10:51:46 -0700

Can you provide a more complete example (say 10 lines) of what the
input is like. Does each line have a unique index that can be related
to it?  Do you want to summarize all the multi1-n values of Col2?  Do
you want to know the percentage of input lines that have a
Col3/multi-value4 on them?  You could read in the data as you have
indicated below and add a column that is the record number and
therefore you would have have to worry about trying to say if it
existed or not.  For example, you might have:


Rec#|col#|value
1|1|single
1|2|multi1
1|2|multi2
1|3|multi1
2|1|single
3|1|single
3|2|multi1
....

There are a number of potential ways of representing the data, but a
lot depends on what you want to do with it, so a more extensive
example of the input, along with the type of output you would like
will help in providing an answer.

On Sat, Jul 12, 2008 at 12:37 PM, Hohm, Dale <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I'm looking for help on the best approach to get "multi-value" data fields 
> into R for simple descriptive analysis.
>
> -------------------------------------
>
> I am new to this list and new to R, but I really want to get over the hump 
> and get productive with it.  Some help with how to best get the following 
> data into R would be greatly appreciated.  I have programming experience and 
> stale experience with SPSS.
>
> I am trying to do some simple descriptive analysis (frequencies, cross-tabs) 
> of data stored in a Microsoft SharePoint list.  The data can be accessed with 
> ODBC or it can readily be extracted into an Excel or CSV format.  One of the 
> challenges with the data is that it uses several "multi-value" fields 
> (Microsoft Access provides the same data-type).
>
> By "multi-value" I mean that multiple responses are packed into a single data 
> column; the data input form presents a question with several checkboxes and a 
> free-format write-in response.  The individual values within the data field 
> are separated with the two characters ";#".  So, the data would be of the 
> following format (in CSV form with column headers and a tilde as the field 
> separator):
>
> Column1single~Column2multi~Column3multi
> a sample value~C2 a multi one;#C2 a multi two~C3 a multi one;#C3 a multi 
> two;#C3 a free-form answer
>
>
> The first approach that comes to mind is to explode the multi-value fields 
> into unique bi-variate data columns and then assign a 0 or 1 to these new 
> columns in each record based on whether that specific value was present.  
> This approach is complicated by the free-form answer as the unique columns 
> could grow very large in number - it might be better to figure out how to 
> indicate the presence of the free-form value in a data column called "Other" 
> (or "C2 Other") and then hold the free-form value in a separate column.
>
> The data would then look like this...
>
> Column1single: a sample value
> C2 a multi one: 1
> C2 a multi two: 1
> C2 a multi three: 0
> C3 a multi one: 1
> C3 a multi two: 1
> C3 a free-form answer: 1
> C3 another free-form answer: 0
>
>
> Or in the second scenario...
>
> Column1single: a sample value
> C2 a multi one: 1
> C2 a multi two: 1
> C2 a multi three: 0
> C3 a multi one: 1
> C3 a multi two: 1
> C3 Other: 1
> C3 Other Text: a free-form answer
>
>
> I am uncertain help to read this data into R in this format, so suggestions 
> and examples would help me greatly.
>
> This is a pretty common data packing scenario, so perhaps there are better 
> approaches to reading this data and better ways in R to analyze it than what 
> I have presented.  Suggestions greatly appreciated.
>
>
> Thanks,
>
> Dale Hohm
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reading Multi-value data fields for descriptive analysis

Reply via email to