Re: [R] Data transformation & cleaning

Jim Lemon Wed, 28 Sep 2011 03:31:12 -0700

On 09/28/2011 01:13 PM, pip56789 wrote:

Hi,


I have a few methodological and implementation questions for y'all. Thank
you in advance for your help. I have a dataset that reflects people's
preference choices. I want to see if there's any kind of clustering effect
among certain preference choices (e.g. do people who pick choice A also pick
choice D).

I have a data set that has one record per user ID, per preference choice.
It's a "long" form of a data set that looks like this:

ID | Page
123 | Choice A
123 | Choice B
456 | Choice A
456 | Choice B
...

I thought that I should do the following:

1. Make the data set "wide", counting the observations so the data looks
like this:
ID | Count of Preference A | Count of Preference B
123 | 1 | 1
...

Using (reshape2; current releases spell the argument value.var):
table1 <- dcast(data, ID ~ Page, fun.aggregate=length, value.var='Page')

2. Create a correlation matrix of preferences
cor(table1[,-1])

How would I restrict my correlation to show preferences that met a minimum
sample threshold? Can you confirm if the two following commands do the same
thing? What would I do from here (or am I taking the wrong approach)?
table1 <- dcast(data, Page ~ Page, fun.aggregate=length, value.var='Page')
table2<- with(data, table(Page,Page))

Hi Peter,

An easy way to visualize set intersections is the intersectDiagram function in the plotrix package. This will display the counts or percentages of each type of intersection. Your data could be passed like this:


# Simulated long-format data: 50 (ID, choice) pairs drawn for 20 users
choices <- data.frame(IDs=sample(1:20,50,TRUE),
 Page=sample(LETTERS[1:4],50,TRUE))
library(plotrix)
intersectDiagram(choices)

This example is a bit messy, as it will generate quite a few repeated choices that will be ignored by intersectDiagram, but it should give you the idea.
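
On the minimum-sample question, here is a minimal base-R sketch that is separate from the plotrix approach above. The simulated data, the names dat, wide, keep and cor_mat, and the threshold min_n are all assumptions for illustration, not anything from the original data set. It builds the wide count table with table(), drops choices picked by fewer than min_n distinct IDs, and correlates the rest. It also shows that table(Page, Page) cross-tabulates a column with itself, so only its diagonal can be non-zero.

```r
# Hypothetical long-format data: one row per (ID, Page) choice
set.seed(1)
dat <- data.frame(ID = sample(1:20, 50, TRUE),
                  Page = sample(paste("Choice", LETTERS[1:4]), 50, TRUE))

# Wide form: rows are IDs, columns are choices, cells are counts
wide <- unclass(table(dat$ID, dat$Page))

# Keep only choices picked by at least min_n distinct IDs
# (min_n is an assumed threshold; pick one that suits your data)
min_n <- 5
n_users <- colSums(wide > 0)
keep <- n_users >= min_n

# Correlation matrix restricted to the choices that met the threshold
cor_mat <- cor(wide[, keep, drop = FALSE])

# Cross-tabulating Page with itself: every off-diagonal cell is zero,
# so this table carries no information about co-occurrence within an ID
self_tab <- table(dat$Page, dat$Page)
```

Whether a correlation of counts is the right association measure for this kind of data is a separate question; the sketch only shows the thresholding mechanics.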


Jim

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
