On 09/28/2011 01:13 PM, pip56789 wrote:
Hi,

I have a few methodological and implementation questions for ya'll. Thank
you in advance for your help. I have a dataset that reflects people's
preference choices. I want to see if there's any kind of clustering effect
among certain preference choices (e.g. do people who pick choice A also pick
choice D).

I have a data set that has one record per user ID, per preference choice.
It's a "long" form of a data set that looks like this:

ID | Page
123 | Choice A
123 | Choice B
456 | Choice A
456 | Choice B
...

I thought that I should do the following

1. Make the data set "wide", counting the observations so the data looks
like this:
ID | Count of Preference A | Count of Preference B
123 | 1 | 1
...

Using
table1<- dcast(data,ID ~ Page,fun.aggregate=length,value_var='Page' )

2. Create a correlation matrix of preferences
cor(table2[,-1])

How would I restrict my correlation to show preferences that met a minimum
sample threshold? Can you confirm if the two following commands do the same
thing? What would I do from here (or am I taking the wrong approach)
table1<- dcast(data,Page ~ Page,fun.aggregate=length,value_var='Page' )
table2<- with(data, table(Page,Page))


Hi Peter,
An easy way to visualize set intersections is the intersectDiagram function in the plotrix package. This will display the counts or percentages of each type of intersection. Your data could be passed like this:

choices<-data.frame(IDs=sample(1:20,50,TRUE),
 sample(LETTERS[1:4],50,TRUE))
library(plotrix)
intersectDiagram(choices)

This example is a bit messy, as it will generate quite a few repeated choices that will be ignored by intersectDiagram, but it should give you the idea.

Jim

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to