Hi, I have a few methodological and implementation questions for y'all. Thank you in advance for your help. I have a data set that reflects people's preference choices. I want to see whether certain preference choices cluster together (e.g. do people who pick choice A also pick choice D?).
The data set has one record per user ID, per preference choice. It is in "long" form and looks like this:

ID  | Page
123 | Choice A
123 | Choice B
456 | Choice A
456 | Choice B
...

I thought that I should do the following:

1. Make the data set "wide", counting the observations, so the data looks like this:

ID  | Count of Preference A | Count of Preference B
123 | 1                     | 1
...

using

    table1 <- dcast(data, ID ~ Page, fun.aggregate = length, value.var = 'Page')

2. Create a correlation matrix of the preferences (dropping the ID column):

    cor(table1[, -1])

My questions:

How would I restrict the correlation matrix to preferences that meet a minimum sample threshold?

Can you confirm whether the following two commands do the same thing?

    table1 <- dcast(data, Page ~ Page, fun.aggregate = length, value.var = 'Page')
    table2 <- with(data, table(Page, Page))

What would I do from here (or am I taking the wrong approach)?

Many thanks,
Peter

--
View this message in context: http://r.789695.n4.nabble.com/Data-transformation-cleaning-tp3849889p3849889.html
Sent from the R help mailing list archive at Nabble.com.
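
To make the question reproducible, here is a minimal sketch of what I am trying to do, using base R only (table() instead of dcast()). The toy data, the threshold name min_n and its value, and the choice to count distinct users per preference are all assumptions made up for illustration, not a settled approach.

```r
# Toy data in the same "long" layout: one row per user ID per preference.
# The IDs and choices are invented for illustration only.
data <- data.frame(
  ID   = c(123, 123, 456, 456, 789, 789, 101, 101, 101),
  Page = c("Choice A", "Choice D", "Choice A", "Choice D",
           "Choice A", "Choice B", "Choice B", "Choice C", "Choice D"),
  stringsAsFactors = FALSE
)

# Step 1: wide form, one row per ID and one column per choice (counts).
# unclass() turns the contingency table into a plain integer matrix.
wide <- unclass(table(data$ID, data$Page))

# Minimum-sample threshold (hypothetical value): keep only the choices
# that were picked by at least min_n distinct users.
min_n <- 2
users_per_choice <- colSums(wide > 0)
keep <- users_per_choice >= min_n

# Step 2: correlation matrix restricted to the retained choices.
cor_mat <- cor(wide[, keep, drop = FALSE])
print(round(cor_mat, 2))
```

With this toy data "Choice C" is picked by only one user, so it is dropped before the correlation is computed. Whether a column-count filter like this is the right way to apply a threshold is part of what I am asking.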