On Jun 24, 2014, at 10:14 PM, Abhinaba Roy wrote:

> Dear R helpers,
>
> I want to generate data for say 1000 patients (i.e., 1000 unique IDs)
> having suffered from various diseases in the past (say diseases
> A,B,C,D,E,F). The only condition imposed is that each patient should've
> suffered from *at least* two diseases. So my data frame will have two
> columns 'ID' and 'Disease'.
>
> I want to do a basket analysis with this data, where ID will be the
> identifier and we will establish rules based on the 'Disease' column.
>
> How can I generate this type of data in R?
>
Perhaps something along these lines for 20 cases:

> data.frame(patient=1:20, disease = sapply(pmin(2+rpois(20, 2), 6),
+     function(n) paste0( sample( c('A','B','C','D','E','F'), n), collapse="+" ) )
+ )
   patient     disease
1        1         F+D
2        2     F+A+D+E
3        3     F+D+C+E
4        4     B+D+C+A
5        5     D+A+F+C
6        6       E+A+D
7        7 E+F+B+C+A+D
8        8   A+B+C+D+E
9        9     B+E+C+F
10      10         C+A
11      11 B+A+D+E+C+F
12      12         B+C
13      13     A+D+B+E
14      14 D+C+E+F+B+A
15      15   C+F+D+E+A
16      16       A+C+B
17      17     C+D+B+E
18      18         A+B
19      19   C+B+D+E+F
20      20       D+C+F

> --
> Regards
> Abhinaba Roy
>
> 	[[alternative HTML version deleted]]

You should read the Posting Guide and learn to post in plain text.

>
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
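
A follow-up sketch, not part of the reply above: the question asked for two columns, 'ID' and 'Disease', whereas the suggestion collapses each patient's diseases into one "+"-separated string. One possible way to get one row per patient/disease pair for 1000 patients is shown below. It reuses the same pmin(2 + rpois(...)) idea; the names diseases, n_patients, n_dis and long, and the seed, are assumptions made for illustration only.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

diseases   <- c("A", "B", "C", "D", "E", "F")
n_patients <- 1000

# Number of diseases per patient: at least 2, capped at 6
# (same construction as in the reply, scaled to 1000 patients)
n_dis <- pmin(2 + rpois(n_patients, 2), length(diseases))

# Long format: one row per (ID, Disease) pair.
# sample() draws without replacement, so a patient never gets
# the same disease twice.
long <- data.frame(
  ID      = rep(seq_len(n_patients), times = n_dis),
  Disease = unlist(lapply(n_dis, function(n) sample(diseases, n)))
)

# Sanity check: every patient has at least two diseases
stopifnot(all(table(long$ID) >= 2))

head(long)
```

If the arules package is used for the basket analysis (an assumption; the thread does not name a package), a data frame like this can typically be turned into transactions with as(split(long$Disease, long$ID), "transactions").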