On Jun 25, 2014, at 1:49 PM, David Winsemius wrote:

>
> On Jun 24, 2014, at 11:18 PM, Abhinaba Roy wrote:
>
>> Hi David,
>>
>> I was thinking something like this:
>>
>> ID   Disease
>> 1    A
>> 2    B
>> 3    A
>> 1    C
>> 2    D
>> 5    A
>> 4    B
>> 3    D
>> 2    A
>> ..   ..
>>
>> How can this be done?
>
> do.call(rbind, lapply( 1:20, function(pt) {
>            data.frame( patient=pt,
>                        disease= sample( c('A','B','C','D','E','F'),
>                                         pmin(2+rpois(1, 2), 6)) )}) )

If you were doing this repeatedly, I suppose you might gain some time
efficiency by generating the rpois vector in a single call, with the same
length as your vector of patient IDs.

>
> --
> David.
>>
>>
>> On Wed, Jun 25, 2014 at 11:34 AM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>
>> On Jun 24, 2014, at 10:14 PM, Abhinaba Roy wrote:
>>
>>> Dear R helpers,
>>>
>>> I want to generate data for say 1000 patients (i.e., 1000 unique IDs)
>>> having suffered from various diseases in the past (say diseases
>>> A,B,C,D,E,F). The only condition imposed is that each patient should've
>>> suffered from *at least* two diseases. So my data frame will have two
>>> columns 'ID' and 'Disease'.
>>>
>>> I want to do a basket analysis with this data, where ID will be the
>>> identifier and we will establish rules based on the 'Disease' column.
>>>
>>> How can I generate this type of data in R?
>>>
>>
>> Perhaps something along these lines for 20 cases:
>>
>>> data.frame(patient=1:20, disease = sapply(pmin(2+rpois(20, 2), 6),
>>>     function(n) paste0( sample( c('A','B','C','D','E','F'), n), collapse="+" ) )
>> + )
>>    patient     disease
>> 1        1         F+D
>> 2        2     F+A+D+E
>> 3        3     F+D+C+E
>> 4        4     B+D+C+A
>> 5        5     D+A+F+C
>> 6        6       E+A+D
>> 7        7 E+F+B+C+A+D
>> 8        8   A+B+C+D+E
>> 9        9     B+E+C+F
>> 10      10         C+A
>> 11      11 B+A+D+E+C+F
>> 12      12         B+C
>> 13      13     A+D+B+E
>> 14      14 D+C+E+F+B+A
>> 15      15   C+F+D+E+A
>> 16      16       A+C+B
>> 17      17     C+D+B+E
>> 18      18         A+B
>> 19      19   C+B+D+E+F
>> 20      20       D+C+F
>>
>>> --
>>> Regards
>>> Abhinaba Roy
>>>
>>> [[alternative HTML version deleted]]
>>
>> You should read the Posting Guide and learn to post in plain text.
>>>
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> --
>> David Winsemius
>> Alameda, CA, USA
>>
>>
>>
>>
>> --
>> Regards
>> Abhinaba Roy
>> Statistician
>> Radix Analytics Pvt. Ltd
>> Ahmedabad
>>
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
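
The suggestion in this thread, drawing all the Poisson counts with one
rpois call rather than one call per patient, might look roughly like the
sketch below for the 1000-patient case. The names n_patients, diseases,
n_per_patient and dat are illustrative assumptions, not from the posts, and
the 2 + Poisson(2) count capped at 6 is simply carried over from the code
above.

```r
# Sketch only: variable names here are illustrative, not from the thread.
set.seed(42)                                  # reproducible draws
n_patients <- 1000
diseases   <- c('A', 'B', 'C', 'D', 'E', 'F')

# One vectorized rpois() call: each patient gets 2 + Poisson(2) diseases,
# capped at the number of diseases available.
n_per_patient <- pmin(2 + rpois(n_patients, 2), length(diseases))

# Long format: repeat each ID n times and sample n distinct diseases for it.
dat <- data.frame(
  ID      = rep(seq_len(n_patients), times = n_per_patient),
  Disease = unlist(lapply(n_per_patient, function(n) sample(diseases, n)))
)

# Every patient appears at least twice, and no (ID, Disease) pair repeats.
stopifnot(all(table(dat$ID) >= 2), !any(duplicated(dat)))
head(dat)
```

For the basket analysis, split(dat$Disease, dat$ID) would then give one
vector of diseases per patient, which is one plausible starting point for
building transactions.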