On Jun 24, 2014, at 10:14 PM, Abhinaba Roy wrote:

> Dear R helpers,
> 
> I want to generate data for say 1000 patients (i.e., 1000 unique IDs)
> having suffered from various diseases in the past (say diseases
> A,B,C,D,E,F). The only condition imposed is that each patient should've
> suffered from *at least* two diseases. So my data frame will have two
> columns 'ID' and 'Disease'.
> 
> I want to do a basket analysis with this data, where ID will be the
> identifier and we will establish rules based on the 'Disease' column.
> 
> How can I generate this type of data in R?
> 

Perhaps something along these lines for 20 cases:

> data.frame(patient=1:20, disease = sapply(pmin(2+rpois(20, 2), 6),
+   function(n) paste0( sample( c('A','B','C','D','E','F'), n), collapse="+" ) )
+ )
   patient     disease
1        1         F+D
2        2     F+A+D+E
3        3     F+D+C+E
4        4     B+D+C+A
5        5     D+A+F+C
6        6       E+A+D
7        7 E+F+B+C+A+D
8        8   A+B+C+D+E
9        9     B+E+C+F
10      10         C+A
11      11 B+A+D+E+C+F
12      12         B+C
13      13     A+D+B+E
14      14 D+C+E+F+B+A
15      15   C+F+D+E+A
16      16       A+C+B
17      17     C+D+B+E
18      18         A+B
19      19   C+B+D+E+F
20      20       D+C+F
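
If the two-column 'ID' / 'Disease' layout from the question is needed (one row per patient-disease pair), the same idea can be sketched in long format. This is only a sketch under assumptions that are not in the original post: the names `diseases`, `n_patients`, `n_dis` and `long` are illustrative, the seed is arbitrary, and the Poisson-based count is just one way to guarantee at least two diseases per patient.

```r
set.seed(42)  # arbitrary seed, used only to make the draw reproducible

diseases   <- c("A", "B", "C", "D", "E", "F")
n_patients <- 1000

# Number of diseases per patient: at least 2, capped at the 6 available
n_dis <- pmin(2 + rpois(n_patients, 2), length(diseases))

# One row per (patient, disease) pair; sample() draws without replacement,
# so a patient never gets the same disease twice
long <- data.frame(
  ID      = rep(seq_len(n_patients), times = n_dis),
  Disease = unlist(lapply(n_dis, function(n) sample(diseases, n))),
  stringsAsFactors = FALSE
)

head(long)
table(table(long$ID))  # how many patients have 2, 3, ..., 6 diseases
```

A data frame in this shape can then be split by ID (for example with split(long$Disease, long$ID)) before being handed to whatever basket-analysis tool is used.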

> -- 
> Regards
> Abhinaba Roy
> 
>       [[alternative HTML version deleted]]

You should read the Posting Guide and learn to post in plain text.
> 
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help