Re: [R] Generating Patient Data

2014-07-01 Thread David Winsemius

On Jun 25, 2014, at 1:49 PM, David Winsemius wrote:

 
 On Jun 24, 2014, at 11:18 PM, Abhinaba Roy wrote:
 
 Hi David,
 
 I was thinking something like this:
 
 ID   Disease
 1    A
 2    B
 3    A
 1    C
 2    D
 5    A
 4    B
 3    D
 2    A
 ....
 
 How can this be done?
 
do.call(rbind, lapply(1:20, function(pt) {
    data.frame(patient = pt,
               disease = sample(c('A','B','C','D','E','F'),
                                pmin(2 + rpois(1, 2), 6)))
}))

If you were doing this repeatedly, I suppose you might gain some time efficiency by
generating the rpois vector once, as a single vector of the same length as your
patient IDs, rather than calling rpois() separately for each patient.
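
A minimal sketch of that suggestion, assuming the same six disease labels and the same 2-to-6 range as the code above. The names `n_patients`, `diseases`, `counts` and `dat` are placeholders of mine, not from the thread, and the seed is arbitrary:

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n_patients <- 1000
diseases   <- c('A','B','C','D','E','F')

# draw every patient's disease count in a single rpois() call (2 to 6 each)
counts <- pmin(2 + rpois(n_patients, 2), length(diseases))

# long format: one row per (patient, disease) pair
dat <- data.frame(
    patient = rep(seq_len(n_patients), times = counts),
    disease = unlist(lapply(counts, function(n) sample(diseases, n)))
)
```

Because `sample()` draws without replacement by default, no patient gets the same disease twice, and `head(dat)` should show the ID/Disease layout asked for.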
 
 -- 
 David.
 
 
David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating Patient Data

2014-06-25 Thread David Winsemius

On Jun 24, 2014, at 10:14 PM, Abhinaba Roy wrote:

 Dear R helpers,
 
 I want to generate data for, say, 1000 patients (i.e., 1000 unique IDs)
 who have suffered from various diseases in the past (say diseases
 A,B,C,D,E,F). The only condition imposed is that each patient should have
 suffered from *at least* two diseases. So my data frame will have two
 columns, 'ID' and 'Disease'.
 
 I want to do a basket analysis with this data, where ID will be the
 identifier and we will establish rules based on the 'Disease' column.
 
 How can I generate this type of data in R?
 

Perhaps something along these lines for 20 cases:

data.frame(patient = 1:20,
           disease = sapply(pmin(2 + rpois(20, 2), 6),
                            function(n) paste0(sample(c('A','B','C','D','E','F'), n),
                                               collapse = "+")))

   patient     disease
1        1         F+D
2        2     F+A+D+E
3        3     F+D+C+E
4        4     B+D+C+A
5        5     D+A+F+C
6        6       E+A+D
7        7 E+F+B+C+A+D
8        8   A+B+C+D+E
9        9     B+E+C+F
10      10         C+A
11      11 B+A+D+E+C+F
12      12         B+C
13      13     A+D+B+E
14      14 D+C+E+F+B+A
15      15   C+F+D+E+A
16      16       A+C+B
17      17     C+D+B+E
18      18         A+B
19      19   C+B+D+E+F
20      20       D+C+F

 -- 
 Regards
 Abhinaba Roy
 
   [[alternative HTML version deleted]]

You should read the Posting Guide and learn to post in plain text rather than HTML.
 
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
David Winsemius
Alameda, CA, USA



Re: [R] Generating Patient Data

2014-06-25 Thread Abhinaba Roy
Hi David,

I was thinking something like this:

ID   Disease
1    A
2    B
3    A
1    C
2    D
5    A
4    B
3    D
2    A
....

How can this be done?


-- 
Regards
Abhinaba Roy
Statistician
Radix Analytics Pvt. Ltd
Ahmedabad



Re: [R] Generating Patient Data

2014-06-25 Thread Anthony Damico
# build off of david's suggestion
x <-
    data.frame(
        patient = 1:20 ,
        disease =
            sapply(
                pmin( 2 + rpois( 20 , 2 ) , 6 ) ,
                function( n ) paste0( sample( c('A','B','C','D','E','F') , n ) , collapse = "+" )
            )
    )

# break the diseases into a list, one entry per patient
y <- strsplit( as.character( x$disease ) , "\\+" )

# melt the list
library(reshape2)

z <- melt( y )

# re-name the columns in that result
names( z ) <- c( "disease" , "patient" )

# print the results to the screen
z

# compare the structure to `x` if you like
x
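
For readers without reshape2, the same wide-to-long step can be sketched in base R. This is an alternative swapped in for illustration, not the code posted above; it assumes the "+"-separated disease strings built in the previous step, and the names `wide`, `parts` and `long` are hypothetical:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
wide <- data.frame(
    patient = 1:20,
    disease = sapply(pmin(2 + rpois(20, 2), 6),
                     function(n) paste0(sample(LETTERS[1:6], n), collapse = "+")),
    stringsAsFactors = FALSE
)

# split each "A+B+C" string into its individual disease labels
parts <- strsplit(wide$disease, "+", fixed = TRUE)

# repeat each patient ID once per disease, then flatten the list
long <- data.frame(
    patient = rep(wide$patient, times = lengths(parts)),
    disease = unlist(parts)
)
```

Using `fixed = TRUE` avoids having to escape the plus sign as a regular expression.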









Re: [R] Generating Patient Data

2014-06-25 Thread arun



Hi, 

Check if this works:
 set.seed(495)
 dat <- data.frame(ID=sample(1:10,20,replace=TRUE),
                   Disease=sample(LETTERS[1:6], 20, replace=TRUE) )
 subset(melt(table(dat)[rowSums(!!table(dat))>1,]), !!value, select=1:2)
   ID Disease
1   2   A
3   4   A
4   6   A
6  10   A
8   3   B
15  4   C
16  6   C
20  3   D
22  6   D
24 10   D
26  3   E
27  4   E
29  7   E
31  2   F
33  4   F
35  7   F
A.K.






Re: [R] Generating Patient Data

2014-06-25 Thread arun
Also, you can do:
library(dplyr)
dat %>% group_by(ID) %>% filter(length(unique(Disease)) > 1) %>% arrange(Disease, ID)
A.K.
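
The same filter can be sketched in base R, for anyone without dplyr installed. This is only an illustration under assumptions: `dat` is rebuilt here as in the earlier message, the names `n_by_id`, `keep_ids` and `res` are hypothetical, and the `unique()` call (which drops repeated ID/Disease pairs) is an extra step not present in the dplyr line:

```r
set.seed(495)
dat <- data.frame(ID = sample(1:10, 20, replace = TRUE),
                  Disease = sample(LETTERS[1:6], 20, replace = TRUE))

# number of distinct diseases recorded for each ID
n_by_id <- tapply(dat$Disease, dat$ID, function(d) length(unique(d)))

# keep only IDs that have more than one distinct disease
keep_ids <- as.integer(names(n_by_id)[n_by_id > 1])

# drop repeated (ID, Disease) pairs, then sort as in the dplyr version
res <- unique(dat[dat$ID %in% keep_ids, ])
res <- res[order(res$Disease, res$ID), ]
```

Note that, like the dplyr version, this drops IDs with a single disease rather than topping them up to two.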




On Wednesday, June 25, 2014 3:45 AM, arun smartpink...@yahoo.com wrote:


Forgot about:
library(reshape2)







Re: [R] Generating Patient Data

2014-06-25 Thread David Winsemius

On Jun 24, 2014, at 11:18 PM, Abhinaba Roy wrote:

 Hi David,
 
 I was thinking something like this:
 
 ID   Disease
 1    A
 2    B
 3    A
 1    C
 2    D
 5    A
 4    B
 3    D
 2    A
 ....
 
 How can this be done?

do.call(rbind, lapply(1:20, function(pt) {
    data.frame(patient = pt,
               disease = sample(c('A','B','C','D','E','F'),
                                pmin(2 + rpois(1, 2), 6)))
}))

-- 
David.
 
 

David Winsemius
Alameda, CA, USA
