Re: [R] Generating Patient Data
On Jun 25, 2014, at 1:49 PM, David Winsemius wrote:

> On Jun 24, 2014, at 11:18 PM, Abhinaba Roy wrote:
>
>> Hi David, I was thinking something like this:
>>
>> ID  Disease
>> 1   A
>> 2   B
>> 3   A
>> 1   C
>> 2   D
>> 5   A
>> 4   B
>> 3   D
>> 2   A
>> ....
>>
>> How can this be done?
>
> do.call(rbind, lapply(1:20, function(pt) {
>   data.frame(patient = pt,
>              disease = sample(c('A','B','C','D','E','F'),
>                               pmin(2 + rpois(1, 2), 6)))
> }))

If you were doing this repeatedly, I suppose you might gain some time efficiency by generating the rpois vector once, as a single item of the same length as your patient IDs.

-- 
David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
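The efficiency remark above can be sketched roughly as follows. This is only one possible reading of it, not code from the thread: the names `n_patients`, `diseases`, `n_per_patient` and `long` are made up for illustration, and the seed is arbitrary.

```r
# Sketch only: all object names here are illustrative assumptions.
set.seed(1)
n_patients <- 1000
diseases <- c("A", "B", "C", "D", "E", "F")

# A single rpois() call for every patient. 2 + Poisson(2) is at least 2,
# and pmin() caps the count at the number of available diseases.
n_per_patient <- pmin(2 + rpois(n_patients, 2), length(diseases))

# Repeat each ID once per disease, then draw that many distinct diseases.
long <- data.frame(
  patient = rep(seq_len(n_patients), times = n_per_patient),
  disease = unlist(lapply(n_per_patient, function(k) sample(diseases, k)))
)

head(long)
```

The per-patient `sample()` call is still a loop; only the Poisson draws and the data frame construction are done in one step, which avoids building and binding 1000 small data frames.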
Re: [R] Generating Patient Data
On Jun 24, 2014, at 10:14 PM, Abhinaba Roy wrote:

> Dear R helpers,
>
> I want to generate data for say 1000 patients (i.e., 1000 unique IDs)
> having suffered from various diseases in the past (say diseases
> A,B,C,D,E,F). The only condition imposed is that each patient should've
> suffered from *at least* two diseases. So my data frame will have two
> columns 'ID' and 'Disease'.
>
> I want to do a basket analysis with this data, where ID will be the
> identifier and we will establish rules based on the 'Disease' column.
>
> How can I generate this type of data in R?

Perhaps something along these lines for 20 cases:

  data.frame(patient = 1:20,
             disease = sapply(pmin(2 + rpois(20, 2), 6),
                              function(n) paste0(sample(c('A','B','C','D','E','F'), n),
                                                 collapse = "+")))

     patient     disease
  1        1         F+D
  2        2     F+A+D+E
  3        3     F+D+C+E
  4        4     B+D+C+A
  5        5     D+A+F+C
  6        6       E+A+D
  7        7 E+F+B+C+A+D
  8        8   A+B+C+D+E
  9        9     B+E+C+F
  10      10         C+A
  11      11 B+A+D+E+C+F
  12      12         B+C
  13      13     A+D+B+E
  14      14 D+C+E+F+B+A
  15      15   C+F+D+E+A
  16      16       A+C+B
  17      17     C+D+B+E
  18      18         A+B
  19      19   C+B+D+E+F
  20      20       D+C+F

> -- 
> Regards
> Abhinaba Roy
>
> [[alternative HTML version deleted]]

You should read the Posting Guide and learn to post in plain text, not HTML.

-- 
David Winsemius
Alameda, CA, USA
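The expression `pmin(2 + rpois(n, 2), 6)` in the reply above is what enforces the "at least two diseases" condition, so it may help to look at it in isolation. The sketch below is an illustration, not part of the thread; `n_draws`, `observed` and `expected` are made-up names and the seed is arbitrary.

```r
# Sketch only: object names are illustrative assumptions.
set.seed(42)
n_draws <- 10000
n_diseases <- pmin(2 + rpois(n_draws, 2), 6)

# Adding 2 to a non-negative Poisson draw gives a floor of 2;
# pmin() gives a ceiling of 6 (there are only six diseases to sample).
range(n_diseases)

# Observed share of each count, with fixed levels so all five appear.
observed <- prop.table(table(factor(n_diseases, levels = 2:6)))

# Matching probabilities: Poisson(2) mass at 0..3 maps to counts 2..5,
# and every draw of 4 or more is folded into the count 6.
expected <- c(dpois(0:3, lambda = 2),
              ppois(3, lambda = 2, lower.tail = FALSE))
names(expected) <- 2:6

round(rbind(observed = as.vector(observed), expected), 3)
```

If a different spread of disease counts is wanted, the Poisson mean (the second argument of `rpois`) is the knob to turn; the floor and the cap stay the same.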
Re: [R] Generating Patient Data
Hi David,

I was thinking of something like this:

ID  Disease
1   A
2   B
3   A
1   C
2   D
5   A
4   B
3   D
2   A
....

How can this be done?

On Wed, Jun 25, 2014 at 11:34 AM, David Winsemius dwinsem...@comcast.net wrote:

> Perhaps something along these lines for 20 cases:
>
> data.frame(patient = 1:20,
>            disease = sapply(pmin(2 + rpois(20, 2), 6),
>                             function(n) paste0(sample(c('A','B','C','D','E','F'), n),
>                                                collapse = "+")))

-- 
Regards
Abhinaba Roy
Statistician
Radix Analytics Pvt. Ltd
Ahmedabad
Re: [R] Generating Patient Data
# build off of david's suggestion
x <- data.frame(patient = 1:20,
                disease = sapply(pmin(2 + rpois(20, 2), 6),
                                 function(n) paste0(sample(c('A','B','C','D','E','F'), n),
                                                    collapse = "+")))

# break the diseases into a list, one entry per patient
y <- strsplit(as.character(x$disease), "\\+")

# melt the list
library(reshape2)
z <- melt(y)

# re-name the columns in that result
names(z) <- c("disease", "patient")

# print the results to the screen
z

# compare the structure to `x` if you like
x

On Wed, Jun 25, 2014 at 2:18 AM, Abhinaba Roy abhinabaro...@gmail.com wrote:

> Hi David, I was thinking something like this:
>
> ID  Disease
> 1   A
> 2   B
> 3   A
> 1   C
> 2   D
> 5   A
> 4   B
> 3   D
> 2   A
> ....
>
> How can this be done?
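The split-then-melt steps above can also be done without reshape2. The following is a sketch in base R under that assumption, not the poster's code; it relies on `lengths()`, which is available in R 3.2.0 and later, and the seed is arbitrary.

```r
# Sketch only: a base-R alternative to melt(); not from the thread.
set.seed(2)
x <- data.frame(
  patient = 1:20,
  disease = sapply(pmin(2 + rpois(20, 2), 6),
                   function(n) paste0(sample(LETTERS[1:6], n), collapse = "+")),
  stringsAsFactors = FALSE
)

# One character vector of diseases per patient; fixed = TRUE treats "+"
# literally, so no regex escaping is needed.
y <- strsplit(x$disease, "+", fixed = TRUE)

# Repeat each patient ID once per disease and flatten the list.
z <- data.frame(
  patient = rep(x$patient, times = lengths(y)),
  disease = unlist(y)
)

head(z)
```

Using `x$patient` for the ID column, rather than the list index, keeps the result correct even when the IDs are not simply 1 to n.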
Re: [R] Generating Patient Data
Hi,

Check if this works:

  set.seed(495)
  dat <- data.frame(ID = sample(1:10, 20, replace = TRUE),
                    Disease = sample(LETTERS[1:6], 20, replace = TRUE))

  # melt() comes from the reshape2 package
  subset(melt(table(dat)[rowSums(!!table(dat)) > 1, ]), !!value, select = 1:2)

     ID Disease
  1   2       A
  3   4       A
  4   6       A
  6  10       A
  8   3       B
  15  4       C
  16  6       C
  20  3       D
  22  6       D
  24 10       D
  26  3       E
  27  4       E
  29  7       E
  31  2       F
  33  4       F
  35  7       F

A.K.

On Wednesday, June 25, 2014 1:17 AM, Abhinaba Roy abhinabaro...@gmail.com wrote:

> I want to generate data for say 1000 patients (i.e., 1000 unique IDs)
> having suffered from various diseases in the past (say diseases
> A,B,C,D,E,F). The only condition imposed is that each patient should've
> suffered from *at least* two diseases.
>
> How can I generate this type of data in R?
Re: [R] Generating Patient Data
Also, you can do:

  library(dplyr)
  dat %>%
    group_by(ID) %>%
    filter(length(unique(Disease)) > 1) %>%
    arrange(Disease, ID)

A.K.

On Wednesday, June 25, 2014 3:45 AM, arun smartpink...@yahoo.com wrote:

> Forgot about:
>
> library(reshape2)
>
> On , arun smartpink...@yahoo.com wrote:
>
>> Hi,
>>
>> Check if this works:
>>
>> subset(melt(table(dat)[rowSums(!!table(dat)) > 1, ]), !!value, select = 1:2)
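For readers without dplyr, the same filter can be approximated in base R. This is a sketch under assumptions, not A.K.'s code: `n_distinct`, `keep_ids` and `kept` are illustrative names, and unlike the pipeline above it also drops repeated (ID, Disease) rows via `unique()`.

```r
# Sketch only: a base-R approximation of the group_by/filter/arrange idea.
set.seed(495)
dat <- data.frame(ID = sample(1:10, 20, replace = TRUE),
                  Disease = sample(LETTERS[1:6], 20, replace = TRUE),
                  stringsAsFactors = FALSE)

# Drop repeated (ID, Disease) pairs so each row is one distinct disease.
dat <- unique(dat)

# Number of distinct diseases per ID.
n_distinct <- tapply(dat$Disease, dat$ID, function(d) length(unique(d)))

# Keep only IDs with more than one disease, then sort as arrange() would.
keep_ids <- as.integer(names(n_distinct)[n_distinct > 1])
kept <- dat[dat$ID %in% keep_ids, ]
kept <- kept[order(kept$Disease, kept$ID), ]

kept
```

Note that this approach filters an existing random sample rather than generating to a target, so IDs with a single disease drop out and the number of patients in the result is not fixed in advance.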
Re: [R] Generating Patient Data
On Jun 24, 2014, at 11:18 PM, Abhinaba Roy wrote:

> Hi David, I was thinking something like this:
>
> ID  Disease
> 1   A
> 2   B
> 3   A
> 1   C
> 2   D
> 5   A
> 4   B
> 3   D
> 2   A
> ....
>
> How can this be done?

  do.call(rbind, lapply(1:20, function(pt) {
    data.frame(patient = pt,
               disease = sample(c('A','B','C','D','E','F'),
                                pmin(2 + rpois(1, 2), 6)))
  }))

-- 
David.

David Winsemius
Alameda, CA, USA
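Since the stated goal is a basket analysis, the long data frame produced by the `do.call(rbind, ...)` code above would typically be regrouped into one "basket" per patient. The sketch below does that in base R; it is an illustration rather than anything posted in the thread, and `long` and `baskets` are made-up names.

```r
# Sketch only: regroup long-format patient data into per-patient baskets.
set.seed(3)
long <- do.call(rbind, lapply(1:20, function(pt) {
  data.frame(patient = pt,
             disease = sample(c("A", "B", "C", "D", "E", "F"),
                              pmin(2 + rpois(1, 2), 6)),
             stringsAsFactors = FALSE)
}))

# One character vector of diseases per patient ID.
baskets <- split(long$disease, long$patient)

# Check the condition from the original question.
stopifnot(all(lengths(baskets) >= 2))

str(head(baskets, 3))

# Assumption, not run here: with the arules package installed, a list of
# this shape can be coerced via as(baskets, "transactions").
```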