[R] Stats question: Comparison of the same individuals during two exposure times
Hi,

I'm hoping that someone will be able to help. I would like to compare how covariates are associated with the risk of a binary outcome during two periods. Period 1 is non-exposure to a treatment and period 2 is exposure to the treatment. The same individuals will be examined in each period, and I want to compare the association of certain covariates between the two periods to see whether there is a covariate-by-treatment interaction.

I've looked at case-crossover designs and time series analysis and don't think that they are suitable. The cohort has longitudinal data, so individuals start treatment at different times, and the treatment needs to be administered for a while before it has an effect.

The reason I cannot simply use an exposed vs unexposed design is that most individuals in the cohort eventually end up on the treatment, so the unexposed group is very small and lacks power for a meaningful comparison.

Is there any way to compare the same individuals during different exposure times and to look at the effect of different covariates under the exposed and unexposed conditions?

Thanks for your help,
Natalie

- Natalie Van Zuydam PhD Student University of Dundee nvanzuy...@dundee.ac.uk
-- View this message in context: http://r.789695.n4.nabble.com/Stats-question-Comparison-of-the-same-individuals-during-two-exposure-times-tp4636732.html
Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
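As a rough illustration of what "a treatment interaction" could look like in R, the sketch below fits a logistic regression with a covariate-by-period interaction term on simulated long-format data (one row per individual per period). Everything here is hypothetical: the column names `id`, `period`, `covariate` and `outcome` are invented, and the data are random. It also ignores the fact that the two rows per individual are correlated and that the treatment needs time to take effect; a mixed model (for example `lme4::glmer` with a random intercept per individual) or a GEE would be a more defensible choice for the real analysis.

```r
set.seed(1)
n <- 200

# Hypothetical long-format data: two rows per individual, one per period.
d <- data.frame(
  id        = rep(seq_len(n), each = 2),
  period    = factor(rep(c("unexposed", "exposed"), times = n),
                     levels = c("unexposed", "exposed")),
  covariate = rep(rnorm(n), each = 2)
)

# Simulated binary outcome (no true interaction is built in).
d$outcome <- rbinom(nrow(d), 1, plogis(-1 + 0.5 * d$covariate))

# The covariate:period term estimates how the covariate's association
# with the outcome differs between the exposed and unexposed periods.
# NOTE: plain glm() treats all rows as independent, which is an
# assumption that does not hold for repeated measures.
fit <- glm(outcome ~ covariate * period, family = binomial, data = d)
print(summary(fit)$coefficients)
```

The row labelled `covariate:periodexposed` is the interaction of interest in this sketch.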
[R] Subsetting a data frame
Hi R users,

I really need help with subsetting data frames. I have a large database of medical records and I want to be able to match patterns from a list of search terms. I've used this simplified data frame in a previous example:

    db <- structure(list(ind = c("ind1", "ind2", "ind3", "ind4"),
                         test1 = c(1, 2, 1.3, 3),
                         test2 = c(56L, 27L, 58L, 2L),
                         test3 = c(1.1, 28, 9, 1.2)),
                    .Names = c("ind", "test1", "test2", "test3"),
                    class = "data.frame", row.names = c(NA, -4L))
    terms_include <- c("1","2","3")
    terms_exclude <- c("1.1","1.2","1.3")

In this example I want to include all the terms from terms_include as long as they don't occur with terms from terms_exclude in the same row of the data frame. Previously I was given this function, which works very well if you want to match exactly:

    f <- function(x) !any(x %in% terms_exclude) && any(x %in% terms_include)
    db[apply(db[, -1], 1, f), ]

       ind test1 test2 test3
    2 ind2     2    27  28.0
    4 ind4     3     2   1.2

I would like to know if there is a way to write a similar function that looks for matches that start with the query string, as in grepl("^pattern",x).

I started writing a function but am not sure how to get it to return the data frame or matrix:

    for (i in 1:length(terms_include)){
      db_new <- apply(db,2, grepl,pattern=i)
    }

Applying this gives me:

    db_new <- structure(c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
                          FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
                        .Dim = c(4L, 4L),
                        .Dimnames = list(NULL, c("ind", "test1", "test2", "test3")))

So the above searches for the pattern anywhere in the data frame instead of just at the beginning of the string. Using partial matching, how would I look for the terms to include but not return a row of the data frame if it also contains one of the terms to exclude?

I hope that this makes sense.
Many thanks,
Natalie
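One possible way to get "starts with" matching for the question above is sketched here. It is only a sketch under some assumptions: cell values are compared as character strings, and base R's `startsWith()` is used instead of `grepl("^pattern", x)` so that the dot in terms such as "1.1" is taken literally rather than as a regular-expression wildcard. The helper names `starts_with_any` and `keep` are made up for this example.

```r
db <- data.frame(ind   = c("ind1", "ind2", "ind3", "ind4"),
                 test1 = c(1, 2, 1.3, 3),
                 test2 = c(56L, 27L, 58L, 2L),
                 test3 = c(1.1, 28, 9, 1.2),
                 stringsAsFactors = FALSE)
terms_include <- c("1", "2", "3")
terms_exclude <- c("1.1", "1.2", "1.3")

# TRUE for each element of x that begins with at least one of the terms.
starts_with_any <- function(x, terms) {
  Reduce(`|`,
         lapply(terms, function(term) startsWith(x, term)),
         rep(FALSE, length(x)))
}

# Character matrix of the test columns (the id column is left out).
vals <- do.call(cbind, lapply(db[-1], as.character))

# Keep a row if some cell starts with an include term and
# no cell starts with an exclude term.
keep <- apply(vals, 1, function(x) {
  any(starts_with_any(x, terms_include)) &&
    !any(starts_with_any(x, terms_exclude))
})

result <- db[keep, ]
print(result)
```

Note that prefix matching is broader than exact matching: a value such as 1.15 would count as starting with the exclude term "1.1", which may or may not be what is wanted for real codes.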
Re: [R] Subsetting a data frame with multiple values and exclusions.
Thanks. Such a short and sweet answer that does what it should.

Natalie
[R] Subsetting a data frame with multiple values and exclusions.
Hi all,

I realise that the convention is to provide a working example of my problem, but the data are of a sensitive nature so I'm not able to do that in this case. I need to query a database for multiple search terms:

    db <- structure(list(ind = c("ind1", "ind2", "ind3", "ind4"),
                         test1 = c(1, 2, 1.3, 3),
                         test2 = c(56L, 27L, 58L, 2L),
                         test3 = c(1.1, 28, 9, 1.2)),
                    .Names = c("ind", "test1", "test2", "test3"),
                    class = "data.frame", row.names = c(NA, -4L))
    terms_include <- c("1","2","3")
    terms_exclude <- c("1.1","1.2","1.3")

So I need to write a loop in which each value in terms_include is searched for across the entire data frame. I thought of using apply with grepl and subset? At the same time, if a value from terms_include occurs in the same row as a value from terms_exclude, then that row must be excluded from the output data frame.

I'm not sure where to even begin. I've only worked very basically with subset. The final database is much larger and there are many more search terms than are presented here, so I would really need to be able to loop over the data frame successively to return a final data frame with my searched values in at least one of the columns.

Your help and assistance is much appreciated,
Natalie
[R] Multiple events Cox's model and proportional hazards
Hi,

I am using the survival package to perform a Cox regression analysis on multiple events of myocardial infarction. I have been using the Andersen and Gill model:

    coxph(Surv(time1,time2,status)~factor(treatment)+age+sex+cluster(id))

I was wondering whether this model should satisfy the proportional hazards assumption. I have run the cox.zph function and the age covariate appears to violate proportional hazards. What would be the best way to construct this model? Should I include time-dependent covariates?

Thanks,
Natalie
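For readers who want to reproduce the kind of check described above, here is a minimal sketch. It does not use the poster's data; it assumes the `cgd` data set that ships with the survival package (recurrent infections, in counting-process form) as a stand-in, so the column names `tstart`, `tstop`, `status`, `treat`, `age`, `sex` and `id` belong to that data set, not to the original study.

```r
library(survival)

# Andersen-Gill style fit on the stand-in data: counting-process
# intervals (tstart, tstop], with a robust variance clustered on id.
fit <- coxph(Surv(tstart, tstop, status) ~ treat + age + sex + cluster(id),
             data = cgd)

# Test of proportional hazards based on scaled Schoenfeld residuals:
# one row per term plus a GLOBAL row. A small p-value for a term
# suggests that its effect changes over time.
zph <- cox.zph(fit)
print(zph)
```

If a term such as age does fail this check, options that are often considered include stratifying on a categorised version of the covariate with `strata()` or modelling a time-varying coefficient through the `tt` argument of `coxph`; which one is appropriate depends on the data and is not something this sketch can decide.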
[R] Cox's regression analysis with Left truncated data
Hi,

I have a fairly simple question. I would like to use the survival package to analyse data where an event can have occurred before individuals were recruited into the study. I'm not sure how to do this using the Surv() function. I would have the date of an event, and the enrolment date would be after that. How do I put these two dates into the Surv() function?

Thank you,
Natalie
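For delayed entry (left truncation), `Surv()` accepts a three-argument counting-process form, `Surv(entry, exit, event)`, in which each individual only counts as at risk from their entry time onwards. The sketch below shows that form on a tiny made-up data set; the column names `entry`, `exit`, `event` and `age` and all the values are hypothetical, with times taken to be years since some common origin. Whether an event that happened before enrolment should enter the analysis at all is a study-design question that this sketch does not settle.

```r
library(survival)

# Hypothetical data: entry = time at enrolment, exit = time at event
# or censoring, event = 1 if the event was observed, 0 if censored.
d <- data.frame(entry = c(2, 0.5, 1, 3, 0, 1.5),
                exit  = c(5, 4, 6, 7, 3, 8),
                event = c(1, 0, 1, 1, 0, 1),
                age   = c(60, 55, 70, 65, 50, 72))

# Counting-process survival object: at risk on the interval (entry, exit].
s <- Surv(d$entry, d$exit, d$event)
print(s)

# The same response can be used directly in a Cox model.
fit <- coxph(Surv(entry, exit, event) ~ age, data = d)
print(fit)
```

With real dates, the entry and exit times could be computed first, for example with `as.numeric(difftime(date2, date1, units = "days"))`.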
[R] within group sequential subtraction
Hi Everyone,

I would like to do sequential subtractions within a group so that I know the time between separate observations for a group of individuals. My data:

    data <- structure(list(group = c("IND1", "IND1", "IND2", "IND2", "IND2",
                                     "IND3", "IND4", "IND5", "IND6", "IND6"),
                           date_obs = structure(c(6468, 7063, 9981, 14186, 14372,
                                                  5129, 9767, 11168, 10243, 10647),
                                                class = "Date")),
                      .Names = c("group", "date_obs"),
                      row.names = c(NA, 10L), class = "data.frame")

So I start with:

       group   date_obs
    1   IND1 1987-09-17
    2   IND1 1989-05-04
    3   IND2 1997-04-30
    4   IND2 2008-11-03
    5   IND2 2009-05-08
    6   IND3 1984-01-17
    7   IND4 1996-09-28
    8   IND5 2000-07-30
    9   IND6 1998-01-17
    10  IND6 1999-02-25

What I would like:

       group   date_obs time
    1   IND1 1987-09-17   NA
    2   IND1 1989-05-04  595
    3   IND2 1997-04-30   NA
    4   IND2 2008-11-03 4205
    5   IND2 2009-05-08  186
    6   IND3 1984-01-17   NA
    7   IND4 1996-09-28   NA
    8   IND5 2000-07-30   NA
    9   IND6 1998-01-17   NA
    10  IND6 1999-02-25  404

So if there is only one entry for an individual, a 0 or NA would be acceptable, and if there is more than one entry for an individual, the sequential differences would be calculated.

I started with some code but I cannot work out how to edit it appropriately:

    x <- do.call(rbind, lapply(split(data, data$group), function(dat) {
      dat <- dat[order(dat$date_obs), ]
      d<-diff(dat$date_obs)
      dat <- rbind(dat,d)
    }))

I get this error:

    "Error in as.Date.numeric(value) : 'origin' must be supplied"

so I'm not sure if it does what I need it to do. In addition, the vector lengths won't match up, as the first date in each sequence has nothing to be subtracted from. I'm not sure if anyone knows an easier way to achieve this.

Thanks for the help,
Natalie
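One way the within-group differences described in the post above could be computed is with base R's `ave()`, which applies a function within each group and returns the results in the original row order. This is a sketch rather than the answer given on the list; the data frame is rebuilt here under the name `obs` (an arbitrary choice) with the same values as in the post.

```r
obs <- data.frame(
  group = c("IND1", "IND1", "IND2", "IND2", "IND2",
            "IND3", "IND4", "IND5", "IND6", "IND6"),
  date_obs = as.Date(c(6468, 7063, 9981, 14186, 14372,
                       5129, 9767, 11168, 10243, 10647),
                     origin = "1970-01-01"),
  stringsAsFactors = FALSE
)

# Sort so that dates are increasing within each individual.
obs <- obs[order(obs$group, obs$date_obs), ]

# Days since the previous observation of the same individual.
# Prepending NA keeps the result the same length as the group, so a
# group with a single row simply gets NA.
obs$time <- ave(as.numeric(obs$date_obs), obs$group,
                FUN = function(x) c(NA, diff(x)))

print(obs)
```

Converting the dates to numbers before differencing avoids the "'origin' must be supplied" error, which arises when plain numbers are combined back into a Date column.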
Re: [R] Cleaning date columns
Dear Bill,

Thanks very much for the reply and for the code. I have amended my personal details for future posts. I was wondering if there were any good books or tutorials for writing code similar to what you have provided above?

Best wishes,
Natalie Van Zuydam