[R] Data manipulation question
Dear R-listers,

I am a relatively inexperienced R-user currently migrating from Stata. I am deeply frustrated by this data manipulation question: I know how I could do it in Stata, but I cannot make it work in R.

I have a data frame of hospitalization data where each row represents an admission. I need to know when patients were first discharged, but the problem is that patients were sometimes transferred between hospital departments. In my data a transfer looks like a new admission, except that it has a 'start' date equal to the previous admission's 'stop' date.

Here is an example:

id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
data <- as.data.frame(cbind(id,start,stop))
data
#    id start stop
# 1   a     0    6
# 2   a     6   12
# 3   a    17   20
# 4   a    20   30
# 5   b     0    1
# 6   b     1   10
# 7   c     0    3
# 8   c     5   10
# 9   c    10   11
# 10  c    11   30
# 11  c    50   55
# 12  d     0    6

So, what I want to end up with is this:

id start stop
a      0   12   # This patient was transferred at time 6 and discharged at time 12. The admission starting at time 17 is therefore irrelevant.
b      0   10
c      0    3
d      0    6

I have tried tons of variations over lapply, sapply, split, for etc., all to no avail.

Thank you in advance for any assistance.

Best regards,
Peter Jepsen, MD.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Data manipulation question
Hi all,

Suppose I have the following data.frame, with an id column and two variable columns:

id    X     Y
0001  NA    21
0002  NA    13
0003  0001  45
0004  NA    71
0005  0003  20

What I would like to do is to create a new variable Z whose values are the Y value for the id value in X, that is:

id    X     Y   Z
0001  NA    21  NA
0002  NA    13  NA
0003  0001  45  21
0004  NA    71  NA
0005  0003  20  45

Do you have an idea on how to obtain that without using a for loop?

Thanks in advance for any help,

Julien

Here is the R code to reproduce the first data.frame:

id <- c("0001","0002","0003","0004","0005")
x <- c(NA, NA, "0001", NA, "0003")
y <- c(21,13,45,71,20)
d <- data.frame(id,x,y)

--
Julien Barnier
Groupe de recherche sur la socialisation
ENS-LSH - Lyon, France
Re: [R] Data manipulation question
How about:

id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
data <- data.frame(id,start,stop)

f <- function(data){
  m <- match(data$start,data$stop) + 1
  if (length(m)==1 && is.na(m)) m <- 1
  if (length(m) > 1 && is.na(m[2])) m <- 1
  data$stop[min(m,na.rm=TRUE)]
}
by(data,data$id,f)

The if statements in the function are for some special cases; in all the other cases the first line will do the trick.

I would like to add that naming a data frame "data" is somewhat bad practice, as it masks the built-in data function of R. I also changed the way you made up the data.frame, as your method (as.data.frame on the result of cbind) would convert everything to factors.

Good luck

Bart

Peter Jepsen wrote:
> [...]

--
View this message in context: http://www.nabble.com/Data-manipulation-question-tp20356835p20358624.html
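Bart's remark about cbind can be checked directly. The following is a small illustrative sketch, not part of his message; the names via_cbind and direct are made up for the example. cbind() on a character vector and two numeric vectors builds a character matrix, so start and stop are no longer numeric once that matrix is turned into a data frame (they become factors in the R versions current in 2008, and character strings from R 4.0.0 on).

```r
id    <- c(rep("a", 4), rep("b", 2), rep("c", 5), rep("d", 1))
start <- c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0)
stop  <- c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)

# The original construction: cbind() first coerces everything to character
via_cbind <- as.data.frame(cbind(id, start, stop))

# Building the data frame directly keeps each column's own type
direct <- data.frame(id, start, stop)

is.numeric(via_cbind$start)  # FALSE
is.numeric(direct$start)     # TRUE
```

With the cbind version, comparisons such as start == stop work on text rather than on numbers, which is one reason to prefer data.frame(id, start, stop).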
Re: [R] Data manipulation question
On Thu, Nov 6, 2008 at 4:23 PM, Peter Jepsen wrote:
> [...]

Try this:

result <- list()
num <- length(levels(factor(data$id)))
length(result) <- 3*num
dim(result) <- c(3,num)
result <- data[data$start == 0,]
Y <- as.integer(row.names(result))
# For each patient's first admission, take the stop value of the following row.
# This is only right when that following row is a transfer: for patient c it
# returns 10, not the wanted 3.
for (i in 1:num) {
  if (Y[i] == dim(data)[1])
    (result[i,3] <- data[dim(data)[1],3])
  else
    (result[i,3] <- data[Y[i]+1,3])
}
result

Sorry it is ugly because I am new too, but hopefully it gives you some ideas.
Re: [R] Data manipulation question
Thank you for your prompt assistance, cruz and Bart. Bart set me on the right track, and I modified his proposal to this:

f <- function(data){
  m <- match(data$stop,data$start)
  n <- min(length(m),which(is.na(m)))
  data$stop[n]
}
by(data,data$id,f)

It also handles some special cases outside my small example dataset.

Thank you again!

Peter.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of bartjoosen
Sent: 6 November 2008 11:31
To: r-help@r-project.org
Subject: Re: [R] Data manipulation question

[...]
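Peter's f returns only the stop time for each patient. As a hedged sketch of how the same idea could produce the id/start/stop table asked for in the original post, the code below wraps his match() logic in a helper and stacks the per-patient results. The names admissions, first_stay and first_discharge are invented for this example, and it assumes that each patient's rows are sorted by start time and that the first row is the first admission.

```r
admissions <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), rep("d", 1)),
  start = c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0),
  stop  = c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)
)

# For one patient: n is the first row whose stop time does not appear
# among that patient's start times, i.e. no transfer follows it.
first_stay <- function(d) {
  m <- match(d$stop, d$start)
  n <- min(length(m), which(is.na(m)))
  data.frame(id = as.character(d$id[1]), start = d$start[1], stop = d$stop[n])
}

# Apply the helper to each patient and bind the one-row results together
first_discharge <- do.call(
  rbind,
  lapply(split(admissions, admissions$id), first_stay)
)
rownames(first_discharge) <- NULL
first_discharge
```

For the example data this gives stop times 12, 10, 3 and 6 for patients a, b, c and d, matching the table Peter asked for.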
Re: [R] Data manipulation question
Try this:

transform(d, z = y[match(x, id)])

On 10/10/07, Julien Barnier wrote:
> [...]
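To see why the one-liner works, it may help to split it into its two steps. This is an illustrative sketch using Julien's data; the intermediate name pos is made up here, and stringsAsFactors = FALSE is set only so the example behaves the same on old and new R versions.

```r
d <- data.frame(
  id = c("0001", "0002", "0003", "0004", "0005"),
  x  = c(NA, NA, "0001", NA, "0003"),
  y  = c(21, 13, 45, 71, 20),
  stringsAsFactors = FALSE
)

# Step 1: for each value of x, find the row number where it occurs in id
pos <- match(d$x, d$id)   # NA NA 1 NA 3

# Step 2: use those row numbers to pick values of y (NA positions give NA)
d$z <- d$y[pos]           # NA NA 21 NA 45
d
```

match() returns NA where x is missing, and indexing y with NA yields NA, so rows without a reference get z = NA without any special handling.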
[R] Data manipulation question (opposite of table?)
Dear R users,

I am a new user (probably obvious by my question) and have really learned a lot from reading this list. Thank you all very much. My main struggles with R are with data manipulation.

So here is my question...
I have data that is organized as below; this is a short example.

value  count
11232  25
15885  24
22464  20
etc...

The 'value' field is distances and the 'count' field is the number of times that each distance occurs. So I guess it is in the same format as the output of the table() function.

What I need to do is make one long vector (or list) that includes all the actual numbers. In other words, 11232 listed 25 times followed by 15885 listed 24 times, etc.

Thank you again in advance,
Michael
Re: [R] Data manipulation question (opposite of table?)
Try this:

rep(x[[1]], x[[2]])

On 27/01/2008, Michael Denslow wrote:
> [...]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
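A short worked version of this suggestion, using the three rows from Michael's post. This is only a sketch: it assumes the data sit in a data frame called x with columns value and count (the post does not name the object), and distances is a name chosen for the example.

```r
x <- data.frame(
  value = c(11232, 15885, 22464),
  count = c(25, 24, 20)
)

# Repeat each value as many times as its count says
distances <- rep(x$value, times = x$count)

length(distances)   # 69, the sum of the counts
table(distances)    # tabulating again recovers the original counts
```

x[[1]] and x[[2]] in the reply select the first and second columns by position, so rep(x[[1]], x[[2]]) is equivalent to the call above when value and count are the first two columns.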
Re: [R] Data manipulation question (opposite of table?)
Michael Denslow wrote:
> [...]

Try something like this?

dt<-data.frame(value=c(11123,14585),count=c(3,5))
exp.dt<-with(dt,rep(value,count))
> exp.dt
[1] 11123 11123 11123 14585 14585 14585 14585 14585

--
David Winsemius