[R] Data manipulation question

2008-11-06 Thread Peter Jepsen
Dear R-listers,

I am a relatively inexperienced R-user currently migrating from Stata. I
am deeply frustrated by this data manipulation question: I know how I
could do it in Stata, but I cannot make it work in R.

I have a data frame of hospitalization data where each row represents an
admission. I need to know when patients were first discharged, but the
problem is that patients were sometimes transferred between hospital
departments. In my data a transfer looks like a new admission, except
that it has a 'start' date equal to the previous admission's 'stop'
date.

Here is an example:

id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
data <- as.data.frame(cbind(id,start,stop))
data
#    id start stop
# 1   a     0    6
# 2   a     6   12
# 3   a    17   20
# 4   a    20   30
# 5   b     0    1
# 6   b     1   10
# 7   c     0    3
# 8   c     5   10
# 9   c    10   11
# 10  c    11   30
# 11  c    50   55
# 12  d     0    6

So, what I want to end up with is this:

id start stop
a  0     12   # This patient was transferred at time 6 and discharged at time 12. The admission starting at time 17 is therefore irrelevant.
b  0     10
c  0     3
d  0     6

I have tried tons of variations on lapply, sapply, split, for loops, etc.,
all to no avail.

Thank you in advance for any assistance.

Best regards,
Peter Jepsen, MD.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Data manipulation question

2007-10-10 Thread Julien Barnier
Hi all,

Suppose I have the following data.frame, with an id column and two
variable columns:

id    X     Y
0001  NA    21
0002  NA    13
0003  0001  45
0004  NA    71
0005  0003  20

What I would like to do is to create a new variable Z whose value in each
row is the Y value of the row whose id equals that row's X, that is:

id    X     Y   Z
0001  NA    21  NA
0002  NA    13  NA
0003  0001  45  21
0004  NA    71  NA
0005  0003  20  45

Do you have any idea how to obtain this without using a for loop?

Thanks in advance for any help,

Julien



Here is the R code to reproduce the first data.frame:

id <- c("0001","0002","0003","0004","0005")
x <- c(NA, NA, "0001", NA, "0003")
y <- c(21,13,45,71,20)
d <- data.frame(id,x,y)



-- 
Julien Barnier
Groupe de recherche sur la socialisation
ENS-LSH - Lyon, France



Re: [R] Data manipulation question

2008-11-06 Thread bartjoosen

How about: 

id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1)) 
start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0)) 
stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6)) 
data <- data.frame(id,start,stop)

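f <- function(data){
  # for each start, index of the row whose stop equals it, plus 1
  # (i.e. the transfer row itself); NA if the start matches no stop
  m <- match(data$start, data$stop) + 1
  # special cases: a single admission, or a second admission that is not a transfer
  if (length(m) == 1 && is.na(m)) m <- 1
  if (length(m) > 1 && is.na(m[2])) m <- 1
  data$stop[min(m, na.rm = TRUE)]
}

by(data, data$id, f)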

The if statements in the function are for some special cases; in all the
other cases the first line will do the trick.
I would like to add that naming an object data is somewhat bad practice, as
it masks the built-in data function of R.
I also changed the way you made up the data.frame, as your method, via
cbind(), would convert everything to character strings and then to factors.
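The point about cbind() can be checked directly. The following is only a small sketch on a shortened version of the example data; the object names via_cbind and direct are made up for illustration. cbind() builds a character matrix when one of its inputs is character, so the numeric columns are lost (whether they end up as factors or as character strings depends on the R version), while data.frame() keeps each column's own type.

```r
# Shortened version of the example data from this thread
id    <- c(rep("a", 4), rep("b", 2))
start <- c(0, 6, 17, 20, 0, 1)
stop  <- c(6, 12, 20, 30, 1, 10)

# cbind() coerces everything to one common type (character) first
via_cbind <- as.data.frame(cbind(id, start, stop))

# data.frame() keeps each vector's own type
direct <- data.frame(id, start, stop)

sapply(via_cbind, class)     # no numeric columns here
sapply(direct, class)        # start and stop are numeric

is.numeric(via_cbind$start)  # FALSE
is.numeric(direct$start)     # TRUE
```

With non-numeric columns, comparisons such as start == stop compare labels rather than numbers, and arithmetic on the columns fails or warns.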

Good luck

Bart





Re: [R] Data manipulation question

2008-11-06 Thread cruz
Try this:

# assumes numeric start/stop, i.e. data <- data.frame(id,start,stop)
# first admission of each patient, and its row position in data
result <- data[data$start == 0,]
Y <- as.integer(row.names(result))

for (i in seq_along(Y)) {
  j <- Y[i]
  # follow transfers: same patient and next start equal to current stop
  while (j < nrow(data) && data$id[j+1] == data$id[j] &&
         data$start[j+1] == data$stop[j]) j <- j + 1
  result[i,3] <- data[j,3]
}
result


Sorry it is ugly, because I am new too, but hopefully it gives you some ideas.



Re: [R] Data manipulation question

2008-11-06 Thread Peter Jepsen
Thank you for your prompt assistance, cruz and Bart. 

Bart set me on the right track, and I modified his proposal to this:

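f <- function(data){
  # for each stop, index of an admission starting at that time (NA if none)
  m <- match(data$stop, data$start)
  # first admission that is not followed by a transfer
  n <- min(length(m), which(is.na(m)))
  data$stop[n]
}
by(data, data$id, f)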

It also handles some special cases outside my small example dataset.

Thank you again!
Peter.
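
The by() call above returns only the discharge times. To get the id/start/stop table requested in the original question, something along the following lines might work. It is only a sketch, not the code used in the thread: the names adm and first_discharge are made up, rows are assumed to be sorted by start within each patient, and a transfer is detected by comparing each stop with the start of the next row rather than with match().

```r
# Example data from the thread, built with data.frame() so start/stop stay numeric
adm <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), rep("d", 1)),
  start = c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0),
  stop  = c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)
)

# Hypothetical helper: one row per patient with the first discharge time.
# Assumes the rows of g are sorted by start.
first_discharge <- function(g) {
  # TRUE where the next admission starts exactly when this one stops (a transfer);
  # the last admission of a patient is never followed by a transfer
  transferred <- c(g$stop[-nrow(g)] == g$start[-1], FALSE)
  n <- which(!transferred)[1]   # first admission not followed by a transfer
  data.frame(id = g$id[1], start = g$start[1], stop = g$stop[n])
}

# Split by patient, apply the helper, and stack the one-row results
res <- do.call(rbind, lapply(split(adm, adm$id), first_discharge))
rownames(res) <- NULL
res
#   id start stop
# 1  a     0   12
# 2  b     0   10
# 3  c     0    3
# 4  d     0    6
```

Because it follows consecutive rows, this variant also handles chains of more than one transfer.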




Re: [R] Data manipulation question

2007-10-10 Thread Gabor Grothendieck
Try this:

transform(d, z = y[match(x, id)])
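
To see why this one-liner works, it may help to split it into two steps on the data from the question. This is only an illustrative sketch; the intermediate name idx is made up. match(x, id) returns, for each x, the row position of the id equal to it (NA where x is NA), and indexing y with those positions pulls out the corresponding y values.

```r
# Data from the original question
d <- data.frame(
  id = c("0001", "0002", "0003", "0004", "0005"),
  x  = c(NA, NA, "0001", NA, "0003"),
  y  = c(21, 13, 45, 71, 20)
)

# Step 1: position in id of each x value; NA where x is NA or has no match
idx <- match(d$x, d$id)
idx
# [1] NA NA  1 NA  3

# Step 2: indexing with NA yields NA, so unmatched rows get z = NA
d$z <- d$y[idx]
d$z
# [1] NA NA 21 NA 45
```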




[R] Data manipulation question (opposite of table?)

2008-01-27 Thread Michael Denslow
Dear R users,

I am a new user (probably obvious by my question) and
have really learned a lot from reading this list.
Thank you all very much. My main struggles with R are
with data manipulation.

So here is my question...
I have data that is organized as below, this is a
short example.

value  count
11232  25
15885  24
22464  20
etc...

the 'value' field is distances and the 'count' field
is the number of times that each distance occurs. So I
guess it is in the same format as the output for the
table() function.

What I need to do is make one long vector (or list)
that includes all the actual numbers. In other words
11232 listed 25 times followed by 15885 listed 24
times etc. etc.

Thank you again in advance,
Michael


Re: [R] Data manipulation question (opposite of table?)

2008-01-27 Thread Henrique Dallazuanna
Try this, where x is your data frame with the values in the first column
and the counts in the second:

rep(x[[1]], x[[2]])
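
As a quick check on the three rows shown in the question, a sketch like the following could be used. The data frame x and the name long are assumptions for illustration; the real data presumably has more rows. rep() with a vector of counts repeats each value the corresponding number of times, and table() on the result recovers the original counts.

```r
# The three example rows from the question
x <- data.frame(value = c(11232, 15885, 22464),
                count = c(25, 24, 20))

# Repeat each value as many times as its count says
long <- rep(x$value, times = x$count)

length(long)   # 25 + 24 + 20
# [1] 69

# Round trip: tabulating the long vector gives the counts back
as.vector(table(long))
# [1] 25 24 20
```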




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] Data manipulation question (opposite of table?)

2008-01-27 Thread David Winsemius

Try something like this?

dt<-data.frame(value=c(11123,14585),count=c(3,5))
exp.dt<-with(dt,rep(value,count))

> exp.dt
[1] 11123 11123 11123 14585 14585 14585 14585 14585

-- 
David Winsemius
