You might try the following function.  It first identifies the last element in 
each run, then computes the length of each run, then calls sequence() to 
generate the within-run sequence numbers.  my.sequence is a version of 
sequence() that uses less time and less memory when there are lots of short 
runs: sequence() calls lapply(), which builds a memory-consuming list and then 
unlists it, whereas my.sequence avoids the big intermediate list.
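
To make the arithmetic behind my.sequence concrete, here is a small sketch on three made-up run lengths (the vector nvec and its values are assumptions chosen only for illustration). Each element's within-run number is its global position minus the number of elements in all earlier runs.

```r
# Run lengths for three hypothetical runs
nvec <- c(3L, 1L, 4L)

# Offsets: how many elements precede each run (0, 3, 4), repeated per element
offsets <- rep.int(c(0L, cumsum(nvec[-length(nvec)])), nvec)

# Global positions 1..sum(nvec) minus the offset of the enclosing run
my.sequence <- function(nvec)
   seq_len(sum(nvec)) - rep.int(c(0L, cumsum(nvec[-length(nvec)])), nvec)

result <- my.sequence(nvec)
print(offsets)   # 0 0 0 3 4 4 4 4
print(result)    # 1 2 3 1 1 2 3 4

# Same values as base R's sequence()
stopifnot(all(result == sequence(nvec)))
```

Everything here is a vectorized operation on whole vectors, so no per-run list is ever built.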

For your data, f(data) produces the same thing as data$conditional_time.

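f <- function(data, use.my.sequence = FALSE) {
   n <- nrow(data)
   # TRUE on the last row of each run: an eif event or the last row of an id
   lastInRun <- with(data, eif | c(id[-1] != id[-n], TRUE))
   # Run lengths are the gaps between consecutive run-ending positions
   runLengths <- diff(c(0L, which(lastInRun)))
   if (use.my.sequence) {
      # Like sequence(), but without the intermediate list
      my.sequence <- function(nvec)
         seq_len(sum(nvec)) - rep.int(c(0L, cumsum(nvec[-length(nvec)])), nvec)
      my.sequence(runLengths)
   } else {
      sequence(runLengths)
   }
}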
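
The intermediate values may be easier to follow when traced by hand.  The sketch below repeats the steps of the function on a small made-up data frame (its values are assumptions for illustration, not the poster's data).  Because the method relies on row order, it sorts by id and then year first.

```r
# Small illustrative data frame (values made up for this sketch)
test <- data.frame(
   year = c(1990:1994, 1990:1992),
   id   = c(rep(1010, 5), rep(2010, 3)),
   eif  = c(0, 1, 0, 0, 1, 0, 0, 0)
)

# The approach depends on row order, so sort by id and then year
test <- test[order(test$id, test$year), ]

n <- nrow(test)
# A run ends at an eif event or on the last row of an id
lastInRun <- with(test, eif | c(id[-1] != id[-n], TRUE))
# Gaps between run-ending positions give the run lengths
runLengths <- diff(c(0L, which(lastInRun)))
test$conditional_time <- sequence(runLengths)

print(which(lastInRun))        # 2 5 8
print(runLengths)              # 2 3 3
print(test$conditional_time)   # 1 2 1 2 3 1 2 3
```

The counter restarts at 1 on the row after each event and on the first row of each new id, which is the behavior requested in the question.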

Bill Dunlap, Spotfire Division, TIBCO Software Inc.
---------------------------------------- 


Hi everyone,

Please forgive me if my question is simple and my code terrible; I'm new to
R. I am not looking for a ready-made answer, but I would really appreciate
it if someone could share conceptual hints for programming, or point me
toward an R function/package that could speed up my processing time.

Thanks a lot for your help!

##

My dataframe includes the variables 'year', 'id', and 'eif' and has roughly
1.9 million id-year observations.

I would like to do 2 things:

-1- I want to create a 'conditional_time' variable, which increases in
increments of 1 every year, but which resets during year(t) if event 'eif'
occurred for this 'id' at year(t-1). It should also reset when we switch to a
new 'id'. For example:

dataframe = test
 year        id         eif  conditional_time

1990       1010          0    1
1991       1010          0    2
1992       1010          1    3
1993       1010          0    1
1994       1010          0    2
1995       1010          0    3
1996       1010          0    4
1997       1010          1    5
1998       1010          0    1
1999       1010          0    2
2000       1010          0    3
2001       1010          0    4
2002       1010          0    5
2003       1010          0    6
1990       2010          0    1
1991       2010          0    2
1992       2010          0    3
1993       2010          0    4
1994       2010          0    5
1995       2010          0    6
1996       2010          0    7
1997       2010          0    8
1998       2010          0    9
1999       2010          0    10
2000       2010          0    11
2001       2010          1    12
2002       2010          0    1
2003       2010          0    2

-2- In a copy of the original dataframe, drop all id-year rows that
correspond to years after a given id has experienced its first 'eif' event.
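
One possible vectorized approach to -2-, offered as a rough sketch only: count, within each id, how many events occurred strictly before each row, and keep the rows where that count is zero.  It assumes rows are sorted by year within each id and that the row of the first event itself should be kept; the data values and the names priorEvents and truncated are made up for illustration.

```r
# Small illustrative data frame (values made up for this sketch)
test <- data.frame(
   year = c(1990:1993, 1990:1992),
   id   = c(rep(1010, 4), rep(2010, 3)),
   eif  = c(0, 1, 0, 1, 0, 0, 0)
)

# Number of events strictly before each row, computed within each id
priorEvents <- with(test, ave(eif, id, FUN = function(x) cumsum(x) - x))

# Keep rows with no earlier event; the first event row itself is retained
truncated <- test[priorEvents == 0, ]

print(truncated$year)   # 1990 1991 1990 1991 1992
```

If the event year should be dropped as well, filtering on the plain within-id cumsum(x) being zero would be the corresponding variant.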

I have written the code below to take care of -1-, but it is incredibly
inefficient. Given the size of my database, and considering how slow my
computer is, I don't think it's practical to use it. Also, it depends on
correct sorting of the dataframe, which might generate errors.

##

for (i in 1:nrow(test)) {
    if (i == 1) {                            # If first id-year
        cond_time <- 1
        test[i, 4] <- cond_time

    } else if (test[i-1, 2] != test[i, 2]) { # If new id (column 2 is 'id')
        cond_time <- 1
        test[i, 4] <- cond_time
    } else {                                 # Same id as previous row
        if (test[i, 3] == 0) {               # No event (column 3 is 'eif')
            test[i, 4] <- cond_time + 1
            cond_time <- test[i, 4]
        } else {                             # Event: restart on the next row
            test[i, 4] <- cond_time + 1
            cond_time <- 0
        }
    }
}

--
Vincent Arel
M.A. Student, McGill University

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


