Re: [R] generating multiple sequences in subsets of data

Jason Baucom Fri, 11 Sep 2009 13:39:06 -0700

My apologies for bringing up an old topic, but I'm still having some problems.

I got this code to work, and it was running perfectly fine. I tried it with a 
larger data set and it crashed my machine, slowly chewing up memory until it 
could not allocate any more for the process. The following line killed me:


merged_cut_col$pickseq<-with(merged_cut_col,ave(as.numeric(as.Date(pickts)),cpid,FUN=seq))

So I thought I'd try it another way, using transformBy() from the doBy package:

merged_cut_col<-transformBy(~cpid,data=merged_cut_col,pickseqREDO=seq(cpid))

This too ran for hours before eventually running out of memory. I've tried it on 
a beefier machine and I run into the same problem.

Is there an alternative to these methods that would be less memory/time 
intensive? This is a fairly simple routine I'm trying, just generating sequence 
numbers based on simple criteria. I'm surprised it's bringing my computer to 
its knees. I'm running about 1M rows now, but doing other operations such as 
merges or adding new columns/rows seems fine.
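
One possible contributor, offered as an assumption rather than a diagnosis of the real data: seq() applied to a single number x returns 1:x, so any cpid that occurs only once would hand seq() one large date value and get thousands of elements back. The sketch below uses a small made-up data frame (only the column names cpid and pickts come from the post) and counts rows with seq_along(), which depends only on the group length. The run-length variant at the end additionally assumes the rows are already sorted by cpid.

```r
# Hypothetical stand-in for merged_cut_col; only the column names are from the post.
df <- data.frame(
  cpid   = c("a", "a", "b", "c", "c", "c"),
  pickts = c("2009-01-01", "2009-01-05", "2009-02-01",
             "2009-03-01", "2009-03-02", "2009-03-09"),
  stringsAsFactors = FALSE
)

# seq() on a length-one numeric x returns 1:x, so a group with a single
# row produces a long vector instead of a single 1.
singleton_len <- length(seq(as.numeric(as.Date("2009-02-01"))))

# seq_along() always returns 1..length(x), regardless of the values in x.
df$pickseq <- ave(seq_along(df$cpid), df$cpid, FUN = seq_along)

# Assumes df is already sorted by cpid: run lengths give the same counter
# without splitting the data into groups at all.
pickseq_rle <- sequence(rle(df$cpid)$lengths)

df
```

Passing an integer row index as the first argument of ave() also avoids converting the timestamps at all, since only the group sizes matter for the counter.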

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Thursday, August 27, 2009 12:48 PM
To: Jason Baucom
Cc: Henrique Dallazuanna; r-help@r-project.org; Steven Few
Subject: Re: [R] generating multiple sequences in subsets of data


On Aug 27, 2009, at 11:58 AM, Jason Baucom wrote:

> I got this to work. Thanks for the insight! row7 is what I need.
>
>
>
>> checkLimit <-function(x) x<3
>
>> stuff$row6<-checkLimit(stuff$row1)

You don't actually need those intermediate steps:

 > stuff$row7 <- with(stuff, ave(row1, row2, row1 < 3, FUN = seq))
 > stuff
    row1 row2 row7
1     0    1    1
2     1    1    2
3     2    1    3
4     3    1    1
5     4    1    2
6     5    1    3
7     1    2    1
8     2    2    2
9     3    2    1
10    4    2    2

The expression row1 < 3 gets turned into a logical vector that ave()  
is perfectly happy with.
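
To make the grouping visible, here is a minimal sketch on the sample data from the original post. It builds the combined grouping factor explicitly with interaction(), which is how ave() combines several grouping arguments, and then counts within each combination. The names grp and row7_check are illustrative only, and seq_along() is swapped in for seq() so that a group of length one would still get a plain 1.

```r
row1 <- c(0, 1, 2, 3, 4, 5, 1, 2, 3, 4)
row2 <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
stuff <- data.frame(row1 = row1, row2 = row2)

# The logical vector row1 < 3 is treated as a two-level grouping variable;
# crossing it with row2 gives one group per (row2, below-cutoff) combination.
grp <- with(stuff, interaction(row2, row1 < 3))
table(grp)

# Count rows within each combination. seq_along() is used instead of seq()
# so the result depends only on the group length, not on the values.
row7_check <- with(stuff, ave(row1, row2, row1 < 3, FUN = seq_along))
row7_check
```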

-- 
David Winsemius

>
>> stuff$row7 <- with(stuff, ave(row1,row2, row6, FUN = sequence))
>> stuff
>    row1 row2 row3 row4 row5  row6 row7
> 1     0    1    1    1    1  TRUE    1
> 2     1    1    2    2    2  TRUE    2
> 3     2    1    3    3    3  TRUE    3
> 4     3    1    4    1    4 FALSE    1
> 5     4    1    5    1    5 FALSE    2
> 6     5    1    6    1    6 FALSE    3
> 7     1    2    1    1    1  TRUE    1
> 8     2    2    2    2    2  TRUE    2
> 9     3    2    3    1    3 FALSE    1
> 10    4    2    4    1    4 FALSE    2
>
>
>
> Jason
>
>
>
> ________________________________
>
> From: Henrique Dallazuanna [mailto:www...@gmail.com]
> Sent: Thursday, August 27, 2009 11:02 AM
> To: Jason Baucom
> Cc: r-help@r-project.org; Steven Few
> Subject: Re: [R] generating multiple sequences in subsets of data
>
>
>
> Try this;
>
> stuff$row3 <- with(stuff, ave(row1, row2, FUN = seq))
>
> I don't understand the fourth column
>
> On Thu, Aug 27, 2009 at 11:55 AM, Jason Baucom  
> <jason.bau...@ateb.com> wrote:
>
> I'm running into a problem I can't seem to find a solution for. I'm
> attempting to add sequences into an existing data set based on subsets
> of the data.  I've done this using a for loop with a small subset of
> data, but attempting the same process using real data (200k rows) is
> taking way too long.
>
>
>
> Here is some sample data and my ultimate goal
>
>> row1<-c(0,1,2,3,4,5,1,2,3,4)
>> row2<-c(1,1,1,1,1,1,2,2,2,2)
>> stuff<-data.frame(row1=row1,row2=row2)
>> stuff
>    row1 row2
> 1     0    1
> 2     1    1
> 3     2    1
> 4     3    1
> 5     4    1
> 6     5    1
> 7     1    2
> 8     2    2
> 9     3    2
> 10    4    2
>
>
>
>
>
> I need to derive 2 columns. I need a sequence for each unique row2, and
> then I need a sequence that restarts based on a cutoff value for row1
> and unique row2. The following table is what it -should- look like using
> a cutoff of 3 for row4
>
>
>
>    row1 row2 row3 row4
> 1     0    1    1    1
> 2     1    1    2    2
> 3     2    1    3    3
> 4     3    1    4    1
> 5     4    1    5    2
> 6     5    1    6    3
> 7     1    2    1    1
> 8     2    2    2    2
> 9     3    2    3    1
> 10    4    2    4    2
>
>
>
> I need something like row3<-sequence(nrow(unique(stuff$row2))) that
> actually works :-) Here is the for loop that functions properly for
> row3:
>
>
>
> stuff$row3<-c(1)
>
> for (i in 2:nrow(stuff)) { if ( stuff$row2[i] == stuff$row2[i-1]) {
> stuff$row3[i] = stuff$row3[i-1]+1}}
>
> Thanks!
>
>
>
> Jason Baucom
>
> Ateb, Inc.
>
> 919.882.4992 O
>
> 919.872.1645 F
>
> www.ateb.com <http://www.ateb.com/>
>
>
>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> -- 
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

