Re: [R] generating multiple sequences in subsets of data

2009-09-11 Thread Jason Baucom

My apologies for bringing up an old topic, but I'm still having some problems!

I got this code to work, and it was running perfectly fine. I tried it with a 
larger data set and it crashed my machine, slowly chewing up memory until it 
could not allocate any more for the process. The following line killed me:

merged_cut_col$pickseq <- with(merged_cut_col, ave(as.numeric(as.Date(pickts)), cpid, FUN = seq))

So, I thought I'd try it another way, using the transformBy in the doBy package:

merged_cut_col <- transformBy(~cpid, data = merged_cut_col, pickseqREDO = seq(cpid))

This too ran for hours until eventually running out of memory. I've tried it on 
a beefier machine and I run into the same problem.

Is there an alternative to these methods that would be less memory/time 
intensive? This is a fairly simple routine I'm trying, just generating sequence 
numbers based on simple criteria. I'm surprised it's bringing my computer to 
its knees. I'm running about 1M rows now, but doing other operations such as 
merges or adding new columns/rows seems fine.
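
One lower-memory possibility, sketched under assumptions rather than tested on the real data: number the rows within each cpid by sorting the row indices once and counting along each run, without handing the date values to seq at all. The helper name group_seq is hypothetical, and the toy cpid values are invented for illustration.

```r
# Hypothetical helper: 1, 2, 3, ... within each group, returned in original row order.
# It orders the row indices by group once, then numbers each run of equal keys.
group_seq <- function(g) {
  idx <- seq_along(g)
  o <- order(g, idx)                 # rows of one group become adjacent, original order kept
  out <- integer(length(g))
  out[o] <- sequence(rle(as.vector(g[o]))$lengths)
  out
}

# Toy stand-in for merged_cut_col$cpid (values invented)
cpid <- c("b", "a", "b", "a", "b", "c")
pickseq <- group_seq(cpid)
print(pickseq)   # 1 1 2 2 3 1
```

To restart the count at a cutoff as well, the same helper could be given a combined key, for example group_seq(paste(cpid, pickdate < cutoff)), where pickdate and cutoff are placeholder names.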

Re: [R] generating multiple sequences in subsets of data

2009-09-11 Thread Jason Baucom

A bit of debugging information

> merged_cut_col$pickseq <-
+   ave(as.numeric(as.Date(merged_cut_col$pickts)), merged_cut_col$cpid,
+       as.numeric(as.Date(merged_cut_col$pickts)) < as.numeric(as.Date("2008-12-01")),
+       FUN = seq)
Error: cannot allocate vector of size 55 Kb
> memory.size()
[1] 1882.56
> object.size(merged_cut_col)
75250816 bytes
> gc()
           used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   226664   6.1    1423891   38.1   3463550   92.5
Vcells 19186778 146.4  156381436 1193.1 241372511 1841.6
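
A possible contributor, offered as a guess rather than a diagnosis: seq() called on a single number n returns 1:n, not 1. A date passed through as.numeric() is a number in the thousands, so any cpid group with only one row would make FUN = seq build a vector of that length. seq_along() always returns one value per element. The sketch below only demonstrates the length difference; whether it explains the allocation failure here is an assumption.

```r
# A single converted date, as FUN would receive it for a one-row group
one_date <- as.numeric(as.Date("2008-12-01"))

length(seq(one_date))        # thousands of elements, because seq(n) means 1:n
length(seq_along(one_date))  # 1

# For groups with two or more rows the two functions agree
two_dates <- as.numeric(as.Date(c("2008-12-01", "2008-12-02")))
identical(seq(two_dates), seq_along(two_dates))  # TRUE
```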


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] generating multiple sequences in subsets of data

2009-09-11 Thread David Winsemius

Have you tried running merged_cut_col$pickts through something that is  
less complex? Perhaps:


table(merged_cut_col$pickts)

... to see if there are problems with the inner functions? Also I  
think the as.numeric might be superfluous, since Dates are really just  
integers with some attitude,  er, attributes.
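
The point about dates can be checked directly. A Date is stored as the number of days since 1970-01-01 plus a class attribute, so dates compare without any conversion. The cutoff date is the one from the thread; the pickts values below are made up.

```r
cutoff <- as.Date("2008-12-01")
unclass(cutoff)       # 14214, the days since 1970-01-01
as.numeric(cutoff)    # the same number with the class dropped

# Made-up stand-in for merged_cut_col$pickts
pickts <- as.Date(c("2008-11-15", "2008-12-01", "2009-01-20"))
before_cutoff <- pickts < cutoff   # Date objects compare directly
print(before_cutoff)               # TRUE FALSE FALSE

# A quick look at the distinct dates, as suggested above
print(table(pickts))
```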


--
David.


Re: [R] generating multiple sequences in subsets of data

2009-08-27 Thread Henrique Dallazuanna

Try this;

stuff$row3 <- with(stuff, ave(row1, row2, FUN = seq))

I don't understand the fourth column
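
As a check, the vectorised suggestion can be compared with the for loop from the question on the same sample data. This is a sketch: it swaps seq_along in for seq (an assumption, chosen so a one-row group still yields 1), and it adds a third variant built from run lengths, which is only valid while equal row2 values sit next to each other. The names loop_seq, ave_seq and rle_seq are made up for the comparison.

```r
stuff <- data.frame(row1 = c(0, 1, 2, 3, 4, 5, 1, 2, 3, 4),
                    row2 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))

# The for loop from the question, writing to a plain vector
loop_seq <- rep(1, nrow(stuff))
for (i in 2:nrow(stuff)) {
  if (stuff$row2[i] == stuff$row2[i - 1]) loop_seq[i] <- loop_seq[i - 1] + 1
}

# The vectorised suggestion, with seq_along in place of seq
ave_seq <- with(stuff, ave(row1, row2, FUN = seq_along))

# Run lengths of row2 fed to sequence(); needs equal row2 values to be adjacent
rle_seq <- sequence(rle(stuff$row2)$lengths)

print(cbind(loop_seq, ave_seq, rle_seq))
```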

On Thu, Aug 27, 2009 at 11:55 AM, Jason Baucom <jason.bau...@ateb.com> wrote:

> I'm running into a problem I can't seem to find a solution for. I'm
> attempting to add sequences into an existing data set based on subsets
> of the data. I've done this using a for loop with a small subset of
> data, but attempting the same process using real data (200k rows) is
> taking way too long.
>
> Here is some sample data and my ultimate goal:
>
> row1 <- c(0,1,2,3,4,5,1,2,3,4)
> row2 <- c(1,1,1,1,1,1,2,2,2,2)
> stuff <- data.frame(row1=row1, row2=row2)
> stuff
>
>    row1 row2
> 1     0    1
> 2     1    1
> 3     2    1
> 4     3    1
> 5     4    1
> 6     5    1
> 7     1    2
> 8     2    2
> 9     3    2
> 10    4    2
>
> I need to derive 2 columns. I need a sequence for each unique row2, and
> then I need a sequence that restarts based on a cutoff value for row1
> and unique row2. The following table is what it -should- look like using
> a cutoff of 3 for row4:
>
>    row1 row2 row3 row4
> 1     0    1    1    1
> 2     1    1    2    2
> 3     2    1    3    3
> 4     3    1    4    1
> 5     4    1    5    2
> 6     5    1    6    3
> 7     1    2    1    1
> 8     2    2    2    2
> 9     3    2    3    1
> 10    4    2    4    2
>
> I need something like row3 <- sequence(nrow(unique(stuff$row2))) that
> actually works :-) Here is the for loop that functions properly for
> row3:
>
> stuff$row3 <- c(1)
> for (i in 2:nrow(stuff)) { if ( stuff$row2[i] == stuff$row2[i-1]) {
> stuff$row3[i] = stuff$row3[i-1]+1}}
>
> Thanks!
>
> Jason Baucom
> Ateb, Inc.
> 919.882.4992 O
> 919.872.1645 F
> www.ateb.com <http://www.ateb.com/>




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O


Re: [R] generating multiple sequences in subsets of data

2009-08-27 Thread Jason Baucom

Henrique,

 

That works great! Thanks.

 

The row3 is a sequence that restarts each time a new row2 is reached.

 

Row4 is a sequence that restarts each time a new row2 is reached OR row1 
reaches some threshold. By setting a threshold of 3, we expect a restart of the 
sequence once row1 reaches 3. This way we can have two unique sequences for 
each row2, assuming of course the threshold is reached.
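
Those two definitions can be written out against the sample data from the original post. This is a sketch: it uses seq_along rather than seq so that a group with a single row still yields 1, and the variable name cutoff is an assumption.

```r
stuff <- data.frame(row1 = c(0, 1, 2, 3, 4, 5, 1, 2, 3, 4),
                    row2 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))
cutoff <- 3

# row3: restarts whenever row2 changes
stuff$row3 <- with(stuff, ave(row1, row2, FUN = seq_along))

# row4: restarts on a new row2 OR once row1 reaches the cutoff
stuff$row4 <- with(stuff, ave(row1, row2, row1 >= cutoff, FUN = seq_along))

print(stuff)
```

On the ten sample rows this gives 1 to 6 and 1 to 4 for row3, and 1 2 3 1 2 3 1 2 1 2 for row4.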

 

Jason

 




Re: [R] generating multiple sequences in subsets of data

2009-08-27 Thread Jason Baucom

I got this to work. Thanks for the insight! row7 is what I need.

 

  checkLimit <- function(x) x < 3
  stuff$row6 <- checkLimit(stuff$row1)
  stuff$row7 <- with(stuff, ave(row1, row2, row6, FUN = sequence))
  stuff

     row1 row2 row3 row4 row5  row6 row7
  1     0    1    1    1    1  TRUE    1
  2     1    1    2    2    2  TRUE    2
  3     2    1    3    3    3  TRUE    3
  4     3    1    4    1    4 FALSE    1
  5     4    1    5    1    5 FALSE    2
  6     5    1    6    1    6 FALSE    3
  7     1    2    1    1    1  TRUE    1
  8     2    2    2    2    2  TRUE    2
  9     3    2    3    1    3 FALSE    1
  10    4    2    4    1    4 FALSE    2

 

Jason

 




Re: [R] generating multiple sequences in subsets of data

2009-08-27 Thread David Winsemius



On Aug 27, 2009, at 11:58 AM, Jason Baucom wrote:

> I got this to work. Thanks for the insight! row7 is what I need.
>
> checkLimit <- function(x) x < 3
> stuff$row6 <- checkLimit(stuff$row1)

You don't actually need those intermediate steps:

  stuff$row7 <- with(stuff, ave(row1, row2, row1 < 3, FUN = seq))
  stuff

     row1 row2 row7
  1     0    1    1
  2     1    1    2
  3     2    1    3
  4     3    1    1
  5     4    1    2
  6     5    1    3
  7     1    2    1
  8     2    2    2
  9     3    2    1
  10    4    2    2

The expression row1 < 3 gets turned into a logical vector that ave()
is perfectly happy with.
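
To see what ave() does with that logical vector, it can help to look at the groups it forms. As far as the documented behaviour goes, several grouping arguments are combined as with interaction(), so each combination of row2 and row1 < 3 becomes its own group. A small sketch on the sample data; the names below and groups are made up, and seq_along stands in for seq:

```r
stuff <- data.frame(row1 = c(0, 1, 2, 3, 4, 5, 1, 2, 3, 4),
                    row2 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))

# The comparison yields one TRUE/FALSE per row
below <- stuff$row1 < 3
print(below)

# Combined with row2 it gives four groups: 1.FALSE, 2.FALSE, 1.TRUE, 2.TRUE
groups <- interaction(stuff$row2, below)
print(split(stuff$row1, groups))

# Numbering the rows within those groups reproduces row7
row7 <- ave(stuff$row1, stuff$row2, below, FUN = seq_along)
print(row7)
```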



David Winsemius, MD
Heritage Laboratories
West Hartford, CT

