Re: [R] algorithm help
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 01/06/2011 11:57 PM, (Ted Harding) wrote: On 06-Jan-11 22:16:38, array chip wrote: Hi, I am seeking help on designing an algorithm to identify the locations of stretches of 1s in a vector of 0s and 1s. Below is an simple example: dat-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T) ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97))) dat a b 1 0 4 2 0 12 3 1 13 4 1 16 5 1 18 6 1 20 7 0 28 8 0 30 9 1 34 10 1 46 11 0 47 12 1 49 13 1 61 14 1 73 15 1 77 16 0 84 17 0 87 18 0 90 19 0 95 20 1 97 In this dataset, b is sorted and denotes the location for each number in a. So I would like to find the starting ending locations for each stretch of 1s within a, also counting the number of 1s in each stretch as well. Hope the results from the algorithm would be: stretch start end No.of.1s 1 13 204 2 34 462 3 49 774 4 97 971 I can imagine using for loops can do the job, but I feel it's not a clever way to do this. Is there an efficient algorithm that can do this fast? Thanks for any suggestions. John The basic information you need can be got using rle() (run length encoding). See '?rle'. In your example: rle(dat$a) # Run Length Encoding # lengths: int [1:8] 2 4 2 2 1 4 4 1 # values : num [1:8] 0 1 0 1 0 1 0 1 ## Note: F - 0, T - 1 The following has a somewhat twisted logic at the end, and may be flawed, but you can probably adapt it! L - rle(dat$a)$lengths V - rle(dat$a)$values pos - c(1,cumsum(L)) V1 - c(-1,V) 1+pos[V1==0] # [1] 3 9 12 20 ## Positions in the series dat$a where each run of T (i.e. 1) ## starts A different approach would be to use the diff() function: Where diff(dat$a) [1] 0 1 0 0 0 -1 0 1 0 -1 1 0 0 0 -1 0 0 0 1 is not equal 0, the value is changing from 0 to 1 or one to 0. The indices of the first new value can be found by: which(diff(dat$a)!=0) + 1 [1] 3 7 9 11 12 16 20 where it is changing from 0 to 1 is at which(diff(dat$a)==1) + 1 [1] 3 9 12 20 where it is changing from 1 to 0 is at which(diff(dat$a)==-1) + 1 [1] 7 11 16 By taking into consideration if the first value and the last values are 0 or 1, you can calculate the length. Cheers, Rainer Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 06-Jan-11 Time: 22:57:44 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Natural Sciences Building Office Suite 2039 Stellenbosch University Main Campus, Merriman Avenue Stellenbosch South Africa Tel:+33 - (0)9 53 10 27 44 Cell: +27 - (0)8 39 47 90 42 Fax (SA): +27 - (0)8 65 16 27 82 Fax (D) : +49 - (0)3 21 21 25 22 44 Fax (FR): +33 - (0)9 58 10 27 44 email: rai...@krugs.de Skype: RMkrug -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk0m1dMACgkQoYgNqgF2egoQbACcCB3iFQ6SKYfL4KVX8AMAN9Gp 1awAn0Z+8KXnOmwCLu61gihc8xZIT++j =O+xA -END PGP SIGNATURE- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] algorithm help
Hi, I am seeking help on designing an algorithm to identify the locations of stretches of 1s in a vector of 0s and 1s. Below is an simple example: dat-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T) ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97))) dat a b 1 0 4 2 0 12 3 1 13 4 1 16 5 1 18 6 1 20 7 0 28 8 0 30 9 1 34 10 1 46 11 0 47 12 1 49 13 1 61 14 1 73 15 1 77 16 0 84 17 0 87 18 0 90 19 0 95 20 1 97 In this dataset, b is sorted and denotes the location for each number in a. So I would like to find the starting ending locations for each stretch of 1s within a, also counting the number of 1s in each stretch as well. Hope the results from the algorithm would be: stretch start end No.of.1s 1 13 204 2 34 462 3 49 774 4 97 971 I can imagine using for loops can do the job, but I feel it's not a clever way to do this. Is there an efficient algorithm that can do this fast? Thanks for any suggestions. John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] algorithm help
try this: ?rle Carl ** From: array chip arrayprofile_at_yahoo.com Date: Thu, 06 Jan 2011 14:16:38 -0800 (PST) Hi, I am seeking help on designing an algorithm to identify the locations of stretches of 1s in a vector of 0s and 1s. Below is an simple example: dat-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T) ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97))) dat a b 1 0 4 2 0 12 3 1 13 4 1 16 5 1 18 6 1 20 7 0 28 8 0 30 9 1 34 10 1 46 11 0 47 12 1 49 13 1 61 14 1 73 15 1 77 16 0 84 17 0 87 18 0 90 19 0 95 20 1 97 In this dataset, b is sorted and denotes the location for each number in a. So I would like to find the starting ending locations for each stretch of 1s within a, also counting the number of 1s in each stretch as well. Hope the results from the algorithm would be: __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] algorithm help
On 06-Jan-11 22:16:38, array chip wrote: Hi, I am seeking help on designing an algorithm to identify the locations of stretches of 1s in a vector of 0s and 1s. Below is an simple example: dat-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T) ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97))) dat a b 1 0 4 2 0 12 3 1 13 4 1 16 5 1 18 6 1 20 7 0 28 8 0 30 9 1 34 10 1 46 11 0 47 12 1 49 13 1 61 14 1 73 15 1 77 16 0 84 17 0 87 18 0 90 19 0 95 20 1 97 In this dataset, b is sorted and denotes the location for each number in a. So I would like to find the starting ending locations for each stretch of 1s within a, also counting the number of 1s in each stretch as well. Hope the results from the algorithm would be: stretch start end No.of.1s 1 13 204 2 34 462 3 49 774 4 97 971 I can imagine using for loops can do the job, but I feel it's not a clever way to do this. Is there an efficient algorithm that can do this fast? Thanks for any suggestions. John The basic information you need can be got using rle() (run length encoding). See '?rle'. In your example: rle(dat$a) # Run Length Encoding # lengths: int [1:8] 2 4 2 2 1 4 4 1 # values : num [1:8] 0 1 0 1 0 1 0 1 ## Note: F - 0, T - 1 The following has a somewhat twisted logic at the end, and may be flawed, but you can probably adapt it! L - rle(dat$a)$lengths V - rle(dat$a)$values pos - c(1,cumsum(L)) V1 - c(-1,V) 1+pos[V1==0] # [1] 3 9 12 20 ## Positions in the series dat$a where each run of T (i.e. 1) ## starts Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 06-Jan-11 Time: 22:57:44 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] algorithm help
Thanks very much, Ted. Yes, it does what I need! I made a routine to do this: f.fragment-function(a,b) { dat-as.data.frame(cbind(a,b)) L - rle(dat$a)$lengths V - rle(dat$a)$values pos - c(1,cumsum(L)) V1 - c(-1,V) start-1+pos[V1==0] end-pos[V1==1] cbind(stretch=1:length(start),start=dat$b[start] ,end=dat$b[end],no.of.1s=L[V==1]) } f.fragment(dat$a,dat$b) stretch start end no.of.1s [1,] 113 204 [2,] 234 462 [3,] 349 774 [4,] 497 971 John From: ted.hard...@wlandres.net ted.hard...@wlandres.net Cc: r-h...@stat.math.ethz.ch Sent: Thu, January 6, 2011 2:57:47 PM Subject: RE: [R] algorithm help On 06-Jan-11 22:16:38, array chip wrote: Hi, I am seeking help on designing an algorithm to identify the locations of stretches of 1s in a vector of 0s and 1s. Below is an simple example: dat-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T) ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97))) dat a b 1 0 4 2 0 12 3 1 13 4 1 16 5 1 18 6 1 20 7 0 28 8 0 30 9 1 34 10 1 46 11 0 47 12 1 49 13 1 61 14 1 73 15 1 77 16 0 84 17 0 87 18 0 90 19 0 95 20 1 97 In this dataset, b is sorted and denotes the location for each number in a. So I would like to find the starting ending locations for each stretch of 1s within a, also counting the number of 1s in each stretch as well. Hope the results from the algorithm would be: stretch start end No.of.1s 1 13 204 2 34 462 3 49 774 4 97 971 I can imagine using for loops can do the job, but I feel it's not a clever way to do this. Is there an efficient algorithm that can do this fast? Thanks for any suggestions. John The basic information you need can be got using rle() (run length encoding). See '?rle'. In your example: rle(dat$a) # Run Length Encoding # lengths: int [1:8] 2 4 2 2 1 4 4 1 # values : num [1:8] 0 1 0 1 0 1 0 1 ## Note: F - 0, T - 1 The following has a somewhat twisted logic at the end, and may [[elided Yahoo spam]] L - rle(dat$a)$lengths V - rle(dat$a)$values pos - c(1,cumsum(L)) V1 - c(-1,V) 1+pos[V1==0] # [1] 3 9 12 20 ## Positions in the series dat$a where each run of T (i.e. 1) ## starts Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@wlandres.net Fax-to-email: +44 (0)870 094 0861 Date: 06-Jan-11 Time: 22:57:44 -- XFMail -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] algorithm help
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of array chip Sent: Thursday, January 06, 2011 3:29 PM To: ted.hard...@wlandres.net Cc: r-h...@stat.math.ethz.ch Subject: Re: [R] algorithm help Thanks very much, Ted. Yes, it does what I need! I made a routine to do this: f.fragment-function(a,b) { dat-as.data.frame(cbind(a,b)) L - rle(dat$a)$lengths V - rle(dat$a)$values pos - c(1,cumsum(L)) V1 - c(-1,V) start-1+pos[V1==0] end-pos[V1==1] cbind(stretch=1:length(start),start=dat$b[start] ,end=dat$b[end],no.of.1s=L[V==1]) } f.fragment(dat$a,dat$b) stretch start end no.of.1s [1,] 113 204 [2,] 234 462 [3,] 349 774 [4,] 497 971 You need to be more careful about the first and last rows in the dataset. I think yours only works when a starts with 0 and ends with 1. f.fragment(c(1,1,0,0), c(11,12,13,14)) stretch start end no.of.1s [1,] 1NA 122 f.fragment(c(1,1,0,1), c(11,12,13,14)) stretch start end no.of.1s [1,] 114 122 [2,] 114 141 f.fragment(c(0,1,0,1), c(11,12,13,14)) stretch start end no.of.1s [1,] 112 121 [2,] 214 141 f.fragment(c(0,1,0,0), c(11,12,13,14)) stretch start end no.of.1s [1,] 112 121 [2,] 2NA 121 f.fragment(c(1,1,1,1), c(11,12,13,14)) stretch end no.of.1s [1,] 1 144 [2,] 0 144 f.fragment(c(0,0,0,0), c(11,12,13,14)) stretch start [1,] 1NA The following does better. It keeps things as logical vectors as long as possible, which tends to work better when dealing with runs. f - function(a, b) { isFirstIn1Run - c(TRUE, a[-1] != a[-length(a)]) a==1 isLastIn1Run - c(a[-1] != a[-length(a)], TRUE) a==1 data.frame(stretch=seq_len(sum(isFirstIn1Run)), start = b[isFirstIn1Run], end = b[isLastIn1Run], no.of.1s = which(isLastIn1Run) - which(isFirstIn1Run) + 1) } f(c(1,1,0,0), c(11,12,13,14)) stretch start end no.of.1s 1 111 122 f(c(1,1,0,1), c(11,12,13,14)) stretch start end no.of.1s 1 111 122 2 214 141 f(c(0,1,0,1), c(11,12,13,14)) stretch start end no.of.1s 1 112 121 2 214 141 f(c(0,1,0,0), c(11,12,13,14)) stretch start end no.of.1s 1 112 121 f(c(1,1,1,1), c(11,12,13,14)) stretch start end no.of.1s 1 111 144 f(c(0,0,0,0), c(11,12,13,14)) [1] stretch startend no.of.1s 0 rows (or 0-length row.names) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com John From: ted.hard...@wlandres.net ted.hard...@wlandres.net Cc: r-h...@stat.math.ethz.ch Sent: Thu, January 6, 2011 2:57:47 PM Subject: RE: [R] algorithm help On 06-Jan-11 22:16:38, array chip wrote: Hi, I am seeking help on designing an algorithm to identify the locations of stretches of 1s in a vector of 0s and 1s. Below is an simple example: dat-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T) ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97))) dat a b 1 0 4 2 0 12 3 1 13 4 1 16 5 1 18 6 1 20 7 0 28 8 0 30 9 1 34 10 1 46 11 0 47 12 1 49 13 1 61 14 1 73 15 1 77 16 0 84 17 0 87 18 0 90 19 0 95 20 1 97 In this dataset, b is sorted and denotes the location for each number in a. So I would like to find the starting ending locations for each stretch of 1s within a, also counting the number of 1s in each stretch as well. Hope the results from the algorithm would be: stretch start end No.of.1s 1 13 204 2 34 462 3 49 774 4 97 971 I can imagine using for loops can do the job, but I feel it's not a clever way to do this. Is there an efficient algorithm that can do this fast? Thanks for any suggestions. John The basic information you need can be got using rle() (run length encoding). See '?rle'. In your example: rle(dat$a) # Run Length Encoding # lengths: int [1:8] 2 4 2 2 1 4 4 1 # values : num [1:8] 0 1 0 1 0 1 0 1 ## Note: F - 0, T - 1 The following has a somewhat twisted logic at the end, and may [[elided Yahoo spam]] L - rle(dat$a)$lengths V - rle(dat$a)$values pos - c(1,cumsum(L)) V1 - c(-1,V) 1+pos[V1==0] # [1] 3 9 12 20 ## Positions in the series dat$a where each run of T (i.e. 1) ## starts Hoping this helps, Ted
Re: [R] algorithm help
Thanks very much Bill, good catch! John From: William Dunlap wdun...@tibco.com Cc: r-h...@stat.math.ethz.ch Sent: Thu, January 6, 2011 3:52:47 PM Subject: RE: [R] algorithm help -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of array chip Sent: Thursday, January 06, 2011 3:29 PM To: ted.hard...@wlandres.net Cc: r-h...@stat.math.ethz.ch Subject: Re: [R] algorithm help [[elided Yahoo spam]] I made a routine to do this: f.fragment-function(a,b) { dat-as.data.frame(cbind(a,b)) L - rle(dat$a)$lengths V - rle(dat$a)$values pos - c(1,cumsum(L)) V1 - c(-1,V) start-1+pos[V1==0] end-pos[V1==1] cbind(stretch=1:length(start),start=dat$b[start] ,end=dat$b[end],no.of.1s=L[V==1]) } f.fragment(dat$a,dat$b) stretch start end no.of.1s [1,] 113 204 [2,] 234 462 [3,] 349 774 [4,] 497 971 You need to be more careful about the first and last rows in the dataset. I think yours only works when a starts with 0 and ends with 1. f.fragment(c(1,1,0,0), c(11,12,13,14)) stretch start end no.of.1s [1,] 1NA 122 f.fragment(c(1,1,0,1), c(11,12,13,14)) stretch start end no.of.1s [1,] 114 122 [2,] 114 141 f.fragment(c(0,1,0,1), c(11,12,13,14)) stretch start end no.of.1s [1,] 112 121 [2,] 214 141 f.fragment(c(0,1,0,0), c(11,12,13,14)) stretch start end no.of.1s [1,] 112 121 [2,] 2NA 121 f.fragment(c(1,1,1,1), c(11,12,13,14)) stretch end no.of.1s [1,] 1 144 [2,] 0 144 f.fragment(c(0,0,0,0), c(11,12,13,14)) stretch start [1,] 1NA The following does better. It keeps things as logical vectors as long as possible, which tends to work better when dealing with runs. f - function(a, b) { isFirstIn1Run - c(TRUE, a[-1] != a[-length(a)]) a==1 isLastIn1Run - c(a[-1] != a[-length(a)], TRUE) a==1 data.frame(stretch=seq_len(sum(isFirstIn1Run)), start = b[isFirstIn1Run], end = b[isLastIn1Run], no.of.1s = which(isLastIn1Run) - which(isFirstIn1Run) + 1) } f(c(1,1,0,0), c(11,12,13,14)) stretch start end no.of.1s 1 111 122 f(c(1,1,0,1), c(11,12,13,14)) stretch start end no.of.1s 1 111 122 2 214 141 f(c(0,1,0,1), c(11,12,13,14)) stretch start end no.of.1s 1 112 121 2 214 141 f(c(0,1,0,0), c(11,12,13,14)) stretch start end no.of.1s 1 112 121 f(c(1,1,1,1), c(11,12,13,14)) stretch start end no.of.1s 1 111 144 f(c(0,0,0,0), c(11,12,13,14)) [1] stretch startend no.of.1s 0 rows (or 0-length row.names) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com John From: ted.hard...@wlandres.net ted.hard...@wlandres.net Cc: r-h...@stat.math.ethz.ch Sent: Thu, January 6, 2011 2:57:47 PM Subject: RE: [R] algorithm help On 06-Jan-11 22:16:38, array chip wrote: Hi, I am seeking help on designing an algorithm to identify the locations of stretches of 1s in a vector of 0s and 1s. Below is an simple example: dat-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T) ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97))) dat a b 1 0 4 2 0 12 3 1 13 4 1 16 5 1 18 6 1 20 7 0 28 8 0 30 9 1 34 10 1 46 11 0 47 12 1 49 13 1 61 14 1 73 15 1 77 16 0 84 17 0 87 18 0 90 19 0 95 20 1 97 In this dataset, b is sorted and denotes the location for each number in a. So I would like to find the starting ending locations for each stretch of 1s within a, also counting the number of 1s in each stretch as well. Hope the results from the algorithm would be: stretch start end No.of.1s 1 13 204 2 34 462 3 49 774 4 97 971 I can imagine using for loops can do the job, but I feel it's not a clever way to do this. Is there an efficient algorithm that can do this fast? Thanks for any suggestions. John The basic information you need can be got using rle() (run length encoding). See '?rle'. In your example: rle(dat$a) # Run Length Encoding # lengths: int [1:8] 2 4 2 2 1 4 4 1 # values : num [1:8] 0 1 0 1 0 1 0 1 ## Note: F - 0, T - 1 The following has a somewhat twisted logic at the end, and may [[elided Yahoo spam]] L - rle(dat$a)$lengths V - rle(dat$a)$values pos - c(1,cumsum(L)) V1 - c(-1,V) 1+pos[V1==0