Re: [R] counting sets of consecutive integers in a vector

Mike Miller Sun, 04 Jan 2015 19:56:09 -0800

Thanks, Peter. Why not cbind your idea for the first column with my idea for the second column and get it done in one line?


v <- c(1,2,5,6,7,8,25,30,31,32,33)
M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths )
M


     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4
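
In case it is useful, here is a minimal sketch of the same one-liner wrapped in a function. The name run_starts_lengths is hypothetical (it is not from this thread), and the function assumes v is sorted, unique, and integer-valued. It uses seq_along(v) rather than 1:length(v) because 1:length(v) evaluates to c(1, 0) for an empty vector, and it returns a 0-row matrix in that case.

```r
# Hypothetical helper; assumes v is sorted, unique, and integer-valued.
run_starts_lengths <- function(v) {
  # Guard the empty case: v[1] would be NA for a zero-length vector.
  if (length(v) == 0) return(matrix(numeric(0), ncol = 2))
  # A new run starts at position 1 and wherever the step is not exactly 1.
  starts <- v[c(1, which(diff(v) != 1) + 1)]
  # Value minus position is constant within a run of consecutive integers.
  lengths <- rle(v - seq_along(v))$lengths
  cbind(start = starts, length = lengths)
}

M <- run_starts_lengths(c(1,2,5,6,7,8,25,30,31,32,33))
M
```

A single-element vector gives one row with length 1, since diff() of one value is empty and rle() sees a single run.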

I find that pretty appealing and I'll probably stick with it. It seems quite fast. Here's an example:


# make fairly long vector
v <- sort(unique(round(100000*runif(100000))))
length(v)
[1] 63274

# time the procedure:
ptm <- proc.time() ; M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths ) ; proc.time() - ptm
   user  system elapsed
   0.03    0.00    0.03

dim(M)
[1] 23212     2

I probably won't be using vectors any longer than that, and this isn't the kind of thing that I do over and over again, so that speed is excellent.
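
The same timing could also be sketched with system.time(), which wraps the expression directly instead of differencing two proc.time() calls. The seed below is an assumption added only so the random vector is reproducible, and the final check is an extra sanity test that is not part of the procedure itself. Timings will of course vary by machine.

```r
set.seed(1)  # assumed seed, only to make the vector reproducible
v <- sort(unique(round(100000 * runif(100000))))

# system.time() evaluates the expression in the calling frame,
# so M is available afterwards.
timing <- system.time(
  M <- cbind(v[c(1, which(diff(v) != 1) + 1)],
             rle(v - seq_along(v))$lengths)
)
timing

# Sanity check: the run lengths should account for every element of v.
stopifnot(sum(M[, 2]) == length(v))
```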


Mike



On Mon, 5 Jan 2015, Peter Alspach wrote:

Tena koe Mike

An alternative, which is slightly faster:

 diffv <- diff(v)
 starts <- c(1, which(diffv!=1)+1)
 cbind(v[starts], c(diff(starts), length(v)-starts[length(starts)]+1))
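
As a hedged check rather than a claim about which form is best: the run lengths are just the gaps between successive start positions, so appending the sentinel length(v)+1 to starts should give the same second column with a single diff(). The names A, B and C below are only for this comparison.

```r
v <- c(1,2,5,6,7,8,25,30,31,32,33)
starts <- c(1, which(diff(v) != 1) + 1)

# Lengths from differences of start positions, last run handled separately.
A <- cbind(v[starts], c(diff(starts), length(v) - starts[length(starts)] + 1))
# Same idea with a sentinel position one past the end of v.
B <- cbind(v[starts], diff(c(starts, length(v) + 1)))
# Lengths from rle() on value minus position.
C <- cbind(v[starts], rle(v - seq_along(v))$lengths)

stopifnot(all(A == B), all(A == C))
```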

Peter Alspach

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Miller
Sent: Monday, 5 January 2015 1:03 p.m.
To: R-Help List
Subject: [R] counting sets of consecutive integers in a vector

I have a vector of sorted positive integer values (e.g., positive integers after 
applying sort() and unique()).  For example, this:

c(1,2,5,6,7,8,25,30,31,32,33)

I want to make a matrix from that vector that has two columns: (1) the first 
value in every run of consecutive integer values, and (2) the corresponding 
number of consecutive values.  For example:

c(1:20) would become this...

1  20

...because there are 20 consecutive integers beginning with 1 and
c(1,2,5,6,7,8,25,30,31,32,33) would become

1  2
5  4
25 1
30 4

What would be the best way to accomplish this?  Here is my first effort:

v <- c(1,2,5,6,7,8,25,30,31,32,33)
L <- rle( v - 1:length(v) )$lengths
n <- length( L )
matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)

     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4
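
To make the steps above easier to follow, here is a sketch that prints the intermediate values. The variable names d and idx are only for illustration. Within a run of consecutive integers the value minus its position stays constant, and because v is sorted and unique that difference can only increase at a gap, so rle() sees exactly one run per block.

```r
v <- c(1,2,5,6,7,8,25,30,31,32,33)

d <- v - seq_along(v)   # value minus position
d
# 0 0 2 2 2 2 18 22 22 22 22

L <- rle(d)$lengths     # one entry per run of consecutive integers
L
# 2 4 1 4

# cumsum(L) marks the last position of each run, so adding 1 gives the
# first position of the next run; the final entry points past the end
# of v and is dropped by [1:n].
idx <- c(1, cumsum(L) + 1)[seq_along(L)]
idx
# 1 3 7 8
v[idx]
# 1 5 25 30
```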

I suppose that works well enough, but there may be a better way, and besides, I 
wouldn't want to deny anyone here the opportunity to solve a fun puzzle.  ;-)

The use for this is that I will be doing repeated seeks of a binary file to 
extract data.  seek() sets the starting point and readBin(n=X) gives the 
number of values to read.  So when there are many consecutive variables to be 
read, I can multiply the X in n=X by that number instead of doing many 
different seek() calls.  (The data are in a transposed format where I read in 
every record for some variable as sequential elements.)  I'm probably not the 
first person to deal with this.
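
A rough sketch of how the matrix might drive the reads, under assumptions that are not from the real data: the file holds 8-byte doubles, every variable has nrec records stored contiguously, and variable j begins at byte (j-1)*nrec*size. The names nrec, nvar, size, f and full are all made up for the demo, and the file is a small temporary one written on the spot.

```r
# All names and the file layout here are assumptions for illustration.
nrec <- 3L    # records per variable (assumed)
nvar <- 40L   # variables in the file (assumed)
size <- 8L    # bytes per value, i.e. doubles (assumed)

# Build a small demo file: variable j holds j + c(0.1, 0.2, 0.3).
f <- tempfile()
full <- unlist(lapply(seq_len(nvar), function(j) j + c(0.1, 0.2, 0.3)))
writeBin(full, f, size = size)

v <- c(1,2,5,6,7,8,25,30,31,32,33)   # variables wanted
starts <- c(1, which(diff(v) != 1) + 1)
M <- cbind(v[starts], diff(c(starts, length(v) + 1)))

con <- file(f, "rb")
out <- vector("list", nrow(M))
for (i in seq_len(nrow(M))) {
  # Byte offset (0-based) of the first wanted variable in this run.
  seek(con, where = (M[i, 1] - 1) * nrec * size, origin = "start")
  # One read covers the whole run: run length times records per variable.
  out[[i]] <- readBin(con, what = "double", n = M[i, 2] * nrec, size = size)
}
close(con)

x <- matrix(unlist(out), nrow = nrec)   # one column per requested variable
```

With this layout the example needs four seek() calls instead of eleven.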

Best,

Mike

--
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4AAAAJ

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
The contents of this e-mail are confidential and may be subject to legal
privilege.
If you are not the intended recipient you must not use, disseminate, distribute
or reproduce all or any part of this e-mail or attachments.  If you have
received this e-mail in error, please notify the sender and delete all material
pertaining to this e-mail.  Any opinion or views expressed in this e-mail are
those of the individual sender and may not represent those of The New Zealand
Institute for Plant and Food Research Limited.

