Neither you nor your responder has continued the email chain very well, so let me put things back together:
On Aug 13, 2010; 03:54pm fishkbob wrote (subject: merge function in R?):

So I have a bunch of c(start,end) points and want to consolidate them into as few c(start,end) as possible.

For example:
sample   start    end
A              5       10
B              7       18
C              1        4
D              16      20

I'd want the function to return the two distinct sets (1,4) and (5,20)

Is there an R function that already does this?
or should I write my own? (how would I go about that?)

In an effort to be helpful, but without copying the prior message, on Aug 13, 2010; 06:46pm JesperHybel wrote:

I think it would be helpful if you could clarify your question - do you want distinct sets - maybe use

unique()

but why (5,20) when it's (5,10) in the row in your example? What criteria do you want the function to select the "sets" by and what kind of output do you need?

Maybe it's just me who doesn't get the question..sr

On Aug 13, 2010, at 7:01 PM, fishkbob wrote:


I too think I worded it incorrectly...

The second and third columns of the matrix are the start and end of an interval. Because some of the intervals overlap, I want to limit the number
of intervals I have to deal with.

So therefore,
(5     10)    should merge with    (7     18)   making    (5     18)
and then (5    18)   should merge with (16    20)   giving   (5    20)
whereas (1 4) has no overlap with any other interval and is therefore
left on its own

Ideal output would just be a collapsing of the matrix
sample   start     end
#              5       20
#              1        4

I got this to work using unique(c(5:10,7:18,16:20,1:4)) which gives me a
c(1:4,5:20)
However, I have to do this on a very large dataset and the numbers are more
like
c(100542:100782,598322:598821,...)

any help would be appreciated
thanks
--
View this message in context: 
http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324855.html
Sent from the R help mailing list archive at Nabble.com.

Nabble is where I saw all of this, but Nabble is not r-help:

I suggest you sort your rows by the "start" variable and then examine where the breaks would remain by looking at the prior values of "end":

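> # read.table(text = ...) stands in for my own rd.txt() helper, which is not part of base R
> dd <- read.table(text = "sample   start    end
+ A              5       10
+ B              7       18
+ C              1        4
+ D              16      20", header = TRUE)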
> dd[order(dd$start), ]
  sample start end
3      C     1   4
1      A     5  10
2      B     7  18
4      D    16  20
> ndd <- dd[order(dd$start), ]
> # TRUE when a row's start falls at or before the previous row's end
> ndd$inprior <- c(NA, ndd[1:(nrow(ndd)-1), 3] >= ndd[2:nrow(ndd), 2])
> ndd
  sample start end inprior
3      C     1   4      NA
1      A     5  10   FALSE
2      B     7  18    TRUE
4      D    16  20    TRUE
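
If it helps, here is a rough sketch of how that sort-then-compare idea could be carried through to the collapsed table you asked for. The function name merge_intervals is my own invention for illustration, not an existing R function, and it assumes a data frame with numeric columns named start and end. It compares each start against the running maximum of all earlier ends (rather than only the immediately prior end), so an interval nested inside a much earlier one should still be absorbed. As in your example, intervals that merely touch, such as (1,4) and (5,20), are kept separate.

```r
# Hypothetical helper (not from the thread or from base R): collapse
# overlapping intervals. Assumes numeric columns `start` and `end`.
merge_intervals <- function(df) {
  df <- df[order(df$start), ]
  # Furthest end reached by any earlier row (-Inf for the first row)
  prior_max_end <- c(-Inf, head(cummax(df$end), -1))
  # A row opens a new group when it starts after every earlier row has ended
  group <- cumsum(df$start > prior_max_end)
  data.frame(start = as.vector(tapply(df$start, group, min)),
             end   = as.vector(tapply(df$end,   group, max)))
}

dd <- read.table(text = "sample start end
A 5 10
B 7 18
C 1 4
D 16 20", header = TRUE)

res <- merge_intervals(dd)
print(res)  # one row for (1, 4) and one row for (5, 20)
```

Because this works on the endpoints only, it never expands the intervals into every integer they cover the way unique(c(5:10, ...)) does, which should matter for values in the 100542:100782 range.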

--

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
