Neither you nor your responder has continued the email chain very
well, so let me put things back together:
On Aug 13, 2010; 03:54pm fishkbob wrote (subject: "merge function in R?"):
So I have a bunch of c(start,end) points and want to consolidate
them into as few c(start,end) as possible.
For example:
sample start end
A 5 10
B 7 18
C 1 4
D 16 20
I'd want the function to return the two distinct sets (1,4) and (5,20).
Is there an R function that already does this?
Or should I write my own? (How would I go about that?)
On Aug 13, 2010; 06:46pm JesperHybel, trying to be helpful but without
quoting the prior message, wrote:
I think it would be helpful if you could clarify your question.
Do you want distinct sets? If so, maybe use
unique()
But why (5,20) when it's (5,10) in the row of your example? By what
criteria do you want the function to select the "sets", and what
kind of output do you need?
Maybe it's just me who doesn't get the question... sorry
On Aug 13, 2010, at 7:01 PM, fishkbob wrote:
I too think I worded it poorly...
The second two columns of the matrix are the start and end of an
interval. However, because some of the intervals overlap, I want to
limit the number of intervals I have to deal with.
So:
(5 10) should merge with (7 18), making (5 18),
and then (5 18) should merge with (16 20), giving (5 20),
whereas (1 4) has no overlap with any other interval and is
therefore left on its own.
The ideal output would just be a collapsed version of the matrix:
sample start end
# 5 20
# 1 4
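The chained merging described above can be sketched as a loop over the rows sorted by start. This is a hand-written sketch, not a built-in R function: `merge_intervals_loop` is a hypothetical name, and it assumes every row has start <= end and that intervals which merely share an endpoint should be merged.

```r
# Hypothetical helper: merge overlapping [start, end] intervals with a loop.
# Assumes start <= end in every row; intervals sharing an endpoint are merged.
merge_intervals_loop <- function(start, end) {
  # Process intervals in order of their start positions
  o <- order(start)
  start <- start[o]
  end <- end[o]
  out_start <- start[1]
  out_end <- end[1]
  for (i in seq_along(start)[-1]) {
    k <- length(out_end)
    if (start[i] <= out_end[k]) {
      # Overlaps the current merged interval: extend its end if needed
      out_end[k] <- max(out_end[k], end[i])
    } else {
      # Gap before this interval: open a new merged interval
      out_start <- c(out_start, start[i])
      out_end <- c(out_end, end[i])
    }
  }
  data.frame(start = out_start, end = out_end)
}

# The example from the question: returns two rows, (1, 4) and (5, 20)
merge_intervals_loop(c(5, 7, 1, 16), c(10, 18, 4, 20))
```

Growing vectors inside the loop is fine for a handful of rows but slows down badly on a very large dataset.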
I got this to work using unique(c(5:10,7:18,16:20,1:4)), which gives me
c(5:20,1:4), the same integers as c(1:4,5:20).
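Expanding every interval into its integers also loses the boundary between touching intervals: 4 and 5 are consecutive integers, so the union of 1:4 and 5:20 is simply 1:20. A quick check, assuming integer endpoints, that turns the expanded vector back into runs of consecutive integers:

```r
# Expand each interval into its integers, as in the unique() approach above
x <- sort(unique(c(5:10, 7:18, 16:20, 1:4)))

# A run starts wherever the step from the previous value is larger than 1,
# and ends wherever the step to the next value is larger than 1
run_start <- x[c(TRUE, diff(x) > 1)]
run_end <- x[c(diff(x) > 1, TRUE)]

cbind(run_start, run_end)
# a single run from 1 to 20: the split between (1,4) and (5,20) is gone
```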
However, I have to do this on a very large dataset, and the numbers are
more like
c(100542:100782,598322:598821,...)
Any help would be appreciated.
Thanks
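With coordinates in the hundreds of thousands, expanding every interval into its integers is wasteful. A vectorized sketch that works on the endpoints only, assuming the data sit in two numeric vectors (`merge_intervals` is a hypothetical name): sort by start, take the running maximum of end with cummax(), and open a new group wherever a start lies beyond every earlier end.

```r
# Hypothetical helper: merge overlapping intervals without expanding them.
merge_intervals <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  n <- length(start)
  # Largest end among all earlier rows (running maximum, lagged by one row)
  prev_max_end <- c(-Inf, cummax(end)[-n])
  # A new merged interval begins where start exceeds every earlier end
  group <- cumsum(start > prev_max_end)
  # Collapse each group to its smallest start and largest end
  data.frame(start = as.vector(tapply(start, group, min)),
             end = as.vector(tapply(end, group, max)))
}

# The example from the question: returns two rows, (1, 4) and (5, 20)
merge_intervals(c(5, 7, 1, 16), c(10, 18, 4, 20))
```

Comparing each start with the running maximum of end, rather than only with the previous row's end, matters when one long interval contains several shorter ones: (1,10), (2,3), (5,6) should collapse to (1,10). Every step is vectorized (order(), cummax(), cumsum(), tapply()), so the cost is dominated by the sort.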
--
View this message in context:
http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324855.html
Sent from the R help mailing list archive at Nabble.com.
Nabble is where I saw all of this, but Nabble is not r-help:
I suggest you sort your rows by the "start" variable and then examine
where the breaks would remain by looking at the prior values of "end":
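> # rd.txt() is not part of base R; read.table(text = ..., header = TRUE) does the same job
> dd <- read.table(text = "sample start end
+ A 5 10
+ B 7 18
+ C 1 4
+ D 16 20", header = TRUE)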
> dd[order(dd$start), ]
sample start end
3 C 1 4
1 A 5 10
2 B 7 18
4 D 16 20
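> ndd <- dd[order(dd$start), ]
> n <- nrow(ndd)
> # inprior: does this row start at or before the previous row's end?
> # (1:n-1 would parse as (1:n)-1, hence the parentheses)
> ndd$inprior <- c(NA, ndd$end[1:(n - 1)] >= ndd$start[2:n])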
> ndd
sample start end inprior
3 C 1 4 NA
1 A 5 10 FALSE
2 B 7 18 TRUE
4 D 16 20 TRUE
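To get from the `inprior` column to merged intervals, note that the first row, and every row that does not overlap its predecessor, opens a new group; cumsum() over that flag numbers the groups, and each group collapses to its smallest start and largest end. A minimal sketch that rebuilds the example with read.table(text = ...) instead of rd.txt(); the `grp` column and `merged` object are illustrative names:

```r
dd <- read.table(text = "sample start end
A 5 10
B 7 18
C 1 4
D 16 20", header = TRUE)

ndd <- dd[order(dd$start), ]
n <- nrow(ndd)
ndd$inprior <- c(NA, ndd$end[1:(n - 1)] >= ndd$start[2:n])

# Row 1 and every row that does not overlap its predecessor open a new group
ndd$grp <- cumsum(c(TRUE, !ndd$inprior[-1]))

# Smallest start and largest end per group, joined on grp with base merge()
merged <- merge(aggregate(start ~ grp, data = ndd, FUN = min),
                aggregate(end ~ grp, data = ndd, FUN = max))
merged
# one row per merged interval: (1, 4) and (5, 20)
```

This compares each row only with the row immediately before it, so an interval nested inside a longer earlier one can wrongly start a new group; comparing against cummax(ndd$end) instead of the previous end avoids that.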
--
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.