Jim,

Although I can't find the post this code stems from, I had come across it while prowling the newsgroup. It's not the one you had shared with me to eliminate overlaps (which I referenced below: http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html). That particular solution marked entries as overlapping or not, and I am looking for an extension to that code which would also return the actual "clusters" of consecutively overlapping values.

While Gabor's code in this thread does what I require for the example, I still hope somebody more clueful than myself can extend your code, since it carries the (for me) significant advantage of being able to build the windows of overlap with different values for 'up' and 'down': for instance, checking which values overlap when the overlap-defining distance is 5 ppm 'up' and 7.5 ppm 'down' from each value. This is a generalization I would highly cherish.
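For illustration, the requested generalization could be sketched roughly as follows. This is not Jim's sweep over start/end events but a simpler interval-chaining variant, under the assumption that two values belong together whenever their windows [value - down, value + up] touch or overlap. The function name find_clusters and its arguments up, down and min_size are hypothetical, and the masses in the ppm call are made up.

```r
# Hypothetical helper (not from the thread): chain values whose windows
# [value - down, value + up] overlap and return the resulting clusters.
# 'up' and 'down' may be single numbers or vectors as long as 'values'.
find_clusters <- function(values, up, down = 0, min_size = 2) {
  if (length(values) == 0) return(list())
  ord <- order(values - down)        # sweep from the lowest window start
  lo  <- (values - down)[ord]
  hi  <- (values + up)[ord]
  # furthest upper bound reached by all windows seen so far
  reach <- cummax(hi)
  # a new cluster starts where a window begins beyond everything before it
  new_cluster <- c(TRUE, lo[-1] > reach[-length(reach)])
  groups <- split(values[ord], cumsum(new_cluster))
  # drop isolated values, keep only real clusters
  unname(groups[lengths(groups) >= min_size])
}

# Fixed window of 0.5 'up' and 0 'down', as in the example from the thread
vector <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
clusters <- find_clusters(vector, up = 0.5)
invisible(lapply(clusters, print))

# Asymmetric ppm windows: 5 ppm 'up' and 7.5 ppm 'down' (made-up masses)
masses <- c(500.000, 500.002, 500.010)
ppm_clusters <- find_clusters(masses, up = masses * 5e-6, down = masses * 7.5e-6)
```

Because each cluster comes back as its own list element, separate runs such as 6, 6.45 and 7, 7.1 stay apart without iterating over the groups afterwards.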
Thanks for your help and continuous patience on r-help.

Joh

jim holtman wrote:
> Here is a modification of the algorithm to use a specified value for
> the overlap:
>
> > vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
> > # following add 0.5 as the overlap detection -- can be changed
> > x <- rbind(cbind(value=vector, oper=1, id=seq_along(vector)),
> +     cbind(value=vector+0.5, oper=-1, id=seq_along(vector)))
> > x <- x[order(x[,'value'], -x[, 'oper']),]
> > # determine which ones overlap
> > x <- cbind(x, over=cumsum(x[, 'oper']))
> > # now partition into groups and only use groups greater than or equal to
> > # 3 determine where the breaks are (0 values in cumsum(over))
> > x <- cbind(x, breaks=cumsum(x[, 'over'] == 0))
> > # delete entries with 'over' == 0
> > x <- x[x[, 'over'] != 0,]
> > # split into groups
> > x.groups <- split(x[, 'id'], x[, 'breaks'])
> > # only keep those with more than 2
> > x.subsets <- x.groups[sapply(x.groups, length) >= 3]
> > # print out the subsets
> > invisible(lapply(x.subsets, function(a) print(vector[unique(a)])))
> [1] 0.00 0.45
> [1] 3.00 3.25 3.33 3.75 4.10
> [1] 6.00 6.45
> [1] 7.0 7.1
>
>
> On Dec 21, 2007 4:56 AM, Johannes Graumann <[EMAIL PROTECTED]>
> wrote:
>> <posted & mailed>
>>
>> Dear all,
>>
>> I'm trying to solve the problem, of how to find clusters of values in a
>> vector that are closer than a given value. Illustrated this might look as
>> follows:
>>
>> vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
>>
>> When using '0.5' as the proximity requirement, the following groups would
>> result:
>> 0,0.45
>> 3,3.25,3.33,3.75,4.1
>> 6,6.45
>> 7,7.1
>>
>> Jim Holtman proposed a very elegant solution in
>> http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I have
>> modified and perused since he wrote it to me. The beauty of this approach
>> is that it will not only work for constant proximity requirements as
>> above, but also for overlap-windows defined in terms of ppm around each
>> value.
>> Now I have an additional need and have found no way (short of
>> iteratively step through all the groups returned) to figure out how to do
>> that with Jim's approach: how to figure out that 6,6.45 and 7,7.1 are
>> separate clusters?
>>
>> Thanks for any hints, Joh
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html and provide commented,
>> minimal, self-contained, reproducible code.