I would not call the window you describe sliding: the intervals are
regular and non-overlapping, with an irregular number of events within
each window. One way would be to use the result of trunc(pos/10000) as
a factor with tapply.
(Related functions are floor() and round(). trunc() and floor() differ
only for negative values, and your pos values appear to be positive,
so there should be no problem with how they behave across 0.)
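To make the difference concrete, here is a small sketch of how the three functions bin values on either side of 0. The vector x is made up purely for illustration; only the last line uses a pos value from your excerpt.

```r
# Illustrative values only: positions on both sides of 0, scaled by the window width
x <- c(-15000, -5000, 5000, 15000) / 10000   # -1.5 -0.5 0.5 1.5

t_idx <- trunc(x)   # rounds towards zero, so -0.5 and 0.5 share window 0
f_idx <- floor(x)   # rounds down, so every window keeps the same width
r_idx <- round(x)   # rounds to nearest, which shifts the window edges by half a width

# For a positive pos such as the first one in the excerpt, trunc() and floor() agree
p <- 15795572 / 10000
c(trunc = trunc(p), floor = floor(p), round = round(p))
```

So with strictly positive positions, trunc() and floor() give the same window index, while round() would place the window boundaries at ...5000 instead of ...0000.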
After creating a dataframe, dta, try something like:
> tapply(dta$xpehh, as.factor(trunc(dta$pos/10000)), min)
     1579      1580      1581      1582
-0.153413 -0.367296  0.302555  0.090302
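The same idea should extend to the other things you asked about: the maximum, the number of data points per window, and keeping chromosomes separate. What follows is only a sketch. It rebuilds dta from the 15 rows in your excerpt (so chr is always 22), and the object names win, win_max, win_n and by_chr_min are my own choices.

```r
# dta rebuilt from the excerpt in the original post (chr is 22 in every row)
dta <- data.frame(
  chr   = 22,
  pos   = c(15795572, 15806401, 15807037, 15809384, 15809434,
            15810040, 15813210, 15813888, 15814084, 15816846,
            15817310, 15821524, 15822154, 15823131, 15825502),
  xpehh = c(-0.153413, -0.367296, 0.0602451, 0.527892, 0.489512,
            0.577633, 0.527491, 0.578651, 0.505691, 0.473199,
            0.302555, 0.324267, 0.102746, 0.12069, 0.090302)
)

# Window index for each row: 1579 covers pos 15790000 to 15799999, and so on
win <- trunc(dta$pos / 10000)

# Maximum xpehh within each window
win_max <- tapply(dta$xpehh, win, max)

# Number of data points falling in each window
win_n <- table(win)

# Passing a list of two grouping vectors gives a chr-by-window matrix,
# so each chromosome is evaluated separately
by_chr_min <- tapply(dta$xpehh, list(chr = dta$chr, win = win), min)
```

With the full data the chr-by-window matrix will contain NA wherever a chromosome has no data points in a window. If a long table with one row per chromosome and window suits you better, aggregate() with the same two grouping variables is another option.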
--
David Winsemius
On Mar 30, 2009, at 9:01 AM, Irene Gallego Romero wrote:
Dear all,
I have some very big data files that look something like this:
id          chr  pos       ihh1       ihh2       xpehh
rs5748748   22   15795572  0.0230222  0.0268394  -0.153413
rs5748755   22   15806401  0.0186084  0.0268672  -0.367296
rs2385785   22   15807037  0.0198204  0.0186616  0.0602451
rs1981707   22   15809384  0.0299685  0.0176768  0.527892
rs1981708   22   15809434  0.0305465  0.0187227  0.489512
rs11914222  22   15810040  0.0307183  0.0172399  0.577633
rs4819923   22   15813210  0.02707    0.0159736  0.527491
rs5994105   22   15813888  0.025202   0.0141296  0.578651
rs5748760   22   15814084  0.0242894  0.0146486  0.505691
rs2385786   22   15816846  0.0173057  0.0107816  0.473199
rs1990483   22   15817310  0.0176641  0.0130525  0.302555
rs5994110   22   15821524  0.0178411  0.0129001  0.324267
rs17733785  22   15822154  0.0201797  0.0182093  0.102746
rs7287116   22   15823131  0.0201993  0.0179028  0.12069
rs5748765   22   15825502  0.0193195  0.0176513  0.090302
I'm trying to extract the maximum and minimum xpehh (last column)
values within a sliding window (non overlapping), of width 10000
(calculated relative to pos (third column)). However, as you can
tell from the brief excerpt here, although all possible intervals
will probably be covered by at least one data point, the number of
data points will be variable (incidentally, if anyone knows of a way
to obtain this number, that would be lovely), as will the spacing
between them. Furthermore, values of chr (second column) will range
from 1 to 22, and values of pos will be overlapping across them; I
want to evaluate the window separately for each value of chr.
I've looked at the help and FAQ on sliding windows, but I'm a
relative newcomer to R and cannot find a way to do what I need to
do. Everything I've managed to unearth so far seems geared towards
smoother time series. Any help on this problem would be vastly
appreciated.
Thanks,
Irene
--
Irene Gallego Romero
Leverhulme Centre for Human Evolutionary Studies
University of Cambridge
Fitzwilliam St
Cambridge
CB2 1QH
UK
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT