dear Hervé, Thanks for the quick reply about directions to take this.
I'm sorry for not providing sufficient detail about the goal of block bootstrapping in my initial post. Let me try again. For a moment, let me ignore multiple chromosomes/seqs and just focus on a single set of IRanges. The point of the block bootstrap is: Let's say we want to find the number of overlaps of x and y, and then assess how surprised we are at how large that overlap is. Both of them may have features that tend to cluster together along the genome (independently). One method would just be to move the features in y around to random start sites, making y', say B times, and then calculate each of the B times the number of overlaps between x and y'. Or we might make this better by having blacklisted sites where the randomly shuffled features in y cannot go. The block bootstrap is an alternative to randomly moving the start sites, where instead we create random data, by taking big "blocks" of features in y. Each block is a lot like a View. And the ultimate goal is to make B versions of the data y where the features have been shuffled around, but by taking blocks, we preserve the clumpiness of the features in y. Let me give some numbers to make this more concrete, so say we're making a single block bootstrap sample of a chromosome that is 1000 bp long. Here is the original y: y <- IRanges(c(51,61,71,111,121,131,501,511,521,921,931,941),width=5) If I go with my coverage approach, I should extend it all the way to the end of the chromosome. Here I lose information if there are overlaps of features in y, and I'm thinking of a fix I'll describe below. cov <- c(coverage(y), Rle(rep(0,55))) I could make one block bootstrap sample of y (this is 1 out of B in the ultimate procedure) by taking 10 blocks of width 100. The blocks have random start positions from 1 to 901. y.boot.1 <- unlist(Views(cov, start=round(runif(10,1,901)), width=100)) This last line below is a hack to get back to the ranges for a single block bootstrap sample of y. It works here, but only because none of the features in y were overlapping each other. Instead of coverage(), if I'd made an Rle where the non-zero elements are located at the start of ranges, and the value is the width of the range, I think this Views approach would actually work. y.boot.1.rng <- IRanges(start(y.boot.1)[runValue(y.boot.1) == 1], width=runLength(y.boot.1)[runValue(y.boot.1) == 1]) I'm interested in building a function that takes in IRanges and outputs these shuffled set of IRanges. _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel