Re: [Bioc-sig-seq] weighted coverage on GRanges object

Hervé Pagès Tue, 17 May 2011 17:15:17 -0700

Hi Janet,

On 11-05-17 11:40 AM, Janet Young wrote:

Hi,


Yes, it would be great for us to be able to simply pass in weights as a vector 
rather than the list (it seems very instinctive to me to put the scores in an 
elementMetadata column, so that's mostly where I'd like to take weights from).


Ok now this works in GenomicRanges 1.4.4:

  > library(GenomicRanges)

  > dat2 <- GRanges(seqnames=rep(c("chr2L","chr3L"), c(4,3)),
                    IRanges(start=c(  (1:4)*5 , (1:3)*5),width=20),
                    score=c(3,5,10,20,6,10,30))

  > coverage(dat2, weight=elementMetadata(dat2)$score)
  SimpleRleList of length 2
  $chr2L
  'integer' Rle of length 39 with 8 runs
    Lengths:  4  5  5  5  5  5  5  5
    Values :  0  3  8 18 38 35 30 20

  $chr3L
  'integer' Rle of length 34 with 6 runs
    Lengths:  4  5  5 10  5  5
    Values :  0  6 16 46 40 30

The old behaviour is preserved i.e. 'weight' can still be a list.
The 'shift' argument is handled the same way as 'weight' i.e. it
can also be a numeric vector in addition to a list. (The code that
checks and preprocesses 'shift' and 'weight' is the same.)

Details in ?GRanges

GenomicRanges 1.4.4 should propagate to the public repo tomorrow
morning.

Cheers,
H.

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.4.4 IRanges_1.10.4


thanks,

Janet


On May 13, 2011, at 10:41 AM, Hervé Pagès wrote:

Hi Janet,

Not sure it makes sense that the 'weight' arg for the coverage,GRanges
method is required to be a list to start with. Generally speaking, one
would like to be able to associate a weight to each range, and this can
simply be achieved by passing a numeric vector of length the number of
ranges. If the user wants to define the weights on a per-chromosome
basis, then this would better be done before, when preparing the weight
vector, and it's easy.

Anyway, I'll keep support for list but will also add support for numeric
vector.

Thanks for your feedback!
H.


On 11-05-06 12:48 PM, Janet Young wrote:

Hi there,

I have a suggestion - is it possible to make it a little easier to directly 
calculate weighted coverage on a GRanges object that has regions on>1 
chromosome?   I think the code below explains what I mean.

I've figured out a workaround, but it took me quite a while to get there - 
perhaps putting the coercion in the underlying coverage/weights code will help 
others avoid that in future?

thanks very much,

Janet

-------------------------------------------------------------------

Dr. Janet Young (Trask lab)

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung  ...at...  fhcrc.org

http://www.fhcrc.org/labs/trask/

-------------------------------------------------------------------


library(GenomicRanges)


dat1<- 
GRanges(seqnames="chr2L",IRanges(start=((1:4)*5),width=20),score=c(3,5,10,20) )

dat2<- GRanges(seqnames=rep(c("chr2L","chr3L"),c(4,3)),IRanges(start=c(  
(1:4)*5 , (1:3)*5 ),width=20),score=c(3,5,10,20,6,10,30) )
seqlengths(dat2)<- c(50,40)

#### this works
coverage(dat1,weight=list(elementMetadata(dat1)$score))

## SimpleRleList of length 1
## $chr2L
## 'integer' Rle of length 39 with 8 runs
##   Lengths:  4  5  5  5  5  5  5  5
##   Values :  0  3  8 18 38 35 30 20


#### this works
myscores<- list(c(3,5,10,20), c(6,10,30))
coverage(dat2,weight=myscores)

## SimpleRleList of length 2
## $chr2L
## 'integer' Rle of length 50 with 9 runs
##   Lengths:  4  5  5  5  5  5  5  5 11
##   Values :  0  3  8 18 38 35 30 20  0

## $chr3L
## 'integer' Rle of length 40 with 7 runs
##   Lengths:  4  5  5 10  5  5  6
##   Values :  0  6 16 46 40 30  0

##### but it'd be great if this could work too....
coverage(dat2,weight=list(elementMetadata(dat2)$score))
## Error in normargWeight(weight, nseq) : 'weight' is longer than 'x'

### for now I'll coerce my scores into a list by chromosome: does this seem to 
be a reliable way to do it?  this is the part it took me a while to figure out 
on my own...
myscores<- split( elementMetadata(dat2)$score , as.character(seqnames(dat2)) )
coverage(dat2,weight=myscores)

sessionInfo()

R version 2.13.0 (2011-04-13)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.4.3 IRanges_1.10.0

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-sig-sequencing mailing list
Bioc-sig-sequencing@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] weighted coverage on GRanges object

Reply via email to