Re: [Bioc-devel] How to speed up GRange comparision

2020-01-30 Thread Pages, Herve
On 1/30/20 13:17, Michael Lawrence wrote:
> That sucks. It was broken since it was added in 2017... now fixed.

Unfortunately these things tend to happen to stuff that doesn't have 
examples or unit tests.
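
For illustration, a check of roughly this shape would have caught the error reported earlier in this thread. This is only a minimal sketch: it assumes a GenomicRanges installation that already contains the fix, and it uses stopifnot() instead of a test framework purely to keep the example short.

```r
library(GenomicRanges)

# Two ranges on the same chromosome that do not share any position.
x <- GRanges("chr1:11-15")
y <- GRanges("chr1:16-20")

# as.vector() is used in case the result comes back as an Rle
# rather than a plain logical vector.
res <- as.vector(poverlaps(x, y))

# The call should run without error and report no overlap.
stopifnot(identical(res, FALSE))

# A range always overlaps itself.
self <- as.vector(poverlaps(x, x))
stopifnot(identical(self, TRUE))
```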

Thanks for the fix!

H.
___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] How to speed up GRange comparision

2020-01-30 Thread Michael Lawrence via Bioc-devel
That sucks. It was broken since it was added in 2017... now fixed.

On Thu, Jan 30, 2020 at 11:20 AM Pages, Herve  wrote:
>
> On 1/30/20 11:10, Hervé Pagès wrote:
> > Yes poverlaps() is a good option, as mentioned earlier.
>
> Well actually not. Looks like it's broken:
>
>  > poverlaps(GRanges("chr1:11-15"), GRanges("chr1:16-20"))
> Error in isSingleNumber(minoverlap) : object 'minoverlaps' not found
>
> H.



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com



Re: [Bioc-devel] How to speed up GRange comparision

2020-01-30 Thread Pages, Herve
On 1/30/20 11:10, Hervé Pagès wrote:
> Yes poverlaps() is a good option, as mentioned earlier.

Well actually not. Looks like it's broken:

 > poverlaps(GRanges("chr1:11-15"), GRanges("chr1:16-20"))
Error in isSingleNumber(minoverlap) : object 'minoverlaps' not found

H.


Re: [Bioc-devel] How to speed up GRange comparision

2020-01-30 Thread Pages, Herve
Yes poverlaps() is a good option, as mentioned earlier. I was just 
commenting about doing something like:

   out <- vector("numeric", length(x))
   out[which(foo(x))] <- 1

when you can just do:

   out <- foo(x)
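
The two forms can be compared in a small base R sketch. Here foo() is a hypothetical stand-in predicate (it is not defined anywhere in this thread), and the input vector is made up for the example.

```r
# Hypothetical predicate standing in for foo(): TRUE where a value is positive.
foo <- function(x) x > 0

x <- c(-2, 3, 0, 7)

# Roundabout version: allocate a numeric vector, then fill in the hits.
out1 <- vector("numeric", length(x))
out1[which(foo(x))] <- 1

# Direct version: the logical vector already carries the same information.
out2 <- foo(x)

# Convert only if 0/1 values are really needed.
out3 <- as.integer(out2)

# Both routes mark the same positions.
stopifnot(all(out1 == out3))
```

Note that the two forms differ when foo() returns NA: which() silently drops NA, while the direct form keeps it.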

H.

On 1/30/20 10:56, Michael Lawrence wrote:
> What happened to just poverlaps()?
> 
> On Thu, Jan 30, 2020 at 10:34 AM Pages, Herve  wrote:
>>
>> On 1/29/20 23:31, web working wrote:
>>> Hi Herve,
>>>
>>> Thank you for your answer. pcompare works fine for me. Here my solution:
>>>
>>> query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
>>> subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))
>>> out <- vector("numeric", length(query))
>>> out[(which(abs(pcompare(query, subject))<5))] <- 1
>>> out
>>
>> Why not just
>>
>> out <- abs(pcompare(query, subject)) < 5
>>
>> In any case you should use integer instead of numeric (twice more
>> compact in memory).
>>
>> H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319


Re: [Bioc-devel] How to speed up GRange comparision

2020-01-30 Thread Michael Lawrence via Bioc-devel
What happened to just poverlaps()?

On Thu, Jan 30, 2020 at 10:34 AM Pages, Herve  wrote:
>
> On 1/29/20 23:31, web working wrote:
> > Hi Herve,
> >
> > Thank you for your answer. pcompare works fine for me. Here my solution:
> >
> > query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
> > subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))
> > out <- vector("numeric", length(query))
> > out[(which(abs(pcompare(query, subject))<5))] <- 1
> > out
>
> Why not just
>
>    out <- abs(pcompare(query, subject)) < 5
>
> In any case you should use integer instead of numeric (twice more
> compact in memory).
>
> H.





Re: [Bioc-devel] How to speed up GRange comparision

2020-01-30 Thread Pages, Herve
On 1/29/20 23:31, web working wrote:
> Hi Herve,
> 
> Thank you for your answer. pcompare works fine for me. Here my solution:
> 
> query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
> subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))
> out <- vector("numeric", length(query))
> out[(which(abs(pcompare(query, subject))<5))] <- 1
> out

Why not just

   out <- abs(pcompare(query, subject)) < 5

In any case you should use integer instead of numeric (an integer vector 
takes half the memory of a numeric one).
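
The memory difference can be checked in base R. This is a rough sketch; the vector length n is arbitrary and the exact byte counts printed may vary slightly between R versions.

```r
n <- 1e6

# Doubles ("numeric") use 8 bytes per element, integers 4 bytes.
size_numeric <- object.size(numeric(n))
size_integer <- object.size(integer(n))

print(size_numeric, units = "MB")
print(size_integer, units = "MB")

# Turning a logical result into 0/1 integers rather than doubles:
hits <- c(FALSE, TRUE, TRUE)
out <- as.integer(hits)
stopifnot(is.integer(out))
```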

H.




Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread web working
Hi Herve,

Thank you for your answer. pcompare works fine for me. Here is my solution:

query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))
out <- vector("numeric", length(query))
out[(which(abs(pcompare(query, subject))<5))] <- 1
out

Carey was right that this question is off-topic for this list. Next time I 
will post my question on support.bioconductor.org.

Best,

Tobias

On 29.01.20 at 18:02, Pages, Herve wrote:
> Yes poverlaps().
>
> Or pcompare(), which should be even faster. But only if you are not
> afraid to go low-level. See ?rangeComparisonCodeToLetter for the meaning
> of the codes returned by pcompare().
>
> H.


Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Pages, Herve
On 1/29/20 13:14, Jianhong Ou, Ph.D. wrote:
> Oh, I forget that. Thank you for reminder.
> Then how about:
> 
> distance(query, narrow(subject, start=2, end=-2)) == 0
> 
> ?

Yep, that's more accurate. With the following gotcha:

   'narrow(subject, start=2, end=-2)' will fail if 'subject'
   contains ranges that cover less than 2 positions

Not an unlikely situation e.g. if 'subject' contains TSS!

I just feel that distance() is not really appropriate to detect overlaps.
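
The narrow() gotcha can be reproduced with a small sketch. The subject below is made up for the example: one ordinary range plus one single-position range, as a TSS would be. The exact error message is not shown because it may differ between versions.

```r
library(GenomicRanges)

# A subject with one ordinary range and one single-position range (e.g. a TSS).
subject <- GRanges("chr1", IRanges(start = c(10, 50), end = c(20, 50)))

# Trimming one position off each side works for the first range.
ok <- narrow(subject[1], start = 2, end = -2)
print(ok)

# For the width-1 range the request cannot be satisfied, so narrow()
# signals an error, which is caught here.
res <- tryCatch(narrow(subject[2], start = 2, end = -2),
                error = function(e) "narrow() failed")
print(res)
```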

H.




Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Jianhong Ou, Ph.D.
Oh, I forgot that. Thank you for the reminder.
Then how about:

distance(query, narrow(subject, start=2, end=-2)) == 0

?


On 1/29/20, 12:40 PM, "Pages, Herve"  wrote:

> On 1/29/20 08:04, Jianhong Ou, Ph.D. wrote:
> > Try
> > dist=distance(query, subject)
> > dist==0
> > ?
>
> Please be aware that dist==0 does NOT mean that 2 ranges overlap. It
> means that they overlap OR are **adjacent**:
>
>  > distance(GRanges("chr1:1-20"), GRanges("chr1:21-25"))
> [1] 0
>
> H.



Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Pages, Herve
On 1/29/20 08:04, Jianhong Ou, Ph.D. wrote:
> Try
> dist=distance(query, subject)
> dist==0
> ?

Please be aware that dist==0 does NOT mean that 2 ranges overlap. It 
means that they overlap OR are **adjacent**:

 > distance(GRanges("chr1:1-20"), GRanges("chr1:21-25"))
[1] 0
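
Applied to the example data from the original question, the same effect shows up in the first pair. This is a sketch that assumes GenomicRanges is installed; the id metadata column is left out because it plays no role here.

```r
library(GenomicRanges)

query   <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))

# distance() works pair by pair; 0 covers both overlapping and adjacent pairs.
d <- distance(query, subject)
zero_dist <- d == 0L

# Pair 1 (1-2 vs 3-4) is adjacent but shares no position;
# pair 4 (20-22 vs 15-21) truly overlaps. Both get distance 0.
print(d)
print(zero_dist)
```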

H.




Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Pages, Herve
Yes poverlaps().

Or pcompare(), which should be even faster. But only if you are not 
afraid to go low-level. See ?rangeComparisonCodeToLetter for the meaning 
of the codes returned by pcompare().
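
As a sketch of the low-level route, using the example data from the original question (without the id column): the threshold below relies on the code layout documented in ?pcompare, where codes from -4 to 4 describe pairs that share at least one position, and it assumes all pairs are on the same chromosome and strand, as they are here.

```r
library(GenomicRanges)

query   <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))

# One integer code per pair (query[i] vs subject[i]).
codes <- pcompare(query, subject)
print(codes)

# Letter form of the codes; see ?rangeComparisonCodeToLetter for the legend.
print(rangeComparisonCodeToLetter(codes))

# Codes between -4 and 4 mean the two ranges share at least one position.
overlapping <- abs(codes) < 5
print(overlapping)
```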

H.

On 1/29/20 08:01, Michael Lawrence via Bioc-devel wrote:
> poverlaps()?



Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Jianhong Ou, Ph.D.
Try 
dist=distance(query, subject)
dist==0
?



Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Michael Lawrence via Bioc-devel
poverlaps()?
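
A minimal sketch of how this suggestion applies to the example from the original question. It assumes a GenomicRanges version in which poverlaps() works on GRanges objects; the as.vector() call is there in case the result is returned as an Rle rather than a plain logical vector.

```r
library(GenomicRanges)

query   <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id = 1:4)
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id = 1:4)

# Pairwise test: query[i] is compared with subject[i] only.
hits <- as.vector(poverlaps(query, subject))

# 0/1 coding as in the original approaches, stored as integer.
out <- as.integer(hits)
print(out)
```

Because the comparison is vectorized over pairs, no loop over the ranges and no all-against-all findOverlaps() call is needed.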






[Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread web working

Hello,

I have two big GRanges objects and want to check whether the first range 
of query overlaps the first range of subject, then whether the second 
range of query overlaps the second range of subject, and so on. Here is 
an example of my problem:


# GRanges objects
query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id=1:4)
subject <- GRanges(rep("chr1",4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id=1:4)


# The 2 overlaps at the first position should not be counted, because
# these ranges are at different rows.

countOverlaps(query, subject)

# Approach 1 (bad style; I have simplified it to make it easier to understand)
dat <- as.data.frame(findOverlaps(query, subject))
indexDat <- apply(dat, 1, function(x) x[1]==x[2])
indexBool <- dat[indexDat,1]
out <- rep(FALSE, length(query))
out[indexBool] <- TRUE
as.numeric(out)

# Approach 2 (bad style and takes too long)
out <- vector("numeric", 4)
for(i in seq_along(query)) out[i] <- (overlapsAny(query[i], subject[i]))
out

# Approach 3 (wrong results)
as.numeric(overlapsAny(query, subject))
as.numeric(overlapsAny(split(query, 1:4), split(subject, 1:4)))


Maybe someone has an idea to speed this up?


Best,

Tobias
