Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread web working
Hi Herve,

Thank you for your answer. pcompare() works fine for me. Here is my solution:

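library(GenomicRanges)

query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))
# pcompare() codes with an absolute value below 5 mean that the two ranges
# overlap (all ranges here are on the same sequence and strand)
out <- vector("numeric", length(query))
out[which(abs(pcompare(query, subject)) < 5)] <- 1
out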
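
As a cross-check, the pcompare() threshold used above can be compared with poverlaps(), the other function suggested in this thread. This is only a sketch on the toy data from this message: it assumes all ranges sit on the same sequence and strand, and the variable names hits_poverlaps and hits_pcompare are made up for illustration.

```r
library(GenomicRanges)

query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))

# Element-wise ("parallel") test: query[i] is compared with subject[i] only.
# as.vector() turns a possible Rle result into a plain logical vector.
hits_poverlaps <- as.vector(poverlaps(query, subject))

# Same test through the integer codes returned by pcompare().
codes <- pcompare(query, subject)
hits_pcompare <- abs(codes) < 5

# Letter form of the codes, see ?rangeComparisonCodeToLetter.
rangeComparisonCodeToLetter(codes)

identical(hits_poverlaps, hits_pcompare)
as.numeric(hits_poverlaps)   # 0 0 0 1
```

Only the fourth pair shares positions (20-22 against 15-21); the first pair is merely adjacent (1-2 against 3-4) and is therefore not reported.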

Carey was right that this is off-topic for this list. Next time I will post my
question on support.bioconductor.org.

Best,

Tobias

On 29.01.20 at 18:02, Pages, Herve wrote:
> Yes poverlaps().
>
> Or pcompare(), which should be even faster. But only if you are not
> afraid to go low-level. See ?rangeComparisonCodeToLetter for the meaning
> of the codes returned by pcompare().
>
> H.
>
> On 1/29/20 08:01, Michael Lawrence via Bioc-devel wrote:
>> poverlaps()?

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Pages, Herve
On 1/29/20 13:14, Jianhong Ou, Ph.D. wrote:
> Oh, I forgot that. Thank you for the reminder.
> Then how about:
> 
> distance(query, narrow(subject, start=2, end=-2)) == 0
> 
> ?

Yep, that's more accurate. With the following gotcha:

   'narrow(subject, start=2, end=-2)' will fail if 'subject'
   contains ranges that cover less than 2 positions

This is not an unlikely situation, e.g. if 'subject' contains TSS
(transcription start sites, typically represented as width-1 ranges)!

I just feel that distance() is not really appropriate to detect overlaps.
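
A small sketch of the gotcha, for illustration only: the coordinates are made up, and the flag name narrow_failed is hypothetical. It shows narrow() failing on a width-1 range, and poverlaps() (suggested elsewhere in this thread) handling the same data without any trimming.

```r
library(GenomicRanges)

# Toy 'subject' with a width-1 range such as a TSS (coordinates are made up).
subject <- GRanges("chr1", IRanges(c(100, 200), width = c(1, 10)))

# narrow() cannot trim one position from each end of a width-1 range,
# so the call signals an error; tryCatch() turns that into a flag here.
narrow_failed <- tryCatch({
    narrow(subject, start = 2, end = -2)
    FALSE
}, error = function(e) TRUE)
narrow_failed

# poverlaps() needs no trimming and accepts width-1 ranges directly.
query <- GRanges("chr1", IRanges(c(100, 150), c(105, 160)))
as.vector(poverlaps(query, subject))   # TRUE FALSE
```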

H.


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319


Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Jianhong Ou, Ph.D.
Oh, I forgot that. Thank you for the reminder.
Then how about:

distance(query, narrow(subject, start=2, end=-2)) == 0

?


On 1/29/20, 12:40 PM, "Pages, Herve" wrote:

> On 1/29/20 08:04, Jianhong Ou, Ph.D. wrote:
>> Try
>> dist=distance(query, subject)
>> dist==0
>> ?
>
> Please be aware that dist==0 does NOT mean that 2 ranges overlap. It
> means that they overlap OR are **adjacent**:
>
>  > distance(GRanges("chr1:1-20"), GRanges("chr1:21-25"))
> [1] 0
>
> H.



Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Pages, Herve
On 1/29/20 08:04, Jianhong Ou, Ph.D. wrote:
> Try
> dist=distance(query, subject)
> dist==0
> ?

Please be aware that dist==0 does NOT mean that 2 ranges overlap. It 
means that they overlap OR are **adjacent**:

 > distance(GRanges("chr1:1-20"), GRanges("chr1:21-25"))
[1] 0
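
To make the difference concrete, here is a minimal sketch that puts the distance() result above next to poverlaps(), the element-wise overlap test suggested elsewhere in this thread. The object names x and y are chosen only for this illustration.

```r
library(GenomicRanges)

x <- GRanges("chr1:1-20")
y <- GRanges("chr1:21-25")   # starts right after x ends: adjacent, no shared position

distance(x, y) == 0          # TRUE, although x and y do not overlap
as.vector(poverlaps(x, y))   # FALSE, because no position is shared
```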

H.



Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Pages, Herve
Yes poverlaps().

Or pcompare(), which should be even faster. But only if you are not 
afraid to go low-level. See ?rangeComparisonCodeToLetter for the meaning 
of the codes returned by pcompare().

H.

On 1/29/20 08:01, Michael Lawrence via Bioc-devel wrote:
> poverlaps()?


Re: [Bioc-devel] New SE or new assay in SE?

2020-01-29 Thread Pages, Herve
Just after I pressed the "Send" button I realized that by returning a 
new SE object you probably meant returning an SE object with only the 
new assay in it. I would favor the other option i.e. 'doProcess(se)' 
adds a new assay to 'se'. I think that's what most workflows based on SE 
objects do.

This doesn't mean that you can't provide a lower-level function that 
returns the transformed data in a "naked" matrix (i.e. not wrapped 
inside an SE). This lets the (more advanced) user decide what they want 
to do with it, e.g. they can add it to the original SE:

 assay(se, "normalized") <- normalized_data

or wrap it in its own new SE:

 normalized <- SummarizedExperiment(list(normalized=normalized_data))
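
A minimal sketch of how the two layers could fit together. Only the name doProcess and the "normalized" assay come from this thread; the helper logTransform, the log2(x + 1) transformation and the toy count matrix are assumptions made up for illustration.

```r
library(SummarizedExperiment)

# Hypothetical low-level helper: takes and returns a "naked" matrix.
logTransform <- function(x) log2(x + 1)

# Hypothetical high-level wrapper: returns 'se' with one extra assay.
doProcess <- function(se) {
    assay(se, "normalized") <- logTransform(assay(se, "counts"))
    se
}

# Toy data, made up for this example.
counts <- matrix(c(0, 1, 3, 7), nrow = 2,
                 dimnames = list(c("g1", "g2"), c("s1", "s2")))
se <- SummarizedExperiment(list(counts = counts))

se2 <- doProcess(se)
assayNames(se)    # "counts": the caller's object is unchanged
assayNames(se2)   # "counts" "normalized"
```

With this split, a user who only wants the matrix calls logTransform(assay(se, "counts")) directly and decides where to store the result.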

H.



Re: [Bioc-devel] New SE or new assay in SE?

2020-01-29 Thread Pages, Herve
On 1/28/20 01:37, Laurent Gatto wrote:
> Dear all,
> 
> Assume we have a SummarizedExperiment object `se` that contains raw count 
> data, and a method `doProcess` that processes the data to produce a matrix of 
> identical dimensions (for example log-transformation, normalisation, 
> imputation, ...). What are the opinions in favour or against the following 
> two options
> 
> - `doProcess(se)` returns a new SE object
> - `doProcess(se)` adds a new assay to se

Aren't these the same?

SE objects are not reference objects, i.e. they follow R's standard
copy-on-modify semantics. This means that they never get modified **in
place** (aka they're not "mutable"). So 'doProcess(se)' will always
return a new object, whatever you do inside the function, that is, even
if the function modifies 'se' internally, e.g. with something like:

   assay(se, "new_assay") <- new_assay

Note that the assay() setter itself, like all setters, also produces a new
object. R actually evaluates the following code

   assay(se, "new_assay") <- new_assay

as

   se <- `assay<-`(se, "new_assay", value=new_assay)

As you can see the previous `se` is replaced with the new one which 
gives the **illusion** of in-place replacement but it's not.
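
The same behaviour can be reproduced in base R without any SE object. The sketch below is only an illustration: the replacement function `second<-` and the function bump() are made-up names, not part of any package.

```r
# Toy replacement function; the name `second<-` is made up for this sketch.
`second<-` <- function(x, value) {
    x[2] <- value   # modifies the function's local copy only
    x               # the modified copy is the return value
}

x <- c(1, 2, 3)
y <- x                    # y holds the same values as x

second(x) <- 10           # evaluated like: x <- `second<-`(x, value = 10)
x                         # 1 10 3
y                         # 1 2 3, the earlier value was not changed in place

# A function that "modifies" its argument still leaves the caller's object alone.
bump <- function(v) {
    second(v) <- 99
    v
}
z <- bump(x)
x                         # still 1 10 3
z                         # 1 99 3
```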

Hope this helps,
H.


> 
> If you are interested in the broader context of this question, see
> https://github.com/waldronlab/MultiAssayExperiment/issues/266
> 
> Thank you in advance for your input.
> 
> Laurent


Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Jianhong Ou, Ph.D.
Try 
dist=distance(query, subject)
dist==0
?



Re: [Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread Michael Lawrence via Bioc-devel
poverlaps()?



-- 
Michael Lawrence
Senior Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com



[Bioc-devel] How to speed up GRange comparision

2020-01-29 Thread web working

Hello,

I have two big GRanges objects and want to test whether the first range of
query overlaps the first range of subject, then whether the second range of
query overlaps the second range of subject, and so on. Here is an example of
my problem:


# GRanges objects
library(GenomicRanges)
query <- GRanges(rep("chr1", 4),
                 IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)), id=1:4)
subject <- GRanges(rep("chr1", 4),
                   IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)), id=1:4)


# The 2 overlaps at the first position should not be counted, because
# these ranges are at different rows.

countOverlaps(query, subject)

# Approach 1 (bad style; simplified here to make it easier to understand)
dat <- as.data.frame(findOverlaps(query, subject))
indexDat <- apply(dat, 1, function(x) x[1]==x[2])
indexBool <- dat[indexDat,1]
out <- rep(FALSE, length(query))
out[indexBool] <- TRUE
as.numeric(out)

# Approach 2 (bad style and takes too long)
out <- vector("numeric", length(query))
for(i in seq_along(query)) out[i] <- (overlapsAny(query[i], subject[i]))
out

# Approach 3 (wrong results)
as.numeric(overlapsAny(query, subject))
as.numeric(overlapsAny(split(query, 1:4), split(subject, 1:4)))
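
For illustration, the sketch below shows why the plain overlapsAny() call in Approach 3 differs from the row-wise result, together with a vectorised variant of Approach 1 that avoids apply(). It is not a tuned solution; the names any_hit, same_row and row_hit are made up for this example.

```r
library(GenomicRanges)

query <- GRanges(rep("chr1", 4), IRanges(c(1, 5, 9, 20), c(2, 6, 10, 22)))
subject <- GRanges(rep("chr1", 4), IRanges(c(3, 1, 1, 15), c(4, 2, 2, 21)))

# overlapsAny() answers "does query[i] overlap ANY range in subject?"
any_hit <- overlapsAny(query, subject)

# Row-wise test: keep only the hits where query and subject index agree.
hits <- findOverlaps(query, subject)
same_row <- queryHits(hits) == subjectHits(hits)
row_hit <- seq_along(query) %in% queryHits(hits)[same_row]

as.numeric(any_hit)   # 1 0 0 1
as.numeric(row_hit)   # 0 0 0 1
```

The first query range overlaps the second and third subject ranges, which is why overlapsAny() reports it even though it does not overlap the first subject range. Note that this variant still computes all overlaps before filtering, so it can be wasteful on big objects.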


Maybe someone has an idea to speed this up?


Best,

Tobias



Re: [Bioc-devel] Problems on BiocCredentials account_activation

2020-01-29 Thread Turaga, Nitesh
Hi,

I've activated your account from my end and set a temp password. 

I'll send you a private email with the temp password. Please make sure to 
change your password after that.

Best,

Nitesh

> On Jan 28, 2020, at 1:43 PM, Danrley Fernandes  wrote:
> 
> Hi,
> I'm the developer of the package rSWeeP, and I'm trying to activate my
> account at Bioconductor Git Credentials (
> https://git.bioconductor.org/BiocCredentials/account_activation/)
> and I keep receiving the message " danrle...@gmail.com is not associated
> with a maintainer of a Bioconductor package. Please check the spelling or
> contact bioc-devel@r-project.org for help". But both the account of the
> GitHub that submitted the package and the maintainer's email on the
> description are  danrle...@gmail.com.
> Does anyone know how I should proceed?



This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.


Re: [Bioc-devel] Still unable to access git bioconductor credentials

2020-01-29 Thread Turaga, Nitesh
Hi

I've just activated the account for you and set a temp password. I'll send that 
to you privately. 

You should then be able to access your account, add SSH keys if needed and then 
access your package.

Nitesh 

> On Jan 29, 2020, at 9:41 AM, Richard Virgen-Slane  
> wrote:
> 
> Hi All,
> 
> I tried to activate rvs.bioto...@gmail.com and I got: rvs.bioto...@gmail.com
> is not associated with a maintainer of a Bioconductor package. Please check
> the spelling or contact bioc-devel@r-project.org for help.
>
> I still can’t activate my account even though I copied and pasted my email
> address.


[Bioc-devel] Still unable to access git bioconductor credentials

2020-01-29 Thread Richard Virgen-Slane
Hi All,

I tried to activate rvs.bioto...@gmail.com and I got: rvs.bioto...@gmail.com is
not associated with a maintainer of a Bioconductor package. Please check
the spelling or contact bioc-devel@r-project.org for help.

I still can’t activate my account even though I copied and pasted my email
address.



Re: [Bioc-devel] [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: Bioconductor package sagenhaft

2020-01-29 Thread Tim Beissbarth
Should be pushed now.
Best,
Tim

On Wed, 29 Jan 2020 at 14:47, Shepherd, Lori <
lori.sheph...@roswellpark.org> wrote:

> Have you been able to access your account?  I still do not see any pushes
> to the git.bioconductor.org repository to fix your broken package?


-- 
Beißbarth, Tim (Prof. Dr.)
Director of Institute Medical Bioinformatics
University Medical Center Göttingen, Goldschmidtstr. 1, 37077 Göttingen
Tel. +49 551 3914912 Fax +49 551 3914914



Re: [Bioc-devel] [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: Bioconductor package sagenhaft

2020-01-29 Thread Shepherd, Lori
Have you been able to access your account?  I still do not see any pushes to 
the git.bioconductor.org repository to fix your broken package.


Lori Shepherd

Bioconductor Core Team

Roswell Park Comprehensive Cancer Center

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Turaga, Nitesh 
Sent: Friday, January 24, 2020 3:01 PM
To: Tim Beissbarth 
Cc: Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] [WARNING: UNSCANNABLE EXTRACTION FAILED]Re: 
Bioconductor package sagenhaft

I've changed your email address. Please activate your account and access your 
git repository.

You'd need to activate and add your SSH credentials to the BiocCredentials 
account. https://git.bioconductor.org/BiocCredentials/account_activation/

Nitesh

> On Jan 24, 2020, at 10:19 AM, Tim Beissbarth wrote:
>
>
> Your email listed is beissba...@wehi.edu.au
> No, it is not. Might have been in some earlier version of the package, but 
> has been at least for the last ten years 
> "tim.beissba...@ams.med.uni-goettingen.de". I don't have any access to the 15 
> year old email address any more. Also the password from the old SVN does not 
> seem to work for the GIT.
>
> Best,
> Tim
>
>
>




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Updating R-devel on Bioconductor builders

2020-01-29 Thread Shepherd, Lori
Many people reported an issue with vignettes failing to build when using BiocStyle:
https://github.com/Bioconductor/BiocStyle/issues/71
We traced it back to an issue in knitr:
https://github.com/yihui/knitr/issues/1797

While knitr has fixed the issue, the fix has not yet been officially released to 
CRAN, so this ERROR will most likely reappear in today's devel 3.11 build 
report after R-devel was updated yesterday.
We are making corrections today to install the latest version, so these ERRORs 
should clear up on their own in tomorrow's report.  Sorry for the inconvenience.


Lori Shepherd
Bioconductor Core Team
Roswell Park Comprehensive Cancer Center
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263


From: Bioc-devel on behalf of Shepherd, Lori
Sent: Tuesday, January 28, 2020 12:34 PM
To: bioc-devel@r-project.org 
Subject: [Bioc-devel] Updating R-devel on Bioconductor builders

We will be updating R-devel on the Bioconductor devel builders (malbec2 and 
tokay2) this afternoon.  This may mean some intermittent downtime on the Single 
Package Builder.  It should not affect the build report for today. We 
appreciate your understanding.


Lori Shepherd
Bioconductor Core Team
Roswell Park Comprehensive Cancer Center
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

