Re: [Bioc-devel] DESeq2 package

2023-01-24 Thread Bernat Gel Moreno
Hi,

I'm in no way related to the DESeq2 package. Its author is Mike Love. The best way 
is to ask your questions on the Bioconductor support site.

https://support.bioconductor.org/

Bernat

From: Bioc-devel on behalf of Claudio A Bravo
Sent: Tuesday, 24 January 2023 0:22
To: bioc-devel@r-project.org
Subject: [Bioc-devel] DESeq2 package


Hi-

Could you please direct me to the person that can answer questions about this 
package?

Best,

-
Claudio Bravo, MD, FACC, FESC
Assistant Professor
Advanced Heart Failure & Transplant Cardiology
University of Washington



[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



Re: [Bioc-devel] Compatibility of S4 and tidyverse

2020-02-06 Thread Bernat Gel Moreno
Hi Stefano,

Your message to the list was completely empty, thus the joke by Michael.

I think you'll have to resend your message to the list and check it 
arrives as expected so you can get some feedback! :)

Bernat

On 2/6/20 at 12:29 PM, stefano wrote:
> Hello,
>
> Happy to trigger good laughs, although I did not understand the irony.
>
> If anyone has a clear idea of the issue or similar experience and can
> help out, that would be great!
>
> On Thu, 6 Feb 2020, 8:46 PM Martin Maechler 
> wrote:
>
>>> Michael Lawrence via Bioc-devel
>>>  on Wed, 5 Feb 2020 20:52:52 -0800 writes:
>>  > Yep that about sums it up.
>>
>> :-) ;-)
>>
>> Thank you, Michael !!
>> I haven't laughed  as much from reading e-mails in a long while !!
>>
>> Martin
>>
>>
>>  > On Wed, Feb 5, 2020 at 8:37 PM Stefano Mangiola <
>> mangiolastef...@gmail.com>
>>  > wrote:
>>
>>  {an empty message}
>>
>>  >>
>>
>>
>>  > --
>>  > Michael Lawrence
>>  > Senior Scientist, Bioinformatics and Computational Biology
>>  > Genentech, A Member of the Roche Group
>>  > Office +1 (650) 225-7760
>>  > micha...@gene.com
>>


Re: [Bioc-devel] Python Z trees to hclust

2019-11-14 Thread Bernat Gel Moreno
Hi Peter,

Thanks for the advice and pointers to the relevant files. After reading 
the Fortran code and thinking about it I ended up coding from scratch my 
own pure R function to set the plotting order of an hclust object. It 
might not be as fast as the Fortran implementation, but it works pretty 
well and, as far as I can tell, the resulting order is exactly the same as 
that of the original hclust code. If anyone is interested, it's available in the 
CopyNumberPlots package at 
https://github.com/bernatgel/CopyNumberPlots/blob/5b9a1efea53e424a32616b7d82560d81c9ba4a10/R/utils.R#L933
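
For readers who only want the idea, here is a minimal sketch of how a plotting order can be derived from an hclust-style merge matrix in pure R. It is not the CopyNumberPlots code linked above; the function name order_from_merge is hypothetical, and the sketch assumes the usual stats::hclust convention (negative entries are single observations, positive entries point to an earlier merge row).

```r
# Sketch: derive the leaf order from an hclust-style 'merge' matrix.
# Negative entries are single observations, positive entries refer to
# the cluster formed in that earlier row (the stats::hclust convention).
order_from_merge <- function(merge) {
  n_merges <- nrow(merge)
  leaves <- vector("list", n_merges)  # leaf order of each cluster so far
  for (i in seq_len(n_merges)) {
    members <- lapply(merge[i, ], function(m) {
      if (m < 0) -m else leaves[[m]]
    })
    # left member first, then right member
    leaves[[i]] <- unlist(members)
  }
  as.integer(leaves[[n_merges]])
}

# Compare with the order computed by stats::hclust on a toy data set.
hc <- hclust(dist(c(1, 2, 6, 7, 15)))
identical(order_from_merge(hc$merge), hc$order)
```

Keeping the per-cluster leaf lists makes the loop easy to read at the cost of some memory; for very large trees an in-place expansion, as in the Fortran routine, would likely be leaner.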

Bernat



On 8/3/19 at 8:41 PM, Peter Langfelder wrote:
> Hi Bernat,
>
> my advice may not be that useful, but it may be better than the
> silence so far...
>
> Regarding the ordering of objects in hclust, if you're willing to do a
> bit of hacking, have a look at the stats::hclust function; you will
> see that the ordering is computed by a call to Fortran function
> hcass2. That function could be used or perhaps adapted (it uses an
> additional step that you will probably need to skip) to give you the
> ordering. If you're good at understanding Fortran code, you may even
> be able to re-write it directly in R.
>
> Peter
>
> On Tue, Jul 30, 2019 at 12:39 AM Bernat Gel Moreno  wrote:
>> Hi,
>>
>> For one of our packages (CopyNumberPlots) we'll need to read 10X CNV
>> data in H5 format. I've read in everything I need except for the cell
>> clustering tree. It's in a format called Z format produced by SciPy
>> hierarchical clustering. The format itself is relatively easy to parse
>> and not so different from hclust return objects so it would be possible
>> to create a small function to translate the Z notation into an hclust
>> object, if needed, but I'll need to figure out the "order" vector, since
>> it's not present in Z.
>>
>> - Is the Z to hclust function available in any other package? Or
>> something equivalent to that?
>> - If I end up transforming it by hand in a custom function, Is there
>> a function somewhere to compute the order vector in an hclust object?
>>
>> Thanks
>>
>> Bernat
>>
>>


Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-12 Thread Bernat Gel Moreno
I have updated karyoploteR and it's now (from version 1.11.9 in devel) 
possible to use a BSgenome object or a seqinfo object as genome 
definitions in plotKaryotype. In both cases, if possible, it will by 
default automatically filter the chromosomes to the canonical ones (if 
defined) and retrieve the cytobands for the genome. Or you can specify 
the exact chromosomes you want to plot. I think this should help with 
the specific question at hand.

Bernat


On 9/12/19 at 10:09 AM, Bernat Gel Moreno wrote:
> Oh, and Aditya, take into account that if you give karyoploteR a custom
> genome as you are planning to do, it will not paint the cytobands by
> default, you'll have to get them yourself and give them to plotKaryotype.
>
> If possible, I would recommend giving the genome by name ("hg19") and
> selecting the chromosomes to plot using "chromosomes".
>
> Bernat
>
>
>
>
> On 9/12/19 at 8:47 AM, Bernat Gel Moreno wrote:
>> Hi all,
>>
>> I'm the developer of karyoploteR.
>>
>> @Michael: I never thought about using seqinfo as the source for the
>> genome information. I'll add this as an option to define the genome.
>> Thanks for the suggestion.
>>
>> @Aditya: If you want to plot just your relevant chromosomes, you don't
>> need to alter the genome. You can use the "chromosomes" parameter to
>> give a vector of chromosome names. Is it not working for you for some
>> reason?
>>
>> Bernat
>>
>>
>> On 9/11/19 at 2:31 PM, Michael Lawrence via Bioc-devel wrote:
>>> I'm pretty surprised that the karyoploteR package does not accept a
>>> Seqinfo since it is plotting chromosomes. But again, please consider
>>> just doing as(seqinfo(bsgenome), "GRanges").
>>>
>>> On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya
>>>  wrote:
>>>> Hi Herve,
>>>>
>>>> Thank you for your responses.
>>>> From your response, it is clear that the vcountPDict use case does not 
>>>> need a BSgenome -> GRanges coercer.
>>>>
>>>> The karyoploteR use case still requires it, though, to allow plotting of 
>>>> only the chromosomal BSgenome portions:
>>>>
>>>>   chromranges <- as(bsgenome, "GRanges")
>>>>   kp <- karyoploteR::plotKaryotype(chromranges)
>>>>   karyoploteR::kpPlotRegions(kp, crispr_target_sites)
>>>>
>>>> Or do you see any alternative for this purpose too?
>>>>
>>>> Aditya
>>>>
>>>> 
>>>> From: Pages, Herve [hpa...@fredhutch.org]
>>>> Sent: Wednesday, September 11, 2019 12:24 PM
>>>> To: Bhagwat, Aditya; bioc-devel@r-project.org
>>>> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
>>>> BiocGenerics (and others)?
>>>>
>>>> Hi Aditya,
>>>>
>>>> On 9/11/19 01:31, Bhagwat, Aditya wrote:
>>>>> Hi Herve,
>>>>>
>>>>>
>>>>> > It feels that a coercion method from BSgenome to GRanges should
>>>>> rather be defined in the BSgenome package itself.
>>>>>
>>>>> :-)
>>>>>
>>>>>
>>>>> > Patch/PR welcome on GitHub.
>>>>>
>>>>> Owkies. What pull/fork/check/branch protocol to be followed?
>>>>>
>>>>>
>>>>> > Is this what you have in mind for this coercion?
>>>>> > as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
>>>>>
>>>>> Yes.
>>>>>
>>>>> Perhaps also useful to share the wider context, allowing your and others
>>>>> feedback for improved software design.
>>>>> I wanted to subset a BSgenome (without the _random or _unassigned),
>>>>> but Lori explained this is not possible
>>>>> <https://support.bioconductor.org/p/124367>.
>>>>>
>>>>> Instead Lori suggested to coe

Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-12 Thread Bernat Gel Moreno
Oh, and Aditya, take into account that if you give karyoploteR a custom 
genome as you are planning to do, it will not paint the cytobands by 
default, you'll have to get them yourself and give them to plotKaryotype.

If possible, I would recommend giving the genome by name ("hg19") and 
selecting the chromosomes to plot using "chromosomes".
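
A minimal sketch of that recommendation, assuming karyoploteR is installed. The wrapper name plot_selected_chromosomes and the three chromosomes chosen are only illustrative; the call itself uses the "genome" and "chromosomes" arguments of plotKaryotype mentioned above.

```r
# Sketch: plot only selected chromosomes of a genome given by name.
# Wrapped in a function so nothing is drawn unless karyoploteR is installed.
plot_selected_chromosomes <- function(genome = "hg19",
                                      chromosomes = c("chr1", "chr2", "chr3")) {
  # With a genome given by name, cytobands are painted by default.
  karyoploteR::plotKaryotype(genome = genome, chromosomes = chromosomes)
}

if (requireNamespace("karyoploteR", quietly = TRUE)) {
  kp <- plot_selected_chromosomes()
}
```

The returned object can then be passed to the kpPlot* functions, for example kpPlotRegions, to add data on top of the ideograms.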

Bernat




On 9/12/19 at 8:47 AM, Bernat Gel Moreno wrote:
> Hi all,
>
> I'm the developer of karyoploteR.
>
> @Michael: I never thought about using seqinfo as the source for the
> genome information. I'll add this as an option to define the genome.
> Thanks for the suggestion.
>
> @Aditya: If you want to plot just your relevant chromosomes, you don't
> need to alter the genome. You can use the "chromosomes" parameter to
> give a vector of chromosome names. Is it not working for you for some
> reason?
>
> Bernat
>
>
> On 9/11/19 at 2:31 PM, Michael Lawrence via Bioc-devel wrote:
>> I'm pretty surprised that the karyoploteR package does not accept a
>> Seqinfo since it is plotting chromosomes. But again, please consider
>> just doing as(seqinfo(bsgenome), "GRanges").
>>
>> On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya
>>  wrote:
>>> Hi Herve,
>>>
>>> Thank you for your responses.
>>>   From your response, it is clear that the vcountPDict use case does not 
>>> need a BSgenome -> GRanges coercer.
>>>
>>> The karyoploteR use case still requires it, though, to allow plotting of 
>>> only the chromosomal BSgenome portions:
>>>
>>>   chromranges <- as(bsgenome, "GRanges")
>>>   kp <- karyoploteR::plotKaryotype(chromranges)
>>>   karyoploteR::kpPlotRegions(kp, crispr_target_sites)
>>>
>>> Or do you see any alternative for this purpose too?
>>>
>>> Aditya
>>>
>>> 
>>> From: Pages, Herve [hpa...@fredhutch.org]
>>> Sent: Wednesday, September 11, 2019 12:24 PM
>>> To: Bhagwat, Aditya; bioc-devel@r-project.org
>>> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
>>> BiocGenerics (and others)?
>>>
>>> Hi Aditya,
>>>
>>> On 9/11/19 01:31, Bhagwat, Aditya wrote:
>>>> Hi Herve,
>>>>
>>>>
>>>>> It feels that a coercion method from BSgenome to GRanges should
>>>> rather be defined in the BSgenome package itself.
>>>>
>>>> :-)
>>>>
>>>>
>>>>> Patch/PR welcome on GitHub.
>>>>
>>>> Owkies. What pull/fork/check/branch protocol to be followed?
>>>>
>>>>
>>>>> Is this what you have in mind for this coercion?
>>>>> as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
>>>>
>>>> Yes.
>>>>
>>>> Perhaps also useful to share the wider context, allowing your and others
>>>> feedback for improved software design.
>>>> I wanted to subset a BSgenome (without the _random or _unassigned),
>>>> but Lori explained this is not possible
>>>> <https://support.bioconductor.org/p/124367>.
>>>>
>>>> Instead Lori suggested to coerce a BSgenome into a GRanges
>>>> <https://support.bioconductor.org/p/123489>,
>>>> which is a useful solution, but for which currently no exported S4
>>>> method exists
>>>> <https://support.bioconductor.org/p/124416>.
>>>> So I defined an S4 coercer in my multicrispr package, making sure to
>>>> properly import the Bsgenome class
>>>> <https://urldefense.proof

Re: [Bioc-devel] Import BSgenome class without attaching BiocGenerics (and others)?

2019-09-12 Thread Bernat Gel Moreno
Hi all,

I'm the developer of karyoploteR.

@Michael: I never thought about using seqinfo as the source for the 
genome information. I'll add this as an option to define the genome. 
Thanks for the suggestion.

@Aditya: If you want to plot just your relevant chromosomes, you don't 
need to alter the genome. You can use the "chromosomes" parameter to 
give a vector of chromosome names. Is it not working for you for some 
reason?

Bernat


On 9/11/19 at 2:31 PM, Michael Lawrence via Bioc-devel wrote:
> I'm pretty surprised that the karyoploteR package does not accept a
> Seqinfo since it is plotting chromosomes. But again, please consider
> just doing as(seqinfo(bsgenome), "GRanges").
>
> On Wed, Sep 11, 2019 at 3:59 AM Bhagwat, Aditya
>  wrote:
>> Hi Herve,
>>
>> Thank you for your responses.
>>  From your response, it is clear that the vcountPDict use case does not need 
>> a BSgenome -> GRanges coercer.
>>
>> The karyoploteR use case still requires it, though, to allow plotting of 
>> only the chromosomal BSgenome portions:
>>
>>  chromranges <- as(bsgenome, "GRanges")
>>  kp <- karyoploteR::plotKaryotype(chromranges)
>>  karyoploteR::kpPlotRegions(kp, crispr_target_sites)
>>
>> Or do you see any alternative for this purpose too?
>>
>> Aditya
>>
>> 
>> From: Pages, Herve [hpa...@fredhutch.org]
>> Sent: Wednesday, September 11, 2019 12:24 PM
>> To: Bhagwat, Aditya; bioc-devel@r-project.org
>> Subject: Re: [Bioc-devel] Import BSgenome class without attaching 
>> BiocGenerics (and others)?
>>
>> Hi Aditya,
>>
>> On 9/11/19 01:31, Bhagwat, Aditya wrote:
>>> Hi Herve,
>>>
>>>
>>>   > It feels that a coercion method from BSgenome to GRanges should
>>> rather be defined in the BSgenome package itself.
>>>
>>> :-)
>>>
>>>
>>>   > Patch/PR welcome on GitHub.
>>>
>>> Owkies. What pull/fork/check/branch protocol to be followed?
>>>
>>>
>>>   > Is this what you have in mind for this coercion?
>>>   > as(seqinfo(BSgenome.Celegans.UCSC.ce10), "GRanges")
>>>
>>> Yes.
>>>
>>> Perhaps also useful to share the wider context, allowing your and others
>>> feedback for improved software design.
>>> I wanted to subset a BSgenome
>>> (without the _random or _unassigned), but Lori explained this is not
>>> possible.
>>>
>>> Instead Lori suggested to coerce a BSgenome into a GRanges,
>>> which is a useful solution, but for which currently no exported S4
>>> method exists.
>>> So I defined an S4 coercer in my multicrispr package, making sure to
>>> properly import the BSgenome class.
>>> Then, after coercing a BSgenome into a GRanges, I can extract the
>>> chromosomes, after properly importing IRanges::`%in%`.
>>>
>> Looks like you don't need to coerce the BSgenome object to GRanges. See
>> https://support.bioconductor.org/p/123489/#124581
>>
>> H.
>>
>>> Which I can then pass on to karyoploteR,
>>> for genome-wide plots of crispr target sites.
>>>
>>> A good moment also to say thank you to all of you who helped me out, it
>>> helps me to make multicrispr fit nicely into the BioC ecosystem.
>>>
>>> Speaking of BioC design philosophy, can any of you suggest concise and
>>> to-the-point reading material to deepen my 

[Bioc-devel] Python Z trees to hclust

2019-07-30 Thread Bernat Gel Moreno
Hi,

For one of our packages (CopyNumberPlots) we'll need to read 10X CNV 
data in H5 format. I've read in everything I need except for the cell 
clustering tree. It's in a format called Z format produced by SciPy 
hierarchical clustering. The format itself is relatively easy to parse 
and not so different from hclust return objects so it would be possible 
to create a small function to translate the Z notation into an hclust 
object, if needed, but I'll need to figure out the "order" vector, since 
it's not present in Z.

   - Is the Z to hclust function available in any other package? Or 
something equivalent to that?
   - If I end up transforming it by hand in a custom function, is there 
a function somewhere to compute the order vector in an hclust object?
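
To illustrate the translation, here is a minimal sketch of the Z-to-hclust mapping for the merge and height parts. It assumes the layout SciPy documents for linkage matrices (n - 1 rows, columns id1, id2, distance, size, with 0-based ids where ids of n or more refer to earlier merges). The function name z_to_merge and the toy matrix are hypothetical, and this has not been checked against real 10X output. The "order" vector is still missing here, exactly as described above.

```r
# Sketch: translate a SciPy linkage matrix Z into hclust-style pieces.
# Assumes Z has n - 1 rows and 4 columns (id1, id2, distance, size)
# and uses SciPy's 0-based ids, where ids >= n refer to earlier merges.
z_to_merge <- function(Z) {
  n <- nrow(Z) + 1L
  to_hclust_id <- function(id) {
    # observations become negative 1-based ids, clusters become the
    # 1-based row in which they were created
    ifelse(id < n, -(id + 1), id - n + 1)
  }
  merge <- cbind(to_hclust_id(Z[, 1]), to_hclust_id(Z[, 2]))
  storage.mode(merge) <- "integer"
  list(merge = merge, height = Z[, 3])
}

# Toy linkage for 4 observations, written by hand in SciPy's layout.
Z <- rbind(c(0, 1, 1.0, 2),
           c(2, 3, 1.5, 2),
           c(4, 5, 4.0, 4))
pieces <- z_to_merge(Z)
pieces$merge
```

Note that stats::hclust also places single observations before clusters within a merge row; this sketch keeps SciPy's column order and leaves that tidy-up aside.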

Thanks

Bernat




Re: [Bioc-devel] InteractionSet for structural variants

2019-05-20 Thread Bernat Gel Moreno
Hi Aaron,

Thanks for your response. So far my intention is only to plot them; I 
do not intend to perform any other operation. The first step would be to 
read in the VCF file and transform it into a meaningful object, and I was 
hoping there was a core package already taking care of that, but I gather 
from your answer that no such functionality is implemented.

Thanks again

Bernat





On 5/18/19 at 4:47 AM, Aaron Lun wrote:
> I would say that it depends on what operations you intend to perform 
> on them. You can _store_ things any way you like, but the trick is to 
> ensure that operations and manipulations on those things are 
> consistent and meaningful. It is not obvious that there are meaningful 
> common operations that one might want to apply to all structural 
> variants.
>
> For example, translocations involve two genomic regions (i.e., the two 
> bits that get stuck together) and so are inherently two-dimensional. A 
> lot of useful operations will be truly translocation-specific, e.g., 
> calculation of distances between anchor regions, identification of 
> bounding boxes in two-dimensional space. These operations will be 
> meaningless to 1-dimensional variants on the linear genome, e.g., 
> CNVs, inversions. The converse also applies where operations on the 
> linear genome have no single equivalent in the two-dimensional case.
>
> So, I would be inclined to store them separately. If you must keep 
> them in one object, just lump them into a List with "translocation" 
> (GInteractions), "cnv" (GRanges) and "inversion" (another GRanges) 
> elements, and people/programs can pull out bits and pieces as needed.
>
> -A
>
>
> On 5/17/19 4:38 AM, Bernat Gel Moreno wrote:
>> Hi all,
>>
>> Is there any standard recommended container for genomic structural
>> variants? I think InteractionSet would work fine for translocation and
>> GRanges for inversions and copy number changes, but I don't know what
>> would be the recommended way to store them all together using standard
>> Bioconductor objects.
>>
>> And actually, is there any package that would load a SV VCF by lumpy or
>> delly and build that object?
>>
>> Thanks!
>>
>> Bernat
>>
>>


[Bioc-devel] InteractionSet for structural variants

2019-05-17 Thread Bernat Gel Moreno
Hi all,

Is there any standard recommended container for genomic structural 
variants? I think InteractionSet would work fine for translocation and 
GRanges for inversions and copy number changes, but I don't know what 
would be the recommended way to store them all together using standard 
Bioconductor objects.

And actually, is there any package that would load a SV VCF by lumpy or 
delly and build that object?

Thanks!

Bernat




[Bioc-devel] TIMEOUTS in the submission systems

2019-04-12 Thread Bernat Gel Moreno
Hi,

I assume you are already aware of it, but there's a problem with the builder 
for the new packages. It works fine on Windows, but Mac and Linux cannot access 
CRAN:

checking package dependencies ...Warning: unable to access index for repository 
https://CRAN.R-project.org/src/contrib:
  cannot open URL 'https://CRAN.R-project.org/src/contrib/PACKAGES'

This causes a WARNING in Malbec2 and a TIMEOUT in Celaya2.

Thanks!

Bernat




Re: [Bioc-devel] Timeout in check for a new submission

2019-04-05 Thread Bernat Gel Moreno
Thanks Lori :)

I've already written to my reviewer (Qian Liu) asking if it's possible to 
proceed with the review in the current situation.

The code in the package is quite fast (and simple), but I'll take a look at that 
link to see if I can shave off a few more seconds.

Bernat







On 4/5/19 at 4:43 PM, Shepherd, Lori wrote:

Also,  the reviewer might be able to glance at the code and help try to make it 
run more efficiently.  See points on 
http://bioconductor.org/developers/how-to/efficient-code/  for things like 
vectorization and pre-allocation.
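
As a small illustration of those two points, here is a sketch in base R. The three function names are hypothetical and the task (squaring the numbers 1 to n) is only a stand-in for real package code.

```r
# Sketch: growing a vector in a loop versus pre-allocating versus vectorizing.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)  # copies 'out' on every iteration
  out
}

preallocate <- function(n) {
  out <- numeric(n)                          # allocate the result once
  for (i in seq_len(n)) out[i] <- i^2
  out
}

vectorized <- function(n) seq_len(n)^2       # no explicit loop at all

n <- 1e4
system.time(grow(n))
system.time(preallocate(n))
system.time(vectorized(n))
```

All three return the same values; only the way memory is handled differs, which is typically where the time goes as n grows.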



Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of 
Shepherd, Lori <lori.sheph...@roswellpark.org>
Sent: Friday, April 5, 2019 10:40:07 AM
To: Bernat Gel Moreno; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] Timeout in check for a new submission

If the check time is passing on the other platforms, and the Windows check 
isn't incredibly over the time limit, you should be okay. Talk with your 
reviewer, but in general we would make an exception for this warning.

Most of the time, subsets of data can be used for examples and vignettes, which 
also helps to reduce the check time while retaining the usefulness of an 
application. Without knowing your data, just as an example: instead of running 
over all chromosomes, limit to one smaller chromosome, etc. If this has already 
been done, then you should be able to proceed through the submission process 
despite this warning.


Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263


From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of 
Bernat Gel Moreno <b...@igtp.cat>
Sent: Friday, April 5, 2019 10:27:21 AM
To: bioc-devel@r-project.org
Subject: [Bioc-devel] Timeout in check for a new submission


Hi all,

I've submitted a new package (CopyNumberPlots, issue 1076) and I have a
problem. It keeps giving me a warning because on Windows it takes more
than 5 minutes to check (on Linux it works with no problems). I've
reduced the examples, removed part of the vignette... and on my machine
it takes 3 min 20 seconds to check, of which only 30 seconds are spent in
the examples and vignette and the other 2 min 50 seconds are spent in all
other checks.

Other than reducing the vignette and examples I don't know how to reduce
the checking time. Any pointers?

Thanks a lot

Bernat



This email message may contain legally privileged and/or confidential 
information.  If you are not the intended recipient(s), or the employee or 
agent responsible for the delivery of this message to the intended 
recipient(s), you are hereby notified that any disclosure, copying, 
distribution, or use of this email message is prohibited.  If you have received 
this message in error, please notify the sender immediately by e-mail and 
delete this email message from your computer. Thank you.


[Bioc-devel] Timeout in check for a new submission

2019-04-05 Thread Bernat Gel Moreno

Hi all,

I've submitted a new package (CopyNumberPlots, issue 1076) and I have a 
problem. It keeps giving me a warning because on Windows it takes more 
than 5 minutes to check (on Linux it works with no problems). I've 
reduced the examples, removed part of the vignette... and on my machine 
it takes 3 min 20 seconds to check, of which only 30 seconds are spent in 
the examples and vignette and the other 2 min 50 seconds are spent in all 
other checks.

Other than reducing the vignette and examples I don't know how to reduce 
the checking time. Any pointers?

Thanks a lot

Bernat



Re: [Bioc-devel] Windows error "UCSC library operation failed" in package karyoploteR

2018-10-01 Thread Bernat Gel Moreno
Oops, OK, thanks. I should have reread the documentation.

Thanks

Bernat


On 01/10/2018 at 2:00, Dario Strbenac wrote:
> Good day,
>
> The import of BigWig files does not work on Windows, and this is documented. 
> Execute ?BigWigFile-class and notice in the Description section: "These 
> functions do not work on Windows.".
> --
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia


Re: [Bioc-devel] Block bootstrap for GenomicRanges

2018-08-14 Thread Bernat Gel

Hi all,

Since you are talking about genomic ranges permutation let me just say 
that the regioneR package already does that. It does not have block 
bootstrapping as defined here, but it implements two different 
randomization models (one that randomizes each region independently and 
another one based on "chromosome 'spinning' " that conserves the 
internal structure of the set of ranges) and both can take into account 
a mask defining regions where ranges cannot be randomized. It's quite 
customizable and so it would be possible to add new permutation 
strategies if needed.
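
To make the "spinning" idea concrete, here is a minimal base R sketch for a single chromosome. It is not regioneR's implementation: the function name spin_regions and the toy coordinates are assumptions, and it ignores masks as well as regions that would wrap past the chromosome end.

```r
# Sketch: circular ("spinning") randomization of region starts on one
# chromosome. All regions are shifted by the same random offset, so the
# distances between them are preserved modulo the chromosome length.
spin_regions <- function(starts, chrom_length) {
  shift <- sample.int(chrom_length, 1) - 1      # offset in 0..chrom_length-1
  (starts - 1 + shift) %% chrom_length + 1      # wrap around the end
}

set.seed(42)
starts <- c(51, 61, 71, 501, 511)
spun <- spin_regions(starts, chrom_length = 1000)
```

Because every region moves by the same amount, clusters of nearby regions stay clustered, which is the property the per-region randomization does not keep.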


It's probably not the most efficient implementation, but the 
randomization process scales decently and it has proven useful over the 
years.



Bernat

*Bernat Gel Moreno*
Bioinformatician

Hereditary Cancer Program
Program of Predictive and Personalized Medicine of Cancer (PMPPC)
Germans Trias i Pujol Research Institute (IGTP)

Campus Can Ruti
Carretera de Can Ruti, Camí de les Escoles s/n
08916 Badalona, Barcelona, Spain

Tel: (+34) 93 554 3068
Fax: (+34) 93 497 8654
b...@igtp.cat
www.germanstrias.org







On 08/14/2018 at 03:06 PM, Kasper Daniel Hansen wrote:

I agree this is super important.  I think there may be multiple ways of
thinking about a decent bootstrapping or permutation of ranges, in
genomics. I am quite interested in the topic. I think it might belong in a
new package. I would be interested in extending the conversation and have
a couple of different approaches (theoretical) that we could work on being
efficient.

Best,
Kasper

On Tue, Aug 14, 2018 at 8:27 AM Michael Love 
wrote:


dear Hervé,

Thanks again for the quick and useful reply!

I think that the theory behind the block bootstrap [Kunsch (1989), Liu
and Singh (1992), Politis and Romano (1994)], needs that the blocks be
drawn with replacement (you can get some features twice) and that the
blocks can be overlapping. In a hand-waving way, I think, it's "good"
for the variance estimation on any statistic of interest that y' may
have more or less features than y.

I will explore a bit using the solutions you've laid out.

Now that I think about it, the start-position based solution that I
was thinking about will break if two features in y share the same
start position, so that's not good.

On Mon, Aug 13, 2018 at 11:58 PM, Hervé Pagès 
wrote:

That helps. I think I start to understand what you are after.

See below...


On 08/13/2018 06:07 PM, Michael Love wrote:

dear Hervé,

Thanks for the quick reply about directions to take this.

I'm sorry for not providing sufficient detail about the goal of block
bootstrapping in my initial post. Let me try again. For a moment, let
me ignore multiple chromosomes/seqs and just focus on a single set of
IRanges.

The point of the block bootstrap is: Let's say we want to find the
number of overlaps of x and y, and then assess how surprised we are at
how large that overlap is. Both of them may have features that tend to
cluster together along the genome (independently). One method would
just be to move the features in y around to random start sites, making
y', say B times, and then calculate each of the B times the number of
overlaps between x and y'. Or we might make this better by having
blacklisted sites where the randomly shuffled features in y cannot go.

The block bootstrap is an alternative to randomly moving the start
sites, where instead we create random data, by taking big "blocks" of
features in y. Each block is a lot like a View. And the ultimate goal
is to make B versions of the data y where the features have been
shuffled around, but by taking blocks, we preserve the clumpiness of
the features in y.

Let me give some numbers to make this more concrete, so say we're
making a single block bootstrap sample of a chromosome that is 1000 bp
long. Here is the original y:

y <- IRanges(c(51,61,71,111,121,131,501,511,521,921,931,941),width=5)

If I go with my coverage approach, I should extend it all the way to
the end of the chromosome. Here I lose information if there are
overlaps of features in y, and I'm thinking of a fix I'll describe
below.

cov <- c(coverage(y), Rle(rep(0,55)))

I could make one block bootstrap sample of y (this is 1 out of B in
the ultimate procedure) by taking 10 blocks of width 100. The blocks
have random start positions from 1 to 901.

y.boot.1 <- unlist(Views(cov, start=round(runif(10,1,901)), width=100))
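
The same idea can be sketched in base R on start positions instead of coverage, which keeps overlapping features distinct. This is only an illustration under stated assumptions: the function name block_bootstrap_starts is hypothetical, a feature is assigned to a block when its start falls inside it, and features that run past a block boundary are not trimmed.

```r
# Sketch: one block bootstrap sample of feature start positions (base R).
# Assumption: a feature belongs to a block if its start falls inside it.
block_bootstrap_starts <- function(starts, chrom_length = 1000,
                                   block_width = 100) {
  n_blocks <- chrom_length %/% block_width
  # blocks are drawn with replacement and may overlap each other
  block_starts <- sample.int(chrom_length - block_width + 1, n_blocks,
                             replace = TRUE)
  new_starts <- lapply(seq_len(n_blocks), function(i) {
    offset <- starts - block_starts[i]
    in_block <- offset >= 0 & offset < block_width
    # place the block's features in the i-th slot of the new chromosome
    (i - 1) * block_width + 1 + offset[in_block]
  })
  sort(unlist(new_starts))
}

set.seed(1)
y_starts <- c(51, 61, 71, 111, 121, 131, 501, 511, 521, 921, 931, 941)
y_boot_starts <- block_bootstrap_starts(y_starts)
```

Repeating the last call B times gives the B resampled versions of y; the number of features in each sample can differ from the original, since blocks are drawn with replacement.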


Choosing blocks that can overlap with each others could make y' appear
to have more features than y (by repeating some of the original
features). Also choosing blocks that can leave big gaps in the
chromosome could make y' appear to have less features than y
(by dropping some of the original ranges). Isn't that a problem?

Have you considered choosing a set of blocks that represent a
partitioni

Re: [Bioc-devel] as.list of a GRanges

2018-02-19 Thread Bernat Gel

Hi Hervé,

I completely agree with the goal of having the semantics of list-like 
operations standardised and documented to avoid surprises, and if the 
current use of as.list must change to get there, I'm perfectly ok with 
that. I had not seen the strange behaviour with IRanges, so I was not 
aware of the problem.


In any case, thanks for fixing (and simplifying) karyoploteR. In 
retrospect I don't know why I didn't use simple vectorization! So, thanks!



Bernat

*Bernat Gel Moreno*
Bioinformatician

Hereditary Cancer Program
Program of Predictive and Personalized Medicine of Cancer (PMPPC)
Germans Trias i Pujol Research Institute (IGTP)

Campus Can Ruti
Carretera de Can Ruti, Camí de les Escoles s/n
08916 Badalona, Barcelona, Spain

Tel: (+34) 93 554 3068
Fax: (+34) 93 497 8654
b...@igtp.cat <mailto:b...@igtp.cat>
www.germanstrias.org <http://www.germanstrias.org/>

On 02/17/2018 at 04:19 AM, Hervé Pagès wrote:

Hi Bernat,

On 02/15/2018 11:57 PM, Bernat Gel wrote:

Hi Hervé and others,

Thanks for the responses.

I wouldn't call as.list() of a GRanges an "obscure behaviour" but more 
a "works as expected, even if not clearly documented" behaviour.


Most users/developers will probably agree that as.list() worked
as expected on a GRanges object. But then they'll be surprised
and confused when they use it on an IRanges object and discover
that it does something completely different. The current effort
is to bring more consistency between GRanges and IRanges objects
and to have their list-like semantics aligned and documented so
there will be no more such surprise.



In any case I can change the code to as(gr, "GRangesList") as suggested.


I went ahead and fixed karyoploteR. This is karyoploteR 1.5.2. Make
sure to resync your GitHub repo by following the instructions here:


https://bioconductor.org/developers/how-to/git/sync-existing-repositories/ 



Note that the loop on the GRanges object (via the call to Map())
was not needed and could be replaced with a solution that uses
proper vectorization.

Best,
H.



Thanks again for the responses and discussion :)

Bernat


On 02/15/2018 at 11:19 PM, Hervé Pagès wrote:

On 02/15/2018 01:57 PM, Michael Lawrence wrote:



On Thu, Feb 15, 2018 at 1:45 PM, Hervé Pagès <hpa...@fredhutch.org 
<mailto:hpa...@fredhutch.org>> wrote:


    On 02/15/2018 11:53 AM, Cook, Malcolm wrote:

    Hi,

    Can I ask, is this change under discussion in current release or
    so far in Bioconductor devel only (my assumption)?


    Bioconductor devel only.


   > On 02/15/2018 08:37 AM, Michael Lawrence wrote:
   > > So is as.list() no longer supported for GRanges objects? I have found it
   > > useful in places.
   >
   > Very few places. I found a dozen of them in the entire software repo.

    However there are probably more in the wild...


    What as.list() was doing on a GRanges object was not documented. Relying
    on some kind of obscure undocumented feature is never a good idea.


There's just too much that is documented implicitly through 
inherited behaviors, or where we say things like "this data 
structure behaves as one would expect given base R". It's not fair 
to claim that those features are undocumented. Our documentation is 
not complete enough to use it as an excuse.


It's not fair to suggest that this is a widely used feature either.

I've identified all the places in the 1500 software packages where
this was used, and, as I said, there were very few places. BTW I
fixed most of them but my plan is to fix all of them. Some of the
code that is outside the Bioc package corpus might be affected but
it's fair to assume that this will be a very rare occurrence. This can
be mitigated by temporarily restoring as.list() on GRanges, with a
deprecation message, and wait 1 more devel cycle to replace it with

Re: [Bioc-devel] as.list of a GRanges

2018-02-15 Thread Bernat Gel

Hi Hervé and others,

Thanks for the responses.

I wouldn't call as.list() of a GRanges an "obscure behaviour" but more a 
"works as expected, even if not clearly documented" behaviour.


In any case I can change the code to as(gr, "GRangesList") as suggested.
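
The replacement mentioned here can be sketched as follows. This is a hedged illustration, not the actual karyoploteR fix: the object names gr, grl, gr.list and mid are made up for the example, and the snippet is guarded so that it does nothing when GenomicRanges is not installed.

```r
# Guarded so the snippet is a no-op when GenomicRanges is not installed
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  library(GenomicRanges)

  gr <- GRanges(seqnames = c("chr1", "chr2"),
                ranges = IRanges(start = c(1, 2), end = c(11, 12)))

  # One single-range GRanges per element, as suggested in the thread
  grl <- as(gr, "GRangesList")
  gr.list <- as.list(grl)

  # A vectorized alternative often avoids the list altogether,
  # e.g. operating on start(gr) / end(gr) directly
  mid <- (start(gr) + end(gr)) / 2
}
```

When the per-range work can be written in terms of accessors such as start() and end(), the vectorized form is usually preferable to looping over a list of one-range objects.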

Thanks again for the responses and discussion :)

Bernat


On 02/15/2018 at 11:19 PM, Hervé Pagès wrote:

On 02/15/2018 01:57 PM, Michael Lawrence wrote:



On Thu, Feb 15, 2018 at 1:45 PM, Hervé Pagès <hpa...@fredhutch.org 
<mailto:hpa...@fredhutch.org>> wrote:


    On 02/15/2018 11:53 AM, Cook, Malcolm wrote:

    Hi,

    Can I ask, is this change under discussion in current release or
    so far in Bioconductor devel only (my assumption)?


    Bioconductor devel only.


   > On 02/15/2018 08:37 AM, Michael Lawrence wrote:
   > > So is as.list() no longer supported for GRanges objects? I have found it
   > > useful in places.
   >
   > Very few places. I found a dozen of them in the entire software repo.

    However there are probably more in the wild...


    What as.list() was doing on a GRanges object was not documented. Relying
    on some kind of obscure undocumented feature is never a good idea.


There's just too much that is documented implicitly through inherited 
behaviors, or where we say things like "this data structure behaves 
as one would expect given base R". It's not fair to claim that those 
features are undocumented. Our documentation is not complete enough 
to use it as an excuse.


It's not fair to suggest that this is a widely used feature either.

I've identified all the places in the 1500 software packages where
this was used, and, as I said, there were very few places. BTW I
fixed most of them but my plan is to fix all of them. Some of the
code that is outside the Bioc package corpus might be affected but
it's fair to assume that this will be a very rare occurrence. This can
be mitigated by temporarily restoring as.list() on GRanges, with a
deprecation message, and wait 1 more devel cycle to replace it with
the new behavior. I chose to disable it for now, on purpose, so I can
identify packages that break (the build report is a great tool for
that) and fix them.

I'm not using the fact that as.list() on a GRanges is not documented
as an excuse for anything. Only to help those with concerns to
relativize and relax.

H.




   > Now you should use as.list(as(gr, "GRangesList")) instead.
   > as.list() was behaving inconsistently on IRanges and GRanges objects,
   > which is blocking new developments. It will come back with a consistent
   > behavior. More generally speaking IRanges and GRanges will behave
   > consistently as far as their "list interpretation" is concerned.

    Can we please be assured to be reminded of this prominently in
    release notes?


    The changes will be announced and described on this list and in the
    NEWS files of the IRanges and GenomicRanges packages.

    H.


    Thanks!

    ~malcolm


    --     Hervé Pagès

    Program in Computational Biology
    Division of Public Health Sciences
    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N, M1-B514
    P.O. Box 19024
    Seattle, WA 98109-1024

    E-mail: hpa...@fredhutch.org <mailto:hpa...@fredhutch.org>
    Phone: (206) 667-5791
    Fax: (206) 667-1319






___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] as.list of a GRanges

2018-02-15 Thread Bernat Gel

Hi,

The devel version of my package karyoploteR is failing because converting 
a GRanges into a list of GRanges with "as.list" now raises an error.


This used to be possible and it was working a few weeks ago.

Am I supposed to use a different approach or is there a problem somewhere?

Thanks!

Bernat

Here's an example

library(GenomicRanges)

rr <- GRanges(seqnames = c("chr1", "chr2"),
              ranges = IRanges(start = c(1, 2), end = c(11, 12)))

as.list(rr)

And the error message

Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function 'getListElement' for signature '"GRanges"'



and the sessionInfo

> sessionInfo()
R Under development (unstable) (2018-02-14 r74250)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /software/debian-8/general/R-Bioc-Devel/current/lib/R/lib/libRblas.so
LAPACK: /software/debian-8/general/R-Bioc-Devel/current/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8      LC_NUMERIC=C              LC_TIME=C                 LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8    LC_PAPER=es_ES.UTF-8      LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C            LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.31.20 GenomeInfoDb_1.15.5   IRanges_2.13.26       S4Vectors_0.17.32     BiocGenerics_0.25.3

loaded via a namespace (and not attached):
[1] zlibbioc_1.25.0        compiler_3.5.0         XVector_0.19.8         tools_3.5.0            GenomeInfoDbData_1.1.0
[6] RCurl_1.95-4.10        bitops_1.0-6



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Several ssh keys ?

2017-10-02 Thread Bernat Gel
Don't want to be alarmist, but is there anything in place to prevent a 
"malicious agent" inserting a new ssh key to the form and gaining access 
to our git repo? Maybe it would be sufficient to send an email to the 
maintainer whenever a new key is added so we have a clue something has 
gone wrong?


Bernat

On 09/29/2017 at 02:32 PM, Turaga, Nitesh wrote:

Hi,

Good morning. You can do that.

You have to add your new key to the google form. You can do that in two ways.

1. If you have submitted an SSH public key (this is your case):

- Submit a new SSH public key to the google form, but with the same SVN ID 
and email address. The google form, when it is processed, will pick it up.

2. If you have submitted a GitHub ID (username) as a surrogate for us to fetch your 
SSH public keys from GitHub (located at www.github.com/.keys):

- Add your new keys to your GitHub account 
(https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/).

- DO NOT add anything in the google form.

- When the form is processed again, it will add your new SSH keys to 
Bioconductor.  You may add as many keys to GitHub as you want for access from 
different computers, and we’ll keep them all on file.

Best,

Nitesh




On Sep 29, 2017, at 4:38 AM, Samuel Wieczorek <samuel.wieczo...@cea.fr> wrote:

Hi

I would like to work on two different machines with Git and
Bioconductor. Is it possible to have two ssh keys ?


Best regards


Sam

--
*Samuel Wieczorek

Etude de la Dynamique des Protéomes (EDyP)*
*Laboratoire Biologie à Grande Echelle (BGE)*
*U1038 INSERM / CEA / UGA*
*Biosciences and Biotechnology Institute of Grenoble (BIG)*
*CEA / Grenoble*
*17 avenue des Martyrs*
*F-38054 Grenoble Cedex 9*
*/Tél. : 04.38.78.44.14/*
*/Fax : 04.38.78.50.51/*

http://www.edyp.fr/

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel




Re: [Bioc-devel] Odd version in release

2017-08-21 Thread Bernat Gel

Ok, I'll do that.

Thanks

On 08/21/2017 at 03:24 PM, Martin Morgan wrote:

On 08/21/2017 09:22 AM, Bernat Gel wrote:

Hi,

It seems like a couple of months ago I fixed a bug in release version 
of karyoploteR and pushed a DESCRIPTION with the wrong version 
number, going from 1.0.1  to 1.1.6 (the one in devel at the moment). 
So now both devel and release are in 1.1.*


Should I do something to get release back to an even y number? How 
should I proceed?


For version number x.y.z: bump release to 1.2.0 and devel to 1.3.0, so 
that version bumps are always 'forward', 'release' is even in 'y', and 
devel is y + 1 compared to release.
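
The rule above can be checked mechanically in base R. The helper is_even_y() below is hypothetical (it is not part of any Bioconductor tooling); it is only a small sketch of the even/odd 'y' convention and of the "always forward" requirement relative to the mistakenly pushed 1.1.6.

```r
# Hypothetical helper: TRUE when the 'y' in x.y.z is even (release-style)
is_even_y <- function(version) {
  parts <- unlist(package_version(version))
  parts[2] %% 2 == 0
}

release <- "1.2.0"
devel   <- "1.3.0"

is_even_y(release)   # release should have an even y
is_even_y(devel)     # devel should have an odd y

# Both proposed versions must sort after the mistakenly pushed 1.1.6
package_version(release) > package_version("1.1.6")
package_version(devel)   > package_version(release)
```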


Martin



Thanks

Bernat







___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] Odd version in release

2017-08-21 Thread Bernat Gel

Hi,

It seems like a couple of months ago I fixed a bug in release version of 
karyoploteR and pushed a DESCRIPTION with the wrong version number, 
going from 1.0.1  to 1.1.6 (the one in devel at the moment). So now both 
devel and release are in 1.1.*


Should I do something to get release back to an even y number? How should 
I proceed?


Thanks

Bernat


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Help understanding an R performance issue

2017-06-30 Thread Bernat Gel

Ok, that makes sense

In my current use case I think I'll be able to filter out first the 
elements that will miss, so this behaviour is not triggered.


But it's good to know this happens so I can try to avoid it in the future.
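
Filtering out the names that would miss before subsetting could look roughly like this. It is a sketch under assumptions: cc.small is a scaled-down, made-up stand-in for the cc2 vector from the original post, and known and res are names introduced for the example.

```r
chrs <- setNames(1:24, paste0("chr", c(1:22, "X", "Y")))

# Scaled-down version of cc2 from the thread (sizes are arbitrary)
cc.small <- c(rep("chr22", 1000), rep("chr23", 500), rep("chr24", 50))

# Keep only names that exist in chrs before subsetting
known <- cc.small %in% names(chrs)
res <- chrs[cc.small[known]]
```

With the misses removed up front, the character subscript contains only valid names, so the slow path discussed in this thread should not be triggered.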

Thanks.

Bernat


On 06/30/2017 at 03:20 PM, Michael Lawrence wrote:

The reason it's faster when shuffled vs. all at the end is that when a
miss happens R compares the string to all strings before it in the
subscript. So it's a lot worse to have a miss towards the end.

As Martin wrote, there are basically two possible improvements that
are somewhat complementary:
1) Tell stringSubscript() that it is not replacing so there is no need
to do that scan. This would require passing an argument down the call
stack.
2) Do a self match on the subscript like in Martin's patch, although
it should probably be done lazily on the first miss.
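
Until something along those lines lands in R itself, a user-side workaround is to resolve the names with match() and subset by integer position. This is a sketch, not one of the two patches described above; cc.small is a made-up, scaled-down stand-in for cc2, and whether the slowdown appears at all depends on the R version in use.

```r
chrs <- setNames(1:24, paste0("chr", c(1:22, "X", "Y")))
cc.small <- c(rep("chr22", 1000), rep("chr23", 500), rep("chr24", 50))

# match() resolves every name in one hashed pass; misses come back as NA
idx <- match(cc.small, names(chrs))
res <- chrs[idx]          # integer subscript, so no per-miss string scan
```

The values agree with those of chrs[cc.small], including NA for the names that do not exist, but the lookup no longer goes through the character-subscript code path.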

Michael

On Fri, Jun 30, 2017 at 3:32 AM, Bernat Gel <b...@igtp.cat> wrote:

Ok, so it seems more like a bug somewhere than something I failed to
understand, then.

One of the surprises for me is that shuffling the data so the misses do not
happen one after the other seems to solve the issue...

Thanks,

Bernat

On 06/30/2017 at 11:21 AM, Hervé Pagès wrote:

Hi Bernat, Michael,

FWIW I reported this issue on R-devel a couple of times. Last time was
in 2013:

   https://stat.ethz.ch/pipermail/r-devel/2013-May/066616.html

Cheers,
H.

On 06/29/2017 11:58 PM, Bernat Gel wrote:

Yes, that would explain part of the situation. But example cc5 shows
that hash misses would account only for part of the time.

Thanks for taking a look into it

Bernat

On 06/29/2017 at 08:48 PM, Michael Lawrence wrote:

Preliminary analysis suggests that this is due to hash misses. When
that happens, R ends up doing costly string comparisons that are on
the order of n^2 where 'n' is the length of the subscript. Looking
into it.

On Thu, Jun 29, 2017 at 10:43 AM, Bernat Gel <b...@igtp.cat> wrote:

Hi all,

This is not strictly a Bioconductor question, but I hope some of the experts
here can help me understand what's going on with a performance issue I've
found working on a package.

It has to do with selecting elements from a named vector.

If we have a vector with the names of the chromosomes and their order

  chrs <- setNames(1:24, paste0("chr", c(1:22, "X", "Y")))
  chrs

 chr1  chr2  chr3  chr4  chr5  chr6  chr7  chr8  chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17
    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
chr18 chr19 chr20 chr21 chr22  chrX  chrY
   18    19    20    21    22    23    24

And we have a second vector of chromosomes (in this case, the chromosomes
from SNP-array probes)
And we want to use the second vector to select from the first one by name

  cc <- c(rep("chr17", 19891), rep("chr18", 21353), rep("chr19", 14726),
  rep("chr20", 18135), rep("chr21", 10068), rep("chr22",

Re: [Bioc-devel] Help understanding an R performance issue

2017-06-30 Thread Bernat Gel
Ok, so it seems more like a bug somewhere than something I failed to 
understand, then.


One of the surprises for me is that shuffling the data so the misses do 
not happen one after the other seems to solve the issue...


Thanks,

Bernat

On 06/30/2017 at 11:21 AM, Hervé Pagès wrote:

Hi Bernat, Michael,

FWIW I reported this issue on R-devel a couple of times. Last time was
in 2013:

  https://stat.ethz.ch/pipermail/r-devel/2013-May/066616.html

Cheers,
H.

On 06/29/2017 11:58 PM, Bernat Gel wrote:

Yes, that would explain part of the situation. But example cc5 shows
that hash misses would account only for part of the time.

Thanks for taking a look into it

Bernat

On 06/29/2017 at 08:48 PM, Michael Lawrence wrote:

Preliminary analysis suggests that this is due to hash misses. When
that happens, R ends up doing costly string comparisons that are on
the order of n^2 where 'n' is the length of the subscript. Looking
into it.

On Thu, Jun 29, 2017 at 10:43 AM, Bernat Gel <b...@igtp.cat> wrote:

Hi all,

This is not strictly a Bioconductor question, but I hope some of the experts
here can help me understand what's going on with a performance issue I've
found working on a package.

It has to do with selecting elements from a named vector.

If we have a vector with the names of the chromosomes and their order

 chrs <- setNames(1:24, paste0("chr", c(1:22, "X", "Y")))
 chrs

 chr1  chr2  chr3  chr4  chr5  chr6  chr7  chr8  chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17
    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
chr18 chr19 chr20 chr21 chr22  chrX  chrY
   18    19    20    21    22    23    24

And we have a second vector of chromosomes (in this case, the chromosomes
from SNP-array probes)
And we want to use the second vector to select from the first one by name

 cc <- c(rep("chr17", 19891), rep("chr18", 21353), rep("chr19", 14726),
 rep("chr20", 18135), rep("chr21", 10068), rep("chr22", 10252),
 rep("chrX", 17498), rep("chrY", 1296))
 print(system.time(replicate(10, chrs[cc])))

user  system elapsed
0.136   0.004   0.141

It's fast.

However, if I get the wrong names for the last two chromosomes (chr23 and
chr24 instead of chrX and chrY)

  cc2 <- c(rep("chr17", 19891), rep("chr18", 21353), rep("chr19", 14726),
  rep("chr20", 18135), rep("chr21", 10068), rep("chr22", 10252),
  rep("chr23", 17498), rep("chr24", 1296))
  print(system.time(replicate(10, chrs[cc2])))

user  system elapsed
144.672   0.012 144.675


It is MUCH slower. (1000x)


BUT, if I shuffle the elements in the second vector

 cc3 <- sample(cc2, length(cc), replace = FALSE)
 print(system.time(replicate(10, chrs[cc3])))

user  system elapsed
0.096   0.004   0.102

It's fast again!!!



The elapsed time is related to the number of elements BEFORE the failing
names,

 cc4 <- c(rep("chr22", 10252), rep("chr23", 17498), rep("chr24", 1296))
 print(system.time(replicate(10, chrs[cc4])))

user  system elapsed
17.332   0.004  17.336

 cc5 <- c(rep("chr23", 17498), rep("chr24", 1296))
 print(system.time(replicate(10, chrs[cc5])))

user  system elapsed
1.872   0.000   1.901


so my guess is that it might come from moving around the vec

Re: [Bioc-devel] Help understanding an R performance issue

2017-06-30 Thread Bernat Gel
Yes, that would explain part of the situation. But example cc5 shows 
that hash misses would account only for part of the time.


Thanks for taking a look into it

Bernat

On 06/29/2017 at 08:48 PM, Michael Lawrence wrote:

Preliminary analysis suggests that this is due to hash misses. When
that happens, R ends up doing costly string comparisons that are on
the order of n^2 where 'n' is the length of the subscript. Looking
into it.

On Thu, Jun 29, 2017 at 10:43 AM, Bernat Gel <b...@igtp.cat> wrote:

Hi all,

This is not strictly a Bioconductor question, but I hope some of the experts
here can help me understand what's going on with a performance issue I've
found working on a package.

It has to do with selecting elements from a named vector.

If we have a vector with the names of the chromosomes and their order

 chrs <- setNames(1:24, paste0("chr", c(1:22, "X", "Y")))
 chrs

 chr1  chr2  chr3  chr4  chr5  chr6  chr7  chr8  chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17
    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
chr18 chr19 chr20 chr21 chr22  chrX  chrY
   18    19    20    21    22    23    24

And we have a second vector of chromosomes (in this case, the chromosomes
from SNP-array probes)
And we want to use the second vector to select from the first one by name

 cc <- c(rep("chr17", 19891), rep("chr18", 21353), rep("chr19", 14726),
 rep("chr20", 18135), rep("chr21", 10068), rep("chr22", 10252),
 rep("chrX", 17498), rep("chrY", 1296))
 print(system.time(replicate(10, chrs[cc])))

user  system elapsed
0.136   0.004   0.141

It's fast.

However, if I get the wrong names for the last two chromosomes (chr23 and
chr24 instead of chrX and chrY)

  cc2 <- c(rep("chr17", 19891), rep("chr18", 21353), rep("chr19", 14726),
 rep("chr20", 18135), rep("chr21", 10068), rep("chr22", 10252),
 rep("chr23", 17498), rep("chr24", 1296))
  print(system.time(replicate(10, chrs[cc2])))

user  system elapsed
144.672   0.012 144.675


It is MUCH slower. (1000x)


BUT, if I shuffle the elements in the second vector

 cc3 <- sample(cc2, length(cc), replace = FALSE)
 print(system.time(replicate(10, chrs[cc3])))

user  system elapsed
0.096   0.004   0.102

It's fast again!!!



The elapsed time is related to the number of elements BEFORE the failing
names,

 cc4 <- c(rep("chr22", 10252), rep("chr23", 17498), rep("chr24", 1296))
 print(system.time(replicate(10, chrs[cc4])))

user  system elapsed
17.332   0.004  17.336

 cc5 <- c(rep("chr23", 17498), rep("chr24", 1296))
 print(system.time(replicate(10, chrs[cc5])))

user  system elapsed
1.872   0.000   1.901


so my guess is that it might come from moving around the vector in memory
for each "failed" selection or something similar...

Is it correct? Is there anything I'm missing?

Thanks a lot

Bernat

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



[Bioc-devel] Help understanding an R performance issue

2017-06-29 Thread Bernat Gel

Hi all,

This is not strictly a Bioconductor question, but I hope some of the 
experts here can help me understand what's going on with a performance 
issue I've found working on a package.


It has to do with selecting elements from a named vector.

If we have a vector with the names of the chromosomes and their order

chrs <- setNames(1:24, paste0("chr", c(1:22, "X", "Y")))
chrs

 chr1  chr2  chr3  chr4  chr5  chr6  chr7  chr8  chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17
    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17
chr18 chr19 chr20 chr21 chr22  chrX  chrY
   18    19    20    21    22    23    24

And we have a second vector of chromosomes (in this case, the 
chromosomes from SNP-array probes)

And we want to use the second vector to select from the first one by name

cc <- c(rep("chr17", 19891), rep("chr18", 21353), rep("chr19", 14726),
rep("chr20", 18135), rep("chr21", 10068), rep("chr22", 10252),
rep("chrX", 17498), rep("chrY", 1296))
print(system.time(replicate(10, chrs[cc])))

user  system elapsed
0.136   0.004   0.141

It's fast.

However, if I get the wrong names for the last two chromosomes (chr23 
and chr24 instead of chrX and chrY)


cc2 <- c(rep("chr17", 19891), rep("chr18", 21353), rep("chr19", 14726),
rep("chr20", 18135), rep("chr21", 10068), rep("chr22", 10252),
rep("chr23", 17498), rep("chr24", 1296))
print(system.time(replicate(10, chrs[cc2])))

user  system elapsed
144.672   0.012 144.675


It is MUCH slower. (1000x)


BUT, if I shuffle the elements in the second vector

cc3 <- sample(cc2, length(cc2), replace = FALSE)
print(system.time(replicate(10, chrs[cc3])))

user  system elapsed
0.096   0.004   0.102

It's fast again!!!



The elapsed time is related to the number of elements BEFORE the failing names:


cc4 <- c(rep("chr22", 10252), rep("chr23", 17498), rep("chr24", 1296))
print(system.time(replicate(10, chrs[cc4])))

user  system elapsed
17.332   0.004  17.336

cc5 <- c(rep("chr23", 17498), rep("chr24", 1296))
print(system.time(replicate(10, chrs[cc5])))

user  system elapsed
1.872   0.000   1.901


so my guess is that it might come from moving around the vector in 
memory for each "failed" selection or something similar...


Is it correct? Is there anything I'm missing?
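
A possible workaround, sketched under the assumption that the slowdown comes from how `[` resolves character subscripts that have no match in names(chrs) (this is an assumption, not something confirmed in this message): do the name lookup explicitly with match() and subset by the resulting integer positions. The vectors are shortened versions of the ones above, and the variable names `unknown`, `idx` and `res` are only illustrative.

```r
chrs <- setNames(1:24, paste0("chr", c(1:22, "X", "Y")))

# Shortened stand-in for cc2: the last two names do not exist in chrs
cc2 <- c(rep("chr17", 4), rep("chr22", 3), rep("chr23", 2), rep("chr24", 1))

# Report names that cannot be resolved before subsetting
unknown <- setdiff(unique(cc2), names(chrs))
if (length(unknown) > 0) {
  message("Names not found in chrs: ", paste(unknown, collapse = ", "))
}

# match() returns NA for unknown names; integer subsetting then yields NA
idx <- match(cc2, names(chrs))
res <- chrs[idx]
```

Unmatched names still give NA values, as chrs[cc2] does, but the lookup no longer goes through character subsetting, and the setdiff() check makes wrong names such as chr23 visible up front.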

Thanks a lot

Bernat

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Devel landings in release URLs?

2017-04-26 Thread Bernat Gel

Ok, thanks.

On 04/26/2017 at 12:01 PM, Obenchain, Valerie wrote:

Yes, there is a problem with the landing pages. We hope to have this
sorted out soon.

Valerie



On 04/26/2017 02:53 AM, Bernat Gel wrote:

Hi all,

I think there might have been a problem with the generation of the
landing pages for the new release.

The release URL for the packages (limma, for example)

  https://bioconductor.org/packages/release/bioc/html/limma.html

points to the devel landing page and the devel URL

  http://bioconductor.org/packages/devel/bioc/html/limma.html

returns a Page Not Found

Am I missing something?

Thanks

Bernat






___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Devel landings in release URLs?

2017-04-26 Thread Bernat Gel

Hi all,

I think there might have been a problem with the generation of the 
landing pages for the new release.


The release URL for the packages (limma, for example)

https://bioconductor.org/packages/release/bioc/html/limma.html

points to the devel landing page and the devel URL

http://bioconductor.org/packages/devel/bioc/html/limma.html

returns a Page Not Found

Am I missing something?

Thanks

Bernat

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Chromosome order in GRangesForUCSCGenome

2017-03-14 Thread Bernat Gel

Great, thanks

On 03/13/2017 at 01:57 PM, Michael Lawrence wrote:

Looks like UCSC has started sorting the chromosomes by size.  I made 
1.35.9 use sortSeqlevels() to normalize the order of them.
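
As a rough illustration of what "normalizing" the order means here, the sketch below sorts chromosome names into the usual order (autosomes numerically, then X, Y and M, then everything else) using base R only. The helper `canonical_chr_order()` is hypothetical and is not how sortSeqlevels() is implemented; in real code the GenomeInfoDb function mentioned above is the one to use.

```r
# Hypothetical helper (base R only); not the GenomeInfoDb implementation
canonical_chr_order <- function(x) {
  core <- sub("^chr", "", x)
  # Autosomes become 1, 2, ...; anything non-numeric becomes NA
  rank <- suppressWarnings(as.numeric(core))
  # Sex and mitochondrial chromosomes go right after the autosomes
  special <- c(X = 1000, Y = 1001, M = 1002, MT = 1002)
  is_special <- core %in% names(special)
  rank[is_special] <- special[core[is_special]]
  # Everything else (haplotypes, unplaced contigs) goes last, in input order
  rank[is.na(rank)] <- Inf
  x[order(rank)]
}

# A shortened version of the order reported below
x <- c("chr1", "chr2", "chr7", "chrX", "chr8", "chr20", "chrY",
       "chr19", "chr22", "chr21", "chr6_ssto_hap7")
sorted <- canonical_chr_order(x)
```

order() keeps ties in their original position, so sequences that the helper does not recognise stay in the order in which they were downloaded.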


On Mon, Mar 13, 2017 at 3:05 AM, Bernat Gel <b...@igtp.cat> wrote:


I'm downloading genome information from UCSC using the
GRangesForUCSCGenome from rtracklayer and it seems that the
chromosome order is incorrect (or at least non-canonical).

> seqlevels(GRangesForUCSCGenome(genome="hg19"))

 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"
 [6] "chr6"  "chr7"  "chrX"  "chr8"  "chr9"
[11] "chr10" "chr11" "chr12" "chr13" "chr14"
[16] "chr15" "chr16" "chr17" "chr18" "chr20"
[21] "chrY"  "chr19" "chr22" "chr21" "chr6_ssto_hap7"
[26] ...

With chrX before chr8 and chrY before chr19.

And the same happens with SeqinfoForUCSCGenome(genome="hg19")

I know I could reorder them manually, but I'm downloading this
from various genomes to cache them in a package (karyoploteR) and
I'd rather not rely on manual sorting for that.

I'm quite sure it used to return them in the canonical order. Is
there anything I'm missing or is it a bug somewhere?


Thanks a lot

Bernat




sessionInfo()
R Under development (unstable) (2016-11-07 r71637)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=C                  LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8     LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8  LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] testthat_1.0.2        karyoploteR_0.99.8    biovizBase_1.23.2
 [4] regioneR_1.7.1        BSgenome_1.43.2       rtracklayer_1.35.1
 [7] Biostrings_2.43.2     XVector_0.15.0        GenomicRanges_1.27.18
[10] GenomeInfoDb_1.11.6   IRanges_2.9.14        S4Vectors_0.13.5
[13] BiocGenerics_0.21.1   memoise_1.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8                   lattice_0.20-34
 [3] Rsamtools_1.27.11             assertthat_0.1
 [5] digest_0.6.11                 mime_0.5
 [7] R6_2.2.0                      plyr_1.8.4
 [9] backports_1.0.4               acepack_1.4.1
[11] RSQLite_1.1-2                 httr_1.2.1
[13] ggplot2_2.2.1                 BiocInstaller_1.25.3
[15] zlibbioc_1.21.0               GenomicFeatures_1.27.6
[17] lazyeval_0.2.0                data.table_1.10.0
[19] rpart_4.1-10                  Matrix_1.2-7.1
[21] checkmate_1.8.2               splines_3.4.0
[23] BiocParallel_1.9.4            AnnotationHub_2.7.9
[25] stringr_1.1.0                 foreign_0.8-67
[27] ProtGenerics_1.7.0            RCurl_1.95-4.8
[29] biomaRt_2.31.3                munsell_0.4.3
[31] shiny_0.14.2                  httpuv_1.3.3
[33] base64enc_0.1-3               htmltools_0.3.5
[35] nnet_7.3-12                   SummarizedExperiment_1.5.3
[37] tibble_1.2                    gridExtra_2.2.1
[39] htmlTable_1.8                 interactiveDisplayBase_1.13.0
[41] Hmisc_4.0-2                   XML_3.98-1.5
[43] crayon_1.3.2                  GenomicAlignments_1.11.6
[45] bitops_1.0-6                  grid_3.4.0
[47] xtable_1.8-2                  gtable_0.2.0
[49] DBI_0.5-1                     magrittr_1.5
[51] scales_0.4.1                  stringi_1.1.2
[53] latticeExtra_0.6-28           Formula_1.2-1
[55] RColorBrewer_1.1-2            ensembldb_1.99.10
[57] tools_3.4.0                   dichromat_2.0-0
[59] Biobase_2.35.0                survival_2.40-1
[61] yaml_2.1.14                   AnnotationDbi_1.37.0
[63] colorspace_1.3-2              cluster_2.0.5
[65] VariantAnnotation_1.21.14     knitr_1.15.1


[Bioc-devel] Chromosome order in GRangesForUCSCGenome

2017-03-13 Thread Bernat Gel
I'm downloading genome information from UCSC using the 
GRangesForUCSCGenome from rtracklayer and it seems that the chromosome 
order is incorrect (or at least non-canonical).


> seqlevels(GRangesForUCSCGenome(genome="hg19"))

 [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"
 [6] "chr6"  "chr7"  "chrX"  "chr8"  "chr9"
[11] "chr10" "chr11" "chr12" "chr13" "chr14"
[16] "chr15" "chr16" "chr17" "chr18" "chr20"
[21] "chrY"  "chr19" "chr22" "chr21" "chr6_ssto_hap7"
[26] ...

With chrX before chr8 and chrY before chr19.

And the same happens with SeqinfoForUCSCGenome(genome="hg19")

I know I could reorder them manually, but I'm downloading this from 
various genomes to cache them in a package (karyoploteR) and I'd rather 
not rely on manual sorting for that.


I'm quite sure it used to return them in the canonical order. Is there 
anything I'm missing or is it a bug somewhere?



Thanks a lot

Bernat




sessionInfo()
R Under development (unstable) (2016-11-07 r71637)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=C                  LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8     LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8  LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] testthat_1.0.2        karyoploteR_0.99.8    biovizBase_1.23.2
 [4] regioneR_1.7.1        BSgenome_1.43.2       rtracklayer_1.35.1
 [7] Biostrings_2.43.2     XVector_0.15.0        GenomicRanges_1.27.18
[10] GenomeInfoDb_1.11.6   IRanges_2.9.14        S4Vectors_0.13.5
[13] BiocGenerics_0.21.1   memoise_1.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8                   lattice_0.20-34
 [3] Rsamtools_1.27.11             assertthat_0.1
 [5] digest_0.6.11                 mime_0.5
 [7] R6_2.2.0                      plyr_1.8.4
 [9] backports_1.0.4               acepack_1.4.1
[11] RSQLite_1.1-2                 httr_1.2.1
[13] ggplot2_2.2.1                 BiocInstaller_1.25.3
[15] zlibbioc_1.21.0               GenomicFeatures_1.27.6
[17] lazyeval_0.2.0                data.table_1.10.0
[19] rpart_4.1-10                  Matrix_1.2-7.1
[21] checkmate_1.8.2               splines_3.4.0
[23] BiocParallel_1.9.4            AnnotationHub_2.7.9
[25] stringr_1.1.0                 foreign_0.8-67
[27] ProtGenerics_1.7.0            RCurl_1.95-4.8
[29] biomaRt_2.31.3                munsell_0.4.3
[31] shiny_0.14.2                  httpuv_1.3.3
[33] base64enc_0.1-3               htmltools_0.3.5
[35] nnet_7.3-12                   SummarizedExperiment_1.5.3
[37] tibble_1.2                    gridExtra_2.2.1
[39] htmlTable_1.8                 interactiveDisplayBase_1.13.0
[41] Hmisc_4.0-2                   XML_3.98-1.5
[43] crayon_1.3.2                  GenomicAlignments_1.11.6
[45] bitops_1.0-6                  grid_3.4.0
[47] xtable_1.8-2                  gtable_0.2.0
[49] DBI_0.5-1                     magrittr_1.5
[51] scales_0.4.1                  stringi_1.1.2
[53] latticeExtra_0.6-28           Formula_1.2-1
[55] RColorBrewer_1.1-2            ensembldb_1.99.10
[57] tools_3.4.0                   dichromat_2.0-0
[59] Biobase_2.35.0                survival_2.40-1
[61] yaml_2.1.14                   AnnotationDbi_1.37.0
[63] colorspace_1.3-2              cluster_2.0.5
[65] VariantAnnotation_1.21.14     knitr_1.15.1


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Efficient Random Sampling of Positions in GRanges

2016-02-16 Thread Bernat Gel

Hi Dario,

You could use a package called regioneR for that. It has functions to 
create random regions and to randomize existing sets of regions along 
the genome and it can do it taking into account a possible set of masked 
regions.


In your case, you could create a mask for the genome that is the genome 
minus your GRanges object, so the random regions will be placed only on 
your regions.


For example, with a GRanges (A) with only two regions in chromosome 1:

library(regioneR)

A <- toGRanges(data.frame(chr=c("chr1", "chr1"), start=c(20, 100), end=c(30, 150)))

# You need the hg19 BSgenome package installed to do this
hg19.genome <- getGenome("hg19")

# Mask everything in the genome except the regions in A
hg19.mask <- subtractRegions(hg19.genome, A)

random.regions <- createRandomRegions(nregions=10, length.mean=1, length.sd=0, genome=hg19.genome, mask=hg19.mask)




Although the randomization process can be a bit slow when dealing with 
thousands of regions, it will be way faster than the approach you propose.
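
If all that is needed is single positions drawn uniformly from the admissible regions, a dependency-free sketch is also possible. It draws an offset in 1..sum(widths) and maps it back to a region, so the genome is never expanded position by position. The helper `sample_positions()` is hypothetical (it is not part of regioneR or GenomicRanges), it samples with replacement, and it assumes the regions do not overlap (overlapping regions would need to be merged first).

```r
# Hypothetical helper (base R only): sample n positions uniformly from
# non-overlapping regions given as parallel chr/start/end vectors
sample_positions <- function(chr, start, end, n) {
  width <- end - start + 1
  cum <- cumsum(as.numeric(width))
  total <- cum[length(cum)]
  # Draw n offsets in 1..total, with replacement
  k <- sample.int(total, n, replace = TRUE)
  # Region that contains each offset
  idx <- findInterval(k - 1, cum) + 1L
  # Position of the offset inside its region (1-based)
  within <- k - c(0, cum)[idx]
  data.frame(chr = chr[idx], pos = start[idx] + within - 1,
             stringsAsFactors = FALSE)
}

# Same two regions as in the example above
set.seed(1)
pos <- sample_positions(chr = c("chr1", "chr1"), start = c(20, 100),
                        end = c(30, 150), n = 1000)
```

Each position has the same probability because a region is picked in proportion to its width.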


Hope this helps

Bernat

Bioinformatician, PhD.
Genetic Variation and Cancer
Genetic Diagnostic Unit of Hereditary Cancer (UDGCH-IMPPC)
Institut de Medicina Predictiva i Personalitzada del Càncer (IMPPC)
Campus de Can Ruti
Ctra de Can Ruti, Camí de les Escoles s/n
08916 Badalona, Spain

Tel (+34) 93 557 28 36
b...@imppc.org

http://www.imppc.org/en/

On 02/16/16 12:00, Dario Strbenac wrote:

Hello,

There is no convenience function to sample nucleotide positions from a GRanges 
object. My approach is to generate a GRanges of every chromosomal position with 
a width of 1, then find the overlaps with the desired ranges (admissible 
regions), then sample the positions that overlapped. The construction of the 
GRanges object containing every chromosome position is inefficient, as is 
finding its overlaps with another GRanges object. Could an optimised function 
for this task be added to GenomicRanges?

--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

