Re: [Bioc-devel] BiocManager now on CRAN

2018-07-16 Thread Gabe Becker
Ah, of course. I should have realized that. Makes sense. I'll get that
fixed in devel soon.

Thanks and sorry for the noise.
~G

On Mon, Jul 16, 2018 at 9:37 AM, Marcel Ramos 
wrote:

> Hi Gabe,
>
> Please note that we are only making changes to packages in *bioc-devel*.
>
> BiocManager wouldn't fail for users with earlier versions of R because
> it doesn't apply to them. These users
> should be using the respective bioc-release versions and consequently
> `BiocInstaller`.
>
> BiocManager is currently supported only for `devel`, and all /bioc-devel/
> users will have R 3.5.0 or greater going forward.
>
>
> Regards,
> Marcel
>
>
> On 07/16/2018 11:51 AM, Gabe Becker wrote:
> > Marcel et al,
> >
> > My genbankr package is one of the ones that mentions biocLite (in
> > README.md, actually, not the vignette proper, but still...).
> > Historically this was just because I had missed your email and hadn't
> > updated it, but when I sat down to do it I ran into an issue:
> >
> > BiocManager, while a huge step forward, requires R >3.5.0. That is
> > still relatively new, and my package (along with all Bioc packages
> > from the corresponding release) works fine under 3.4.x (and previous).
> > I'm somewhat loath to completely remove the biocLite based
> > instructions because install.packages("BiocManager") will fail (well,
> > not with an error, but it doesn't install anything...) for users with
> > earlier versions of R, whereas
> > source("http://bioconductor.org/biocLite.R") works and gets the
> > correct version for them IIRC.
> The `biocLite` instructions only apply to the current and previous release
> versions of Bioconductor.
> >
> > Is there guidance on how to handle this issue?
> >
> > Thanks,
> > ~G
> >
> > On Sun, Jul 15, 2018 at 10:53 AM, Marcel Ramos
> > <marcel.ramospe...@roswellpark.org> wrote:
> >
> > Hi Jason,
> >
> > Please check all of your package files and not just the vignette.
> >
> > The criteria involve a simple `grep` search of all package files for
> > the words `biocLite` and `BiocInstaller`.
> >
> > ~/Bioconductor/ClusterSignificance (master) $ grep -rn "biocLite" *
> > README.md:52:source("https://bioconductor.org/biocLite.R")
> > README.md:53:biocLite("ClusterSignificance")
> > README.Rmd:45:source("https://bioconductor.org/biocLite.R")
> > README.Rmd:46:biocLite("ClusterSignificance")
> >
> > As I've mentioned in the previous emails, you can use:
> >
> > install.packages("BiocManager")
> > BiocManager::install("YourPackageNameHere")
> >
> > to replace the source function call.
> >
> > You may also refer to the "Installation" section of the devel
> > landing pages
> > for an additional example:
> >
> > http://bioconductor.org/packages/devel/bioc/html/ClusterSignificance.html
> >
> > Best regards,
> > Marcel
> >
> >
> > On 07/14/2018 03:31 AM, Jason Serviss wrote:
> > > Hello Marcel,
> > >
> > > I notice that the package I maintain, ClusterSignificance, is
> > included
> > > in this list although I am unsure why. In your previous mail you
> > say:
> > >
> > >> After the next couple of weeks or so, we will be identifying
> > packages in
> > >> bioc-devel (3.8) that still
> > >> mention BiocInstaller / biocLite.
> > >
> > > I don’t find any mention of BiocInstaller or biocLite in the
> > > ClusterSignificance vignette and it is a bit unclear to me what
> > “make
> > > changes to their ... package code to support the use of
> > `BiocManager`”
> > > specifically entails. Would you mind expanding on what criteria,
> > other
> > > than usage of BiocInstaller or biocLite in the vignette, that might
> > > cause packages to appear in your gist?
> > >
> > > Kind Regards,
> > > Jason Serviss
> > >
> > >
> > >> On 13 Jul 2018, at 23:11, Marcel Ramos
> > >> <marcel.ramospe...@roswellpark.org> wrote:

Re: [Bioc-devel] BiocManager now on CRAN

2018-07-16 Thread Gabe Becker
Marcel et al,

My genbankr package is one of the ones that mentions biocLite (in
README.md, actually, not the vignette proper, but still...). Historically
this was just because I had missed your email and hadn't updated it, but
when I sat down to do it I ran into an issue:

BiocManager, while a huge step forward, requires R >3.5.0. That is still
relatively new, and my package (along with all Bioc packages from the
corresponding release) works fine under 3.4.x (and previous). I'm somewhat
loath to completely remove the biocLite based instructions because
install.packages("BiocManager") will fail (well, not with an error, but it
doesn't install anything...) for users with earlier versions of R, whereas
source("http://bioconductor.org/biocLite.R") works and gets the correct
version for them IIRC.
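
One way to keep both audiences working (a sketch only, not official Bioconductor
guidance; `genbankr` here is simply the package from this thread) is to branch
the README instructions on the running R version:

```r
## Sketch: pick the installer based on R version. BiocManager needs
## R >= 3.5.0; older R sessions fall back to the legacy biocLite().
if (getRversion() >= "3.5.0") {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("genbankr")
} else {
    source("https://bioconductor.org/biocLite.R")  # legacy installer
    biocLite("genbankr")
}
```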

Is there guidance on how to handle this issue?

Thanks,
~G

On Sun, Jul 15, 2018 at 10:53 AM, Marcel Ramos <
marcel.ramospe...@roswellpark.org> wrote:

> Hi Jason,
>
> Please check all of your package files and not just the vignette.
>
> The criteria involve a simple `grep` search of all package files for
> the words `biocLite` and `BiocInstaller`.
>
> ~/Bioconductor/ClusterSignificance (master) $ grep -rn "biocLite" *
> README.md:52:source("https://bioconductor.org/biocLite.R")
> README.md:53:biocLite("ClusterSignificance")
> README.Rmd:45:source("https://bioconductor.org/biocLite.R")
> README.Rmd:46:biocLite("ClusterSignificance")
>
> As I've mentioned in the previous emails, you can use:
>
> install.packages("BiocManager")
> BiocManager::install("YourPackageNameHere")
>
> to replace the source function call.
>
> You may also refer to the "Installation" section of the devel landing pages
> for an additional example:
>
> http://bioconductor.org/packages/devel/bioc/html/ClusterSignificance.html
>
> Best regards,
> Marcel
>
>
> On 07/14/2018 03:31 AM, Jason Serviss wrote:
> > Hello Marcel,
> >
> > I notice that the package I maintain, ClusterSignificance, is included
> > in this list although I am unsure why. In your previous mail you say:
> >
> >> After the next couple of weeks or so, we will be identifying packages in
> >> bioc-devel (3.8) that still
> >> mention BiocInstaller / biocLite.
> >
> > I don’t find any mention of BiocInstaller or biocLite in the
> > ClusterSignificance vignette and it is a bit unclear to me what “make
> > changes to their ... package code to support the use of `BiocManager`”
> > specifically entails. Would you mind expanding on what criteria, other
> > than usage of BiocInstaller or biocLite in the vignette, that might
> > cause packages to appear in your gist?
> >
> > Kind Regards,
> > Jason Serviss
> >
> >
> >> On 13 Jul 2018, at 23:11, Marcel Ramos
> >> <marcel.ramospe...@roswellpark.org> wrote:
> >>
> >>> biocLite
> >
>
>
>
> This email message may contain legally privileged and/or...{{dropped:4}}
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] modify _R_CHECK_FORCE_SUGGESTS_ ?

2018-05-18 Thread Gabe Becker
I guess I was unclear. What I was asking is whether it is, e.g., *required*
for your vignette, e.g., used to create a data structure that is then used by
your functions.

That is a pretty standard use of Suggests, though obviously not the only
one. My point was really that a package merely being in Suggests is not
sufficient information, by itself, to expect it to pass check with the
suggests check turned off. Some valid uses of Suggests would fail, and
others would succeed, in that case with the package not being present.

Best,
~G

On Fri, May 18, 2018 at 2:13 PM, Michael Lawrence <lawrence.mich...@gene.com
> wrote:

> Only if Rsubread is used unconditionally.
>
> On Fri, May 18, 2018 at 2:00 PM, Gabe Becker <becker.g...@gene.com> wrote:
> > Vivek,
> >
> > Why (ie in what sense) is the package suggested? Is it used in your
> tests,
> > examples, or vignette?
> >
> > If so, those would also fail during R CMD check if the package is not
> > available even if that environment variable were false, wouldn't they?
> >
> > ~G
> >
> > On Fri, May 18, 2018 at 1:55 PM, Bhardwaj, Vivek <
> > bhard...@ie-freiburg.mpg.de> wrote:
> >
> >> Hi All
> >>
> >>
> >> My package is in review and the build is failing since a suggested
> package
> >> (Rsubread) is not available on windows. Is there a way for me to
> instruct
> >> the build machine on bioc to use: _R_CHECK_FORCE_SUGGESTS_ = FALSE ??
> >>
> >>
> >>
> >> Best,
> >>
> >> Vivek
> >>
> >> 
> >>
> >> Vivek Bhardwaj
> >> PhD Candidate | International Max Planck Research School
> >> Max Planck Institute of Immunobiology and Epigenetics
> >> Stübeweg 51, Freiburg
> >>
> >> [[alternative HTML version deleted]]
> >>
> >>
> >> ___
> >> Bioc-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >>
> >
> >
> > --
> > Gabriel Becker, Ph.D
> > Scientist
> > Bioinformatics and Computational Biology
> > Genentech Research
> >
> > [[alternative HTML version deleted]]
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] modify _R_CHECK_FORCE_SUGGESTS_ ?

2018-05-18 Thread Gabe Becker
Vivek,

Why (ie in what sense) is the package suggested? Is it used in your tests,
examples, or vignette?

If so, those would also fail during R CMD check if the package is not
available even if that environment variable were false, wouldn't they?

~G

On Fri, May 18, 2018 at 1:55 PM, Bhardwaj, Vivek <
bhard...@ie-freiburg.mpg.de> wrote:

> Hi All
>
>
> My package is in review and the build is failing since a suggested package
> (Rsubread) is not available on windows. Is there a way for me to instruct
> the build machine on bioc to use: _R_CHECK_FORCE_SUGGESTS_ = FALSE ??
>
>
>
> Best,
>
> Vivek
>
> 
>
> Vivek Bhardwaj
> PhD Candidate | International Max Planck Research School
> Max Planck Institute of Immunobiology and Epigenetics
> Stübeweg 51, Freiburg
>
> [[alternative HTML version deleted]]
>
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Missed deadline

2018-04-04 Thread Gabe Becker
Kenneth,

I agree with Kasper. I generally like the approach of getting the software
out there sooner rather than later. Especially if the paper you are talking
about is a method paper about the software algorithm, rather than a result
paper. In that case, getting it into a public, DOI'ed repository quickly
protects you from being scooped (if that is a concern of yours, it's
generally not a super large one of mine).

This will also give the community and package reviewers a chance to give
you feedback, resulting in the paper being written about a better piece of
software when it does happen. Just like manuscripts, no (ok, *vanishingly
few*, but almost surely not yours or mine) software is perfect after its
first development pass, so I'd strongly advise you not to think of your
software as 'complete' the moment you hit submit. Consider it an important
part of the development process.

Best,
~G

On Wed, Apr 4, 2018 at 4:34 AM, Kasper Daniel Hansen <
kasperdanielhan...@gmail.com> wrote:

> This is a subjective question. As a paper reviewer I like to see the
> package accepted. That increases trust.  As a package reviewer I like some
> idea of what the package actually does, so a statement like "we implement X,
> which is described in (XX, in preparation)" is irritating.
>
> Unless you're trying to not show anything prior to publication (which
> happens), I like submitting the package first.
>
> On Wed, Apr 4, 2018 at 12:31 PM, Kenneth Condon <roonysga...@gmail.com>
> wrote:
>
>> Hi Gabe & Levi,
>>
>> Here is my current plan:
>>
>> 1 - complete the requirements checklist (
>> http://www.bioconductor.org/developers/package-submission/)
>> 2 - get feedback from the in-house NGS team, and then from the rest of
>> in-house bioinformatics (others who use R more may spot some issues)
>> 3 - set up pull requests release on github for community testing
>> 4 - advertise github repo on bioconductor and biostars forums
>> 5 - compare to other packages
>> 6 - write paper (decide which journal)
>> 7 - have submission of paper + package ready for October deadline.
>>
>> Regarding the sequence of events - do other authors usually release on
>> bioconductor before submission of a paper or at the same time?
>> What would you recommend?
>>
>> Thanks for the help
>>
>> Kenneth
>>
>> On Tue, Apr 3, 2018 at 4:56 PM, Gabe Becker <becker.g...@gene.com> wrote:
>>
>> > Indeed, and to be a bit more explicit about Levi's point, you *can*
>> > publish your package to bioconductor any time after the deadline, it
>> will
>> > simply go to the development repo for ~6 months, which, as he points
>> out,
>> > may not be a bad thing if it's not ready yet.
>> >
>> > On Tue, Apr 3, 2018 at 8:06 AM, Levi Waldron <
>> lwaldron.resea...@gmail.com>
>> > wrote:
>> >
>> >> On Tue, Apr 3, 2018 at 5:32 AM, Kenneth Condon <roonysga...@gmail.com>
>> >> wrote:
>> >>
>> >> > Have I missed the deadline for the latest release? I have created a
>> >> > package, that runs great but there are a number of errors still from
>> R
>> >> CMD
>> >> > check that I am sorting out.
>> >> >
>> >> > This is my first R package so I'm not sure if development is far
>> enough
>> >> > along, although I suspect it might be.
>> >> >
>> >>
>> >> IMHO, when you're not sure a package is mature enough, and especially
>> for
>> >> a
>> >> first package, it's actually better to miss the release deadline and
>> allow
>> >> bioc-devel users to test your package for 6 months before entering the
>> >> release
>> >> cycle. Making significant bug fixes and other changes becomes more
>> >> complicated and more of a pain for you and your users once you are in
>> the
>> >> release...
>> >>
>> >> [[alternative HTML version deleted]]
>> >>
>> >> ___
>> >> Bioc-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >>
>> >>
>> >
>> >
>> > --
>> > Gabriel Becker, Ph.D
>> > Scientist
>> > Bioinformatics and Computational Biology
>> > Genentech Research
>> >
>>
>> [[alternative HTML version deleted]]
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Missed deadline

2018-04-03 Thread Gabe Becker
Indeed, and to be a bit more explicit about Levi's point, you *can* publish
your package to bioconductor any time after the deadline; it will simply go
to the development repo for ~6 months, which, as he points out, may not be
a bad thing if it's not ready yet.

On Tue, Apr 3, 2018 at 8:06 AM, Levi Waldron 
wrote:

> On Tue, Apr 3, 2018 at 5:32 AM, Kenneth Condon 
> wrote:
>
> > Have I missed the deadline for the latest release? I have created a
> > package, that runs great but there are a number of errors still from R
> CMD
> > check that I am sorting out.
> >
> > This is my first R package so I'm not sure if development is far enough
> > along, although I suspect it might be.
> >
>
> IMHO, when you're not sure a package is mature enough, and especially for a
> first package, it's actually better to miss the release deadline and allow
> bioc-devel users to test your package for 6 months before entering the release
> cycle. Making significant bug fixes and other changes becomes more
> complicated and more of a pain for you and your users once you are in the
> release...
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] [R-pkg-devel] DESCRIPTION file building problem

2018-03-21 Thread Gabe Becker
Dario,

Does this still happen if you build from the parent directory? I don't think
building from within the package directory is best practice, and I'm not sure
it's (officially) supported at all. The error message seems to suggest as much.
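
Concretely, the usual pattern (a sketch; `mypackage` is a stand-in for the
real directory name) is to run build and check from one level up:

```sh
# Sketch: build from the parent directory, naming the package directory,
# instead of running `R CMD build .` inside it.
cd ..
R CMD build mypackage                # produces mypackage_<version>.tar.gz
R CMD check mypackage_*.tar.gz
```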

~G

On Wed, Mar 21, 2018 at 9:40 AM, Dario Righelli 
wrote:

> Hi,
>
> I have a package with a DESCRIPTION file inside it, as usual.
>
> A couple of times now, when I have tried to build the package on my local
> machine or on travis-ci, I have gotten this error:
>
> Building package
> Building with: R CMD build
> 0.74s$ R CMD build .
> * checking for file ‘./DESCRIPTION’ ... OK
> * preparing ‘mypackage’:
> * checking DESCRIPTION meta-information ...OK
> * cleaning src
> Error in .read_description(ldpath) :
> file 'mypackage/DESCRIPTION' does not exist
> Execution halted
>
> Does anyone know what this depends on?
>
> Thanks,
> dario
>
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] IGV VCF demo, other suggestions? [was Re: IGV - a new package in preparation]

2018-03-14 Thread Gabe Becker
Paul,

I don't think these are necessarily in conflict. If myigv represents the
IGV session/state, then add_track(myigv, vcfobj) could call down to
add_track(myigv,VariantTrack(vcf)) so you'd get the default behaviors. You
could also support add_track(myigv, vcf, title = "bla", homVarColor =
"whateverman") which would call down to add_track(myigv, VariantTrack(vcf,
title = "bla", homVarColor = "whateverman"))

This is easy to do (I'm assuming the IGVSession class name, but replace it
with whatever class add_track is endomorphic in...):

setMethod("add_track", signature = c("IGVSession", "VCF"), function(igv,
track, ...) add_track(igv, VariantTrack(track, ...)))

setMethod("add_track", signature = c("IGVSession", "BAM"), function(igv,
track, ...) add_track(igv, AlignmentTrack(track, ...)))

This would, as Michael points out, give you the default values of the
parameter when you just call add_track(myigv, vcfobj)
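
A hypothetical usage sketch of those methods (the session constructor and
track classes are assumptions about the in-preparation IGV package, not a
published API):

```r
## Hypothetical only: assumes the two setMethod() definitions above plus
## an IGVSession() constructor and an existing VCF object `vcf`.
sess <- IGVSession()
sess <- add_track(sess, vcf)                        # defaults via dispatch
sess <- add_track(sess, vcf, title = "mef2c eqtl",  # ...args forwarded to
                  homVarColor = "darkRed")          # VariantTrack()
```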

Does that make sense?

~G


On Wed, Mar 14, 2018 at 12:40 PM, Paul Shannon <
paul.thurmond.shan...@gmail.com> wrote:

> Hi Michael,
>
> Set me straight if I got this wrong.   You suggest:
>
> > There should be no need to explicitly construct a track; just rely on
> dispatch and class semantics, i.e., passing a VCF object to add_track()
> would create a variant track automatically.
>
> But wouldn’t
>
>displayTrack(vcf)
>
> preclude any easy specification of options - which vary across track types
> - which are straightforward, easily managed and checked, by a set of track
> constructors?
>
> Two examples:
>
>    displayTrack(VariantTrack(vcf, title = "mef2c eqtl", height = "300",
>                              homrefColor = "lightGray",
>                              homVarColor = "darkRed",
>                              hetVarColor = "lightRed"))
>
>    displayTrack(AlignmentTrack(x, title = "bam 32", viewAsPairs = TRUE,
>                                insertionColor = "black"))
>
>
> So I suggest that the visualization of tracks has lots of
> track-type-specific settings which the user will want to control, and which
> would be messy to handle with an open-ended set of optional “…” args to a
> dispatch-capable single “displayTrack” method.
>
>  - Paul
>



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] IGV - a new package in preparation

2018-03-07 Thread Gabe Becker
Paul,

Sounds cool! My one note after a quick first pass is that here:

On Wed, Mar 7, 2018 at 2:15 PM, Paul Shannon 
wrote:
>
> Note that though igv.js typically gets its track data from CORS/indexed
> webservers, the IGV package will also support locally created R data.frames
> describing either bed or wig tracks - annotation and quantitative,
> respectively - without any need to host those tracks on a pre-existing
> webserver.  httpuv includes a minimal webserver which can adequately serve
> the temporary files IGV creates from your data.frames.
>

It seems to me that those data.frames should be replaced with the core
Bioconductor object classes which represent the types of information being
displayed. You might look to epivizr for inspiration here, which (IIRC)
allows "tracks" within epiviz to be backed by Bioconductor objects.
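
As a sketch of what that might look like (standard GenomicRanges tooling,
nothing specific to the IGV package; the coordinates are made up):

```r
## Sketch: promote a bed-like data.frame to a GRanges, the usual
## Bioconductor currency for track-style data.
library(GenomicRanges)
df <- data.frame(chrom = "chr5", start = 88013975, end = 88199922,
                 name = "MEF2C", score = 0.7)
gr <- makeGRangesFromDataFrame(df, keep.extra.columns = TRUE)
```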

Best,
~G

>
>
>  - Paul
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] as.list fails on IRanges inside of lapply(, blah)

2018-02-20 Thread Gabe Becker
Herve,

Thanks for the response. The looping across a ranges object that's still in
there is:

  dss = switch(seqtype,
               bp = DNAStringSet(lapply(ranges(srcs), function(x) origin[x])),
               aa = AAStringSet(lapply(ranges(srcs), function(x) origin[x])),
               stop("Unrecognized origin sequence type: ", seqtype))

(Line 495 in genbankReader.R)

srcs is a GRanges, making ranges(srcs) an IRanges, so this lapply fails.
I'm not sure what I'm meant to do here, as there's no already-vectorized
version that I know of that does the right thing (I want separate
DNAStrings for each range, so origin[ranges(srcs)] doesn't work).

I mean I can force the conversion to list issue with lapply(1:length(srcs),
function(i) ranges(srcs)[i]) or similar but that seems pretty ugly...
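
For what it's worth, Biostrings may already cover this particular case
(hedged; worth checking that the installed Biostrings version has
`extractAt()`):

```r
## Possible alternative, assuming extractAt() is available: it extracts
## one subsequence per range and returns a DNAStringSet directly.
library(Biostrings)
origin <- DNAString("ACGTACGTACGT")
rngs <- IRanges(start = c(1, 7), end = c(3, 10))
extractAt(origin, rngs)
```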

As for the other issue with the build not working in release, that is a bug
in the rentrez (which is on CRAN, not Bioc). I've submitted a PR to fix
that, and we'll see what the response is as to whether I need to remove
that integration or not.

~G






On Tue, Feb 20, 2018 at 10:48 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:

> Hi Gabe,
>
> I made a couple of changes to genbankr (1.7.2) to avoid those looping
> e.g. I replaced things like
>
> sapply(gr, width)
>
> with
>
> width(gr)
>
> I can't run a full 'R CMD build' + 'R CMD check' on the package though
> because the code in the vignette seems to fail for reasons unrelated
> to the recent changes to IRanges / GenomicRanges (I get the same error
> with the release version, see release build report).
>
> The previous behavior of as.list() on IRanges and GRanges objects will
> be restored (with a deprecation warning) once all the packages that
> need a fix get one (only 7 packages left on my list). I should be done
> with them in the next couple of days.
>
> H.
>
>
> On 02/20/2018 09:41 AM, Gabe Becker wrote:
>
>> All,
>>
>> I'm trying to track down the new failure in my genbankr package and it
>> appears to come down to the fact that I'm trying to lapply over an
>> IRanges, which fails in the IRanges-to-list (or List?) conversion. The
>> particular case that fails in my example is an IRanges of length 1 but
>> that
>> does not appear to matter, as lapply fails over IRanges of length >1 as
>> well.
>>
>> Is this intentional? If so, it seems a change of this magnitude would
>> warrant a deprecation cycle at least. If not, please let me know so I can
>> leave the code as is and wait for the fix.
>>
>> > rng1 = IRanges(start = 1, end = 5)
>> > rng2 = IRanges(start = c(1, 7), end = c(3, 10))
>> > rng1
>> IRanges object with 1 range and 0 metadata columns:
>>           start       end     width
>>       <integer> <integer> <integer>
>>   [1]         1         5         5
>> > rng2
>> IRanges object with 2 ranges and 0 metadata columns:
>>           start       end     width
>>       <integer> <integer> <integer>
>>   [1]         1         3         3
>>   [2]         7        10         4
>> > lapply(rng1, identity)
>> Error in (function (classes, fdef, mtable)  :
>>   unable to find an inherited method for function ‘getListElement’ for
>>   signature ‘"IRanges"’
>> > lapply(rng2, identity)
>> Error in (function (classes, fdef, mtable)  :
>>   unable to find an inherited method for function ‘getListElement’ for
>>   signature ‘"IRanges"’
>>
>> > sessionInfo()
>>
>> R Under development (unstable) (2018-02-16 r74263)
>>
>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>
>> Running under: OS X El Capitan 10.11.6
>>
>>
>> Matrix products: default
>>
>> BLAS:
>> /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resour
>> ces/lib/libRblas.dylib
>>
>> LAPACK:
>> /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resour
>> ces/lib/libRlapack.dylib
>>
>>
>> locale:
>>
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>>
>> attached base packages:
>>
>> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
>>
>> [8] methods   base
>>
>>
>> other attached packages:
>>
>> [1] IRanges_2.13.26     S4Vectors_0.17.33   BiocGenerics_0.25.3
>>
>>
>> loaded via a namespace (and not attached):
>>
>> [1] compiler_3.5.0 tools_3.5.0
>>
>>
>>
>> Best,
>> ~G
>>
>>
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] as.list fails on IRanges inside of lapply(, blah)

2018-02-20 Thread Gabe Becker
All,

I'm trying to track down the new failure in my genbankr package and it
appears to come down to the fact that I'm trying to lapply over an
IRanges, which fails in the IRanges-to-list (or List?) conversion. The
particular case that fails in my example is an IRanges of length 1 but that
does not appear to matter, as lapply fails over IRanges of length >1 as
well.

Is this intentional? If so, it seems a change of this magnitude would
warrant a deprecation cycle at least. If not, please let me know so I can
leave the code as is and wait for the fix.

> rng1 = IRanges(start = 1, end = 5)
> rng2 = IRanges(start = c(1, 7), end = c(3, 10))
> rng1
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         5         5
> rng2
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         3         3
  [2]         7        10         4
> lapply(rng1, identity)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘getListElement’ for
  signature ‘"IRanges"’
> lapply(rng2, identity)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘getListElement’ for
  signature ‘"IRanges"’

> sessionInfo()

R Under development (unstable) (2018-02-16 r74263)

Platform: x86_64-apple-darwin15.6.0 (64-bit)

Running under: OS X El Capitan 10.11.6


Matrix products: default

BLAS:
/Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib

LAPACK:
/Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib


locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8


attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets

[8] methods   base


other attached packages:

[1] IRanges_2.13.26     S4Vectors_0.17.33   BiocGenerics_0.25.3


loaded via a namespace (and not attached):

[1] compiler_3.5.0 tools_3.5.0



Best,
~G



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

2018-01-19 Thread Gabe Becker
It seems like you could also force a copy of the reference object via
$copy() and then force a refresh of the conn slot by assigning a
new db connection into it.
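
A very rough sketch of that idea (untested; `dbfile()` is a real
AnnotationDbi accessor, but the `$copy()`/`$conn` manipulation is an
assumption about the AnnotationDb reference internals):

```r
## Untested sketch: copy the reference object, then point the copy at a
## fresh SQLite connection so workers don't share one handle.
library(AnnotationDbi)
library(RSQLite)
library(hgu95av2.db)
db2 <- hgu95av2.db$copy()                              # copy reference object
db2$conn <- dbConnect(SQLite(), dbfile(hgu95av2.db))   # fresh connection
```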

I'm having trouble confirming that this would work, however, because I
actually can't reproduce the error. The naive way works for me on my mac
laptop (which is running an old R and Bioconductor) and on the linux
cluster I have access to (running Bioc 3.6):


(cluster)

> getSymbol <- function ( x ) {

+ return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) )

+ }

>

> x <- list( "36090_at" , "38785_at" )

>

> mclapply( x , getSymbol )

[[1]]

[[1]]$`36090_at`

[1] "TBL2"



[[2]]

[[2]]$`38785_at`

[1] "MUC1"



>

> sessionInfo()

R version 3.4.3 (2017-11-30)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)


Matrix products: default

BLAS:
/gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so

LAPACK:
/gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so


locale:

 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C

 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8

 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8

 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C

 [9] LC_ADDRESS=C   LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C


attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets

[8] methods   base


other attached packages:

[1] hgu95av2.db_3.2.3org.Hs.eg.db_3.5.0   AnnotationDbi_1.40.0

[4] IRanges_2.12.0   S4Vectors_0.16.0 Biobase_2.38.0

[7] BiocGenerics_0.24.0


loaded via a namespace (and not attached):

 [1] Rcpp_0.12.14digest_0.6.14   DBI_0.7 RSQLite_2.0

 [5] pillar_1.1.0rlang_0.1.6 blob_1.1.0  bit64_0.9-8

 [9] bit_1.1-13  compiler_3.4.3  pkgconfig_2.0.1 memoise_1.1.0

[13] tibble_1.4.1

>


~G

On Fri, Jan 19, 2018 at 9:23 AM, Vincent Carey 
wrote:

> good question
>
> some of the discussion on
>
> http://sqlite.1065341.n5.nabble.com/Parallel-access-to-
> read-only-in-memory-database-td91814.html
>
> seems relevant.
>
> converting the relatively small annotation package content to pure R
> read-only tables on the master before parallelizing
> might be very simple?
>
> On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
> ludwig.geistlin...@sph.cuny.edu> wrote:
>
> > Hi,
> >
> > Within a package I am developing, I would like to enable parallel probe
> to
> > gene mapping for a compendium of microarray datasets.
> >
> > This accordingly makes use of annotation packages such as hgu133a.db,
> > which in turn connect to the SQLite database via AnnotationDbi.
> >
> > When running in multi-core mode (i.e. using a MulticoreParam with
> > BiocParallel) using more than 2 cores, this causes the error:
> >
> > database disk image is malformed
> >
> >
> > In a very similar problem:
> >
> > https://support.bioconductor.org/p/38541/
> >
> > Adi Tarca and Dan Tenenbaum identified and resolved this problem by
> > ensuring that each process has its own unique database connection, i.e.
> > AnnotationDbi is not loaded before sending the job to the workers.
> >
> > This solution was easily realized as this analysis was carried out within
> > a script and not a package.
> >
> > However, within my package, AnnotationDbi is loaded as a dependency of my
> > package's imports.
> >
> > How to resolve this here?
> > I am not sure whether I perfectly understand the underlying mechanisms,
> > but is there a way to make my workers load their own version of
> > AnnotationDbi instead of using the one of the parent process?
> > Or am I supposed to unload all packages depending on AnnotationDbi, and
> > AnnotationDbi itself, before sending the job to the workers (and reload
> all
> > of them after the job has finished?)
> >
> > Thanks a lot,
> > Ludwig
> >
> >
> >
> > --
> > Dr. Ludwig Geistlinger
> > CUNY School of Public Health
> >
> > [[alternative HTML version deleted]]
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Confusion with how to maintain release/devel files on local computer.

2017-11-01 Thread Gabe Becker
Arman,

Not on the Bioc team per se, but I would say only have a checkout of the
release branch when you need it, i.e., a bug is reported, you have fixed it
in devel, and you are ready to push the very narrow bugfix to release. I
only keep "master" checkouts of my packages on a permanent basis.

You generally shouldn't need a checkout of release, imho, because no
development should be happening there, with the exception of the case above.

Hope that helps,
~G

On Wed, Nov 1, 2017 at 1:36 PM, Arman Shahrisa 
wrote:

> I’m confused about the development process.
>
> First, I need to have a folder with the accepted package. Then I need to
> pull
> origin RELEASE_3_6?
>
> Then in another folder, I need to pull origin master?
>
> So that by opening each folder, I know what I’m editing.
> Also during a push, I need to be careful about where I’m pushing changes.
> Origin is Bioc’s git address of my package, whereas master is the package
> directory on GitHub?
>
> Am I getting it correct?
> Is there anywhere that documents the whole process and commands in steps?
>
> Best regards,
> Arman
>
>
>
> [[alternative HTML version deleted]]
>
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Old package versions / Bioc archive of package's *.tar.gz releases?

2017-10-05 Thread Gabe Becker
On Thu, Oct 5, 2017 at 10:03 PM, Gabe Becker <becke...@gene.com> wrote:

> Oh, and the svn/git walking knows about release branches so it will find
> hot-patch versions correctly, as well.
>

In the Bioc case, I mean, not generally.

And with that flurry of emails I'm off to bed.

Best,
~G

>
> ~G
>
> On Thu, Oct 5, 2017 at 10:01 PM, Gabe Becker <becke...@gene.com> wrote:
>
>> Correct. The actual order of checks is:
>>
>>1. did switchr already retrieve that exact version for something else
>>and keep it around,
>>2. A GRANRepository object if one is specified (don't worry much
>>about this one)
>>3. the manifest itself (cran and bioc source types are ignored, but
>>it would walk SCM if it had git/svn type manifest entry for the package)
>>4. cran repo and cran archives,
>>5. bioc repositories (all of them in descending order),
>>6. bioc git (bioc SVN for the version on CRAN, which appeared to
>>still work last time I checked it a week or two ago)
>>
>> See the code in https://github.com/gmbecker/switchr/blob/master/R/
>> retrievePkgVersion.R for details.
>>
>> Best,
>> ~G
>>
>>
>>
>>
>> On Oct 5, 2017 9:06 PM, "Henrik Bengtsson" <henrik.bengts...@gmail.com>
>> wrote:
>>
>> That's really nice; I didn't know it could do all that.  For my
>> clarification, when using PkgManifest(..., type = "bioc") it'll search
>> (i) the CRAN archives, (ii) the Bioconductor repos(es), and then (iii)
>> the Bioconductor Git repos - is that a correct observation? (I
>> installed from https://github.com/gmbecker/switchr)
>>
>> /Henrik
>>
>>
>>
>>
>> On Thu, Oct 5, 2017 at 4:15 PM, Gabe Becker <becker.g...@gene.com> wrote:
>> > In point of fact, it looks like IRanges 2.6.0 is an instance of that
>> > weakness, so was probably a bad example. 2.6.1 installs correctly, or
>> would
>> > in its native R/base bioc environment... (it fails for me in the
>> library I
>> > have...)
>> >
>> > Also, the version on CRAN uses the bioc SVN, so may not work for recent
>> > versions.
>> >
>> > On Thu, Oct 5, 2017 at 3:58 PM, Gabe Becker <becke...@gene.com> wrote:
>> >>
>> >> Henrik et al.,
>> >>
>> >> My switchr package (on CRAN, github at:
>> >> http://github.com/gmbecker/switchr,  preprint of the paper here:
>> >> https://arxiv.org/abs/1501.02284) can do this.
>> >>
>> >> In fact, installing (cohorts of) old versions of packages is one of
>> its
>> >> primary purposes. Specifically, it can install old source versions of
>> >> packages from CRAN, Bioconductor, and general Git and SVN repos you
>> tell it
>> >> about.
>> >>
>> >> With the caveat that it's a bad idea in the general case to specify an
>> old
>> >> version of one package without specifying versions of its dependencies
>> >> (switchr allows you to do this via a manifest, which can be
>> constructed from
>> >> sessionInfo output or guessed in the case of a CRAN package), you can
>> just
>> >> do
>> >>
>> >> > man = PkgManifest(name="IRanges", type="bioc")
>> >>
>> >> > install_packages("IRanges", man, versions = c(IRanges = "2.6.0"))
>> >>
>> >>
>> >> And you will successfully completely break your Bioc installation by
>> >> installing IRanges 2.6.0 into it. ;-)
>> >>
>> >> Switchr also gives you tools to more easily maintain multiple
>> libraries
>> >> which contain, for example, different bioc versions in them.
>> >>
>> >> NB: switchr is subject to the caveat Martin pointed out and will fail
>> to
>> >> retrieve a buildable version of the package if said buildable version
>> is not
>> >> the first commit in SCM bearing that version in its DESCRIPTION file.
>> >>
>> >> Hope that helps.
>> >>
>> >> Best,
>> >> ~G
>> >>
>> >> On Thu, Oct 5, 2017 at 2:21 PM, Martin Morgan
>> >> <martin.mor...@roswellpark.org> wrote:
>> >>>
>> >>> On 10/05/2017 05:14 PM, Henrik Bengtsson wrote:
>> >>>>
>> >>>> On Thu, Oct 5, 2017 at 1:46 PM, Martin Morgan
>> >>>> <martin.mor...@roswellpark.org> wrote:
>> >>

Re: [Bioc-devel] Old package versions / Bioc archive of package's *.tar.gz releases?

2017-10-05 Thread Gabe Becker
Oh, and the svn/git walking knows about release branches so it will find
hot-patch versions correctly, as well.

~G

On Thu, Oct 5, 2017 at 10:01 PM, Gabe Becker <becke...@gene.com> wrote:

> Correct. The actual order of checks is:
>
>1. did switchr already retrieve that exact version for something else
>and keep it around,
>2. A GRANRepository object if one is specified (don't worry much about
>this one)
>3. the manifest itself (cran and bioc source types are ignored, but it
>would walk SCM if it had git/svn type manifest entry for the package)
>4. cran repo and cran archives,
>5. bioc repositories (all of them in descending order),
>6. bioc git (bioc SVN for the version on CRAN, which appeared to still
>work last time I checked it a week or two ago)
>
> See the code in https://github.com/gmbecker/switchr/blob/master/
> R/retrievePkgVersion.R for details.
>
> Best,
> ~G
>
>
>
>
> On Oct 5, 2017 9:06 PM, "Henrik Bengtsson" <henrik.bengts...@gmail.com>
> wrote:
>
> That's really nice; I didn't know it could do all that.  For my
> clarification, when using PkgManifest(..., type = "bioc") it'll search
> (i) the CRAN archives, (ii) the Bioconductor repos(es), and then (iii)
> the Bioconductor Git repos - is that a correct observation? (I
> installed from https://github.com/gmbecker/switchr)
>
> /Henrik
>
>
>
>
> On Thu, Oct 5, 2017 at 4:15 PM, Gabe Becker <becker.g...@gene.com> wrote:
> > In point of fact, it looks like IRanges 2.6.0 is an instance of that
> > weakness, so was probably a bad example. 2.6.1 installs correctly, or
> would
> > in its native R/base bioc environment... (it fails for me in the
> library I
> > have...)
> >
> > Also, the version on CRAN uses the bioc SVN, so may not work for recent
> > versions.
> >
> > On Thu, Oct 5, 2017 at 3:58 PM, Gabe Becker <becke...@gene.com> wrote:
> >>
> >> Henrik et al.,
> >>
> >> My switchr package (on CRAN, github at:
> >> http://github.com/gmbecker/switchr,  preprint of the paper here:
> >> https://arxiv.org/abs/1501.02284) can do this.
> >>
> >> In fact, installing (cohorts of) old versions of packages is one of its
> >> primary purposes. Specifically, it can install old source versions of
> >> packages from CRAN, Bioconductor, and general Git and SVN repos you
> tell it
> >> about.
> >>
> >> With the caveat that it's a bad idea in the general case to specify an
> old
> >> version of one package without specifying versions of its dependencies
> >> (switchr allows you to do this via a manifest, which can be constructed
> from
> >> sessionInfo output or guessed in the case of a CRAN package), you can
> just
> >> do
> >>
> >> > man = PkgManifest(name="IRanges", type="bioc")
> >>
> >> > install_packages("IRanges", man, versions = c(IRanges = "2.6.0"))
> >>
> >>
> >> And you will successfully completely break your Bioc installation by
> >> installing IRanges 2.6.0 into it. ;-)
> >>
> >> Switchr also gives you tools to more easily maintain multiple libraries
> >> which contain, for example, different bioc versions in them.
> >>
> >> NB: switchr is subject to the caveat Martin pointed out and will fail to
> >> retrieve a buildable version of the package if said buildable version
> is not
> >> the first commit in SCM bearing that version in its DESCRIPTION file.
> >>
> >> Hope that helps.
> >>
> >> Best,
> >> ~G
> >>
> >> On Thu, Oct 5, 2017 at 2:21 PM, Martin Morgan
> >> <martin.mor...@roswellpark.org> wrote:
> >>>
> >>> On 10/05/2017 05:14 PM, Henrik Bengtsson wrote:
> >>>>
> >>>> On Thu, Oct 5, 2017 at 1:46 PM, Martin Morgan
> >>>> <martin.mor...@roswellpark.org> wrote:
> >>>>>
> >>>>> On 10/05/2017 01:50 PM, Henrik Bengtsson wrote:
> >>>>>>
> >>>>>>
> >>>>>> Is there an easily accessible archive for Bioconductor packages
> >>>>>> similar to what is provided on CRAN where you can find all released
> >>>>>> versions of a package, e.g.
> >>>>>> https://cran.r-project.org/src/contrib/Archive/PSCBS/?
> >>>>>>
> >>>>>> Say I want to access the source code for affy 1.18.0.  Here are the

Re: [Bioc-devel] Old package versions / Bioc archive of package's *.tar.gz releases?

2017-10-05 Thread Gabe Becker
Correct. The actual order of checks is:

   1. did switchr already retrieve that exact version for something else
   and keep it around,
   2. A GRANRepository object if one is specified (don't worry much about
   this one)
   3. the manifest itself (cran and bioc source types are ignored, but it
   would walk SCM if it had git/svn type manifest entry for the package)
   4. cran repo and cran archives,
   5. bioc repositories (all of them in descending order),
   6. bioc git (bioc SVN for the version on CRAN, which appeared to still
   work last time I checked it a week or two ago)

See the code in
https://github.com/gmbecker/switchr/blob/master/R/retrievePkgVersion.R for
details.

Best,
~G




On Oct 5, 2017 9:06 PM, "Henrik Bengtsson" <henrik.bengts...@gmail.com>
wrote:

That's really nice; I didn't know it could do all that.  For my
clarification, when using PkgManifest(..., type = "bioc") it'll search
(i) the CRAN archives, (ii) the Bioconductor repos(es), and then (iii)
the Bioconductor Git repos - is that a correct observation? (I
installed from https://github.com/gmbecker/switchr)

/Henrik




On Thu, Oct 5, 2017 at 4:15 PM, Gabe Becker <becker.g...@gene.com> wrote:
> In point of fact, it looks like IRanges 2.6.0 is an instance of that
> weakness, so was probably a bad example. 2.6.1 installs correctly, or
would
> in its native R/base bioc environment... (it fails for me in the library
I
> have...)
>
> Also, the version on CRAN uses the bioc SVN, so may not work for recent
> versions.
>
> On Thu, Oct 5, 2017 at 3:58 PM, Gabe Becker <becke...@gene.com> wrote:
>>
>> Henrik et al.,
>>
>> My switchr package (on CRAN, github at:
>> http://github.com/gmbecker/switchr,  preprint of the paper here:
>> https://arxiv.org/abs/1501.02284) can do this.
>>
>> In fact, installing (cohorts of) old versions of packages is one of its
>> primary purposes. Specifically, it can install old source versions of
>> packages from CRAN, Bioconductor, and general Git and SVN repos you tell
it
>> about.
>>
>> With the caveat that it's a bad idea in the general case to specify an
old
>> version of one package without specifying versions of its dependencies
>> (switchr allows you to do this via a manifest, which can be constructed
from
>> sessionInfo output or guessed in the case of a CRAN package), you can
just
>> do
>>
>> > man = PkgManifest(name="IRanges", type="bioc")
>>
>> > install_packages("IRanges", man, versions = c(IRanges = "2.6.0"))
>>
>>
>> And you will successfully completely break your Bioc installation by
>> installing IRanges 2.6.0 into it. ;-)
>>
>> Switchr also gives you tools to more easily maintain multiple libraries
>> which contain, for example, different bioc versions in them.
>>
>> NB: switchr is subject to the caveat Martin pointed out and will fail to
>> retrieve a buildable version of the package if said buildable version is
not
>> the first commit in SCM bearing that version in its DESCRIPTION file.
>>
>> Hope that helps.
>>
>> Best,
>> ~G
>>
>> On Thu, Oct 5, 2017 at 2:21 PM, Martin Morgan
>> <martin.mor...@roswellpark.org> wrote:
>>>
>>> On 10/05/2017 05:14 PM, Henrik Bengtsson wrote:
>>>>
>>>> On Thu, Oct 5, 2017 at 1:46 PM, Martin Morgan
>>>> <martin.mor...@roswellpark.org> wrote:
>>>>>
>>>>> On 10/05/2017 01:50 PM, Henrik Bengtsson wrote:
>>>>>>
>>>>>>
>>>>>> Is there an easily accessible archive for Bioconductor packages
>>>>>> similar to what is provided on CRAN where you can find all released
>>>>>> versions of a package, e.g.
>>>>>> https://cran.r-project.org/src/contrib/Archive/PSCBS/?
>>>>>>
>>>>>> Say I want to access the source code for affy 1.18.0.  Here are the
>>>>>> two approaches I'm aware of and none of them are particularly
>>>>>> appealing to me.  Does anyone know of a better approach?
>>>>>
>>>>>
>>>>>
>>>>> The only option is to scrape, and that's approximate. One could build
>>>>> an
>>>>> archive
>>>>>
>>>>> pkg,version,branch,from_svn_rev,to_svn_rev
>>>>>
>>>>> and then consult that. Packages are supposed to increment the 'z' of
>>>>> x.y.z,
>>>>> but I'm sure there are many exceptions. I believe Jim Hester has an
svn
>>>> script for this, but I wasn't able to locate it; it would be fast in git.

Re: [Bioc-devel] Old package versions / Bioc archive of package's *.tar.gz releases?

2017-10-05 Thread Gabe Becker
In point of fact, it looks like IRanges 2.6.0 is an instance of that
weakness, so was probably a bad example. 2.6.1 installs correctly, or would
in its native R/base bioc environment... (it fails for me in the library I
have...)

Also, the version on CRAN uses the bioc SVN, so may not work for recent
versions.

On Thu, Oct 5, 2017 at 3:58 PM, Gabe Becker <becke...@gene.com> wrote:

> Henrik et al.,
>
> My switchr package (on CRAN, github at: http://github.com/gmbecker/switchr,
> preprint of the paper here: https://arxiv.org/abs/1501.02284
> <https://arxiv.org/abs/1501.02284>) can do this.
>
> In fact, installing (cohorts of) old versions of packages is one of it's
> primary purposes. Specifically, it can install old source versions of
> packages from CRAN, Bioconductor, and general Git and SVN repos you tell it
> about.
>
> With the caveat that it's a bad idea in the general case to specify an old
> version of one package without specifying versions of its dependencies
> (switchr allows you to do this via a manifest, which can be constructed
> from sessionInfo output or guessed in the case of a CRAN package), you can
> just do
>
> > man = PkgManifest(name="IRanges", type="bioc")
>
> > install_packages("IRanges", man, versions = c(IRanges = "2.6.0"))
>
>
> And you will successfully completely break your Bioc installation by
> installing IRanges 2.6.0 into it. ;-)
>
> Switchr also gives you tools to more easilly maintain multiple libraries
> which contain, for example, different bioc versions in them.
>
> NB: switchr is subject to the caveat Martin pointed out and will fail to
> retrieve a buildable version of the package if said buildable version is
> not the first commit in SCM bearing that version in its DESCRIPTION file.
>
> Hope that helps.
>
> Best,
> ~G
>
> On Thu, Oct 5, 2017 at 2:21 PM, Martin Morgan <
> martin.mor...@roswellpark.org> wrote:
>
>> On 10/05/2017 05:14 PM, Henrik Bengtsson wrote:
>>
>>> On Thu, Oct 5, 2017 at 1:46 PM, Martin Morgan
>>> <martin.mor...@roswellpark.org> wrote:
>>>
>>>> On 10/05/2017 01:50 PM, Henrik Bengtsson wrote:
>>>>
>>>>>
>>>>> Is there an easily accessible archive for Bioconductor packages
>>>>> similar to what is provided on CRAN where you can find all released
>>>>> versions of a package, e.g.
>>>>> https://cran.r-project.org/src/contrib/Archive/PSCBS/?
>>>>>
>>>>> Say I want to access the source code for affy 1.18.0.  Here are the
>>>>> two approaches I'm aware of and none of them are particularly
>>>>> appealing to me.  Does anyone know of a better approach?
>>>>>
>>>>
>>>>
>>>> The only option is to scrape, and that's approximate. One could build an
>>>> archive
>>>>
>>>> pkg,version,branch,from_svn_rev,to_svn_rev
>>>>
>>>> and then consult that. Packages are supposed to increment the 'z' of
>>>> x.y.z,
>>>> but I'm sure there are many exceptions. I believe Jim Hester has an svn
>>>> script for this, but I wasn't able to locate it; it would be fast in
>>>> git.
>>>>
>>>
>>> Thanks.  About 'z' not being increased.  Do the Bioc build servers
>>> release (a) continuously or (b) only when they detect a version change
>>> x.y.z -> x.y.z+1?  If they do it continuously, then what x.y.z is
>>> installed does depend on when it was downloaded/installed, correct?
>>> On the other hand, if they only build when a version bump is
>>> detected, then one can at least narrow it down to a much narrower set of
>>> x.y.z submits (if multiple exist).
>>>
>>
>> The builder only pushes for upward increments, so a commit without a
>> 'positive' version bump would be built but not pushed to the public
>> repository. I'm not sure how rigorously this policy was enforced before,
>> e.g., 2005.
>>
>> Of course there are exceptions, e.g., it is occasionally (at most one or
>> two times a release cycle) necessary to flush the public repository
>> entirely, and then whatever is built is pushed. And there is nothing
>> stopping the user from doing a check-out from svn. Perhaps others will
>> chime in with the gory / correct details.
>>
>> Martin
>>
>>
>>
>>>
>>>> For your future self, this
>>>>
>>>>https://bioconductor.org/packages/3.5/bioc/src/contrib/Arch
>>>> ive/S4Vectors/
>>>>

Re: [Bioc-devel] Old package versions / Bioc archive of package's *.tar.gz releases?

2017-10-05 Thread Gabe Becker
Henrik et al.,

My switchr package (on CRAN, github at: http://github.com/gmbecker/switchr,
preprint of the paper here: https://arxiv.org/abs/1501.02284
) can do this.

In fact, installing (cohorts of) old versions of packages is one of its
primary purposes. Specifically, it can install old source versions of
packages from CRAN, Bioconductor, and general Git and SVN repos you tell it
about.

With the caveat that it's a bad idea in the general case to specify an old
version of one package without specifying versions of its dependencies
(switchr allows you to do this via a manifest, which can be constructed
from sessionInfo output or guessed in the case of a CRAN package), you can
just do

> man = PkgManifest(name="IRanges", type="bioc")

> install_packages("IRanges", man, versions = c(IRanges = "2.6.0"))


And you will successfully completely break your Bioc installation by
installing IRanges 2.6.0 into it. ;-)

Switchr also gives you tools to more easily maintain multiple libraries
which contain, for example, different bioc versions in them.
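The library-switching workflow alluded to here looks roughly like the following sketch (the library name is made up; `switchTo()` and `switchBack()` are switchr's exported entry points):

```r
library(switchr)

## Create (on first use) and switch to an isolated package library,
## e.g. one kept for a particular Bioconductor release cohort.
switchTo("bioc-3.6-analysis")

## ...install packages into it and do your work here...

## Restore the previously active library when done.
switchBack()
```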

NB: switchr is subject to the caveat Martin pointed out and will fail to
retrieve a buildable version of the package if said buildable version is
not the first commit in SCM bearing that version in its DESCRIPTION file.

Hope that helps.

Best,
~G

On Thu, Oct 5, 2017 at 2:21 PM, Martin Morgan  wrote:

> On 10/05/2017 05:14 PM, Henrik Bengtsson wrote:
>
>> On Thu, Oct 5, 2017 at 1:46 PM, Martin Morgan
>>  wrote:
>>
>>> On 10/05/2017 01:50 PM, Henrik Bengtsson wrote:
>>>

 Is there an easily accessible archive for Bioconductor packages
 similar to what is provided on CRAN where you can find all released
 versions of a package, e.g.
 https://cran.r-project.org/src/contrib/Archive/PSCBS/?

 Say I want to access the source code for affy 1.18.0.  Here are the
 two approaches I'm aware of and none of them are particularly
 appealing to me.  Does anyone know of a better approach?

>>>
>>>
>>> The only option is to scrape, and that's approximate. One could build an
>>> archive
>>>
>>> pkg,version,branch,from_svn_rev,to_svn_rev
>>>
>>> and then consult that. Packages are supposed to increment the 'z' of
>>> x.y.z,
>>> but I'm sure there are many exceptions. I believe Jim Hester has an svn
>>> script for this, but I wasn't able to locate it; it would be fast in git.
>>>
>>
>> Thanks.  About 'z' not being increased.  Do the Bioc build servers
>> release (a) continuously or (b) only when they detect a version change
>> x.y.z -> x.y.z+1?  If they do it continuously, then what x.y.z is
>> installed does depend on when it was downloaded/installed, correct?
>> On the other hand, if they only build when a version bump is
>> detected, then one can at least narrow it down to a much narrower set of
>> x.y.z submits (if multiple exist).
>>
>
> The builder only pushes for upward increments, so a commit without a
> 'positive' version bump would be built but not pushed to the public
> repository. I'm not sure how rigorously this policy was enforced before,
> e.g., 2005.
>
> Of course there are exceptions, e.g., it is occasionally (at most one or
> two times a release cycle) necessary to flush the public repository
> entirely, and then whatever is built is pushed. And there is nothing
> stopping the user from doing a check-out from svn. Perhaps others will
> chime in with the gory / correct details.
>
> Martin
>
>
>
>>
>>> For your future self, this
>>>
>>>https://bioconductor.org/packages/3.5/bioc/src/contrib/Arch
>>> ive/S4Vectors/
>>>
>>> provides a hint of a change coming with the next release -- archives of
>>> all
>>> RELEASE package versions, starting in Bioc 3.6. (Kudos to Val for
>>> implementing this)
>>>
>>
>> This is great!  Thanks Val for this.
>>
>> Thanks
>>
>> Henrik
>>
>>
>>> Martin
>>>
>>>

 # APPROACH 1: Download from http://bioconductor.org

 The best approach I know now is to try to guess the date when this was
 released in order to identify the Bioconductor release version.
 Something like this:

 1. Guess around 2010.

 2. Go to http://bioconductor.org/about/release-announcements/ and see
 what R versions were in use during 2010.  I find R 2.6.x and R 2.7.x.
 The Bioc version for those R versions (same URL) are Bioc 2.1 and Bioc
 2.2.  Let's focus on Bioc 2.2 (because I happen to know that is the
 one)

 3. Following the Bioc 2.2 link on above URL to get to
 http://bioconductor.org/packages/2.2/BiocViews.html.

 4. Click through, one eventually gets to
 http://bioconductor.org/packages/2.2/bioc/html/affy.html

 5. The "Source" link points to
 http://bioconductor.org/packages/2.2/bioc/src/contrib/affy_
 1.18.2.tar.gz

 Say I wanted affy 1.16.0 instead and I made the wrong guess in Step 2,
 I can extrapolate from (Bioc 

Re: [Bioc-devel] Can copyright holder and correspondence be two different individuals?

2017-05-20 Thread Gabe Becker
Sorry, I think I read that too fast. Imo, if a person contributes
substantially to the design of a package they can be listed in the
authors. I am a (non-first) author for a couple of packages for which I
heavily contributed to the design but did not write any code. Note that the
design contributions should generally be at least somewhat substantial for
this approach, imo.

If that is the case for your supervisor, in my opinion she belongs in the
author list. If she manages you but wasn't really involved in deciding how
the package should behave or what it should do, and didn't do any
implementation, then it gets murkier.

Anyway, that's how I treat authorship for R packages.

Hope that helps,

~G


On May 20, 2017 10:29 AM, "Gabe Becker" <becke...@gene.com> wrote:

> So r packages don't really have a formal "corresponding author" concept,
> though they do have a number of related concepts that may do what you need:
>
> Authors/creators/contributors go in the authors (@R) field in the
> description. These are the people who created the package.
>
> Copyright holder is a formally separate concept. This is the person or
> entity that *owns* the software. This goes in the copyright field in the
> description.
>
> Maintainer is the person or persons that bug reports, feature requests etc
> should go to. They maintain the software and need not be an author (I
> think), but very often will be. These go into the maintainer field in the
> description. I think this is closest to what you mean by correspondence but
> I'm not certain.
>
> Hope that helps,
> ~G
>
> On May 20, 2017 9:40 AM, "Arman Sh" <sh88.ar...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I’m developing a Bioconductor package in which I’m creator and author but
>> someone else is the corresponding author. Is the following code correct for
>> the R package description file? Is the corresponding author being cited?
>> She is supervising the overall procedure but she is not a programmer.
>>
>> Best regards,
>> Arman
>>
>> Authors@R: c(
>>  person(
>>given = "A", family = "S**",
>>email = "**@hotmail.com",
>>role = c("aut", "cre", "cph")
>>  ),
>>  person(
>>given = "M**", family = "***",
>>email = "*@yahoo.com",
>>role = c("rcp")
>>  )
>>)
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] Can copyright holder and correspondence be two different individuals?

2017-05-20 Thread Gabe Becker
So r packages don't really have a formal "corresponding author" concept,
though they do have a number of related concepts that may do what you need:

Authors/creators/contributors go in the authors (@R) field in the
description. These are the people who created the package.

Copyright holder is a formally separate concept. This is the person or
entity that *owns* the software. This goes in the copyright field in the
description.

Maintainer is the person or persons that bug reports, feature requests etc
should go to. They maintain the software and need not be an author (I
think), but very often will be. These go into the maintainer field in the
description. I think this is closest to what you mean by correspondence but
I'm not certain.

Hope that helps,
~G
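Putting the three concepts above together, a DESCRIPTION sketch could look like the following. The names and emails are placeholders, and the choice of roles for the supervisor is illustrative, not a ruling on the poster's case; the role codes ("aut" author, "cre" maintainer, "cph" copyright holder, "ths" thesis advisor) are documented in R under ?person:

```
Authors@R: c(
    person(given = "Arman", family = "S.",
           email = "arman@example.com",
           role = c("aut", "cre")),    # author and maintainer
    person(given = "M.", family = "X.",
           email = "mx@example.com",
           role = c("ths", "cph"))     # thesis advisor, copyright holder
  )
```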

On May 20, 2017 9:40 AM, "Arman Sh"  wrote:

> Hi everyone,
>
> I’m developing a Bioconductor package in which I’m creator and author but
> someone else is the corresponding author. Is the following code correct for
> the R package description file? Is the corresponding author being cited?
> She is supervising the overall procedure but she is not a programmer.
>
> Best regards,
> Arman
>
> Authors@R: c(
>  person(
>given = "A", family = "S**",
>email = "**@hotmail.com",
>role = c("aut", "cre", "cph")
>  ),
>  person(
>given = "M**", family = "***",
>email = "*@yahoo.com",
>role = c("rcp")
>  )
>)
>
>
>
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] R6 and Bioconductor

2017-05-12 Thread Gabe Becker
On May 12, 2017 4:23 PM, "Garth Ilsley"  wrote:

Thank you.

> One place where one might think of using R6 is in the implementation of a
mutable data model underlying a GUI like a Shiny app. > If mutable
semantics are required, consider using S4 reference classes, as they offer
more features than R6 and will integrate
> directly with Bioconductor S4 classes.

If I understand correctly, you are saying that it is fine to use Reference
classes (mutable semantics) in Bioconductor. A GUI is one clear place for
this. However, what about a large dataset that is subject to progressive
analysis with various fields updated as the analysis proceeds? The typical
Bioconductor approach (as far as I have seen) is to call a method defined
for an S4 functional class that produces a new object of the same class,
with the result assigned to the same name as the original object.  For a
project considered in isolation, it wouldn't be unreasonable to use a
Reference class for this instead, but that's not what I'm asking. My
question is about the standards and approach that Bioconductor has agreed
on - to ensure consistency. Is a Reference Class permissible in this
situation?


I don't speak for the project, but I would suggest that reference classes
are really best (almost only) useful for encoding state in
complex, unusual-for-R package code. Having user-facing objects with these
mechanics violates a pretty central idiom of R (copy on write) and thus is,
imo, substantially more damaging than it is worth in general.

One of the things that makes R simpler for beginners than other languages
is that when they pass an object to a function, that function "can't" change
the version they have in their workspace.

If not, case closed. If they are permitted, I would suggest that R6
semantics are consistent with Reference Class semantics, but with the added
benefit of private members and "active bindings" (they look like fields,
but call a function).


Reference classes absolutely can have active binding fields. It is pretty
standard practice, I think.

As for private fields, no, they don't have those, but I've never really been
convinced you need them in the vast, vast majority of cases. R is designed
such that the user owns their data (i.e., the contents of their objects). I've
never really heard a good argument why that shouldn't be the case.

That said, the typical idiom in all of my code is to have paired fields: an
active binding, which is a function that does some checking/processing, and
a classed field that it corresponds to, with the same name prepended
with a ".".
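A minimal sketch of that paired-field idiom with a reference class (the class and field names here are made up for illustration):

```r
Widget <- setRefClass("Widget",
    fields = list(
        ## classed backing field, so assignment is type-checked
        .count = "numeric",
        ## accessor-function field: acts as an active binding that
        ## validates before writing through to .count
        count = function(value) {
            if (missing(value)) {
                .count
            } else {
                stopifnot(is.numeric(value), value >= 0)
                .count <<- value
            }
        }
    ))

w <- Widget$new(.count = 0)
w$count <- 5    # goes through the checking accessor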

Also, R6 classes aren't really compatible with reference class/S4 mechanics
because their fields are not classed. This may sound like a small thing but
imo it's actually quite important.

Best,
~G


This is nice and simple (for the creator and user of the class), but if not
desired (for consistency etc.), then I presume Reference Classes will do
fine.


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] PROTECT errors in Bioconductor packages

2017-04-07 Thread Gabe Becker
On Fri, Apr 7, 2017 at 12:46 PM,  wrote:

> On Fri, 7 Apr 2017, Hervé Pagès wrote:
>
> On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:
>>
>>>  On Fri, 7 Apr 2017, Hervé Pagès wrote:
>>>
>>> >  On 04/06/2017 03:29 AM, Michael Lawrence wrote:
>>> > >  On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
>>> > >   wrote:
>>> > > >  On 04/06/2017 05:33 AM, Aaron Lun wrote:
>>> > > > > >  The tool is not perfect, so assess each report carefully.
>>> > >  I get a lot of warnings because the tool seems to consider that
>>> >  extracting an attribute (with getAttrib(x, ...)) or extracting the
>>> >  slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
>>> >  returns an SEXP that needs protection. I always assumed that it
>>> >  didn't because my understanding is that the returned SEXP is pointing
>>> >  to a part of a pre-existing object ('x') and not to a newly created
>>> >  one. So I decided I could treat it like the SEXP returned by
>>> >  VECTOR_ELT(), which, AFAIK, doesn't need protection.
>>> >  So I suspect these warnings are false positives but I'm not 100% sure.
>>>
>>>  If you are not 100% sure then you should protect :-)
>>>
>>>  There are some cases, in particular related to compact row names on
>>>  data frames, where getAttrib will allocate.
>>>
>>
>> Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op
>> anymore? Should I worry that VECTOR_ELT() will also expand some sort
>> of compact list element? Why not keep these things low-level
>> getters/setters that return whatever the real thing is and use
>> higher-level accessors for returning the expanded version of the thing?
>>
>
> Seriously: it's been that way since r37807 in 2006.
>
> If you want to read about some related future directions you can look at
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.


Indeed. I was wondering whether to bring this up here.

In a (hopefully near-future) version of R-devel, doing, e.g., INTEGER(x)
could allocate. There is a way to ask it to give you NULL instead of
allocating if it would need to, but the point is that it's probably going to
get much harder to safely be clever about avoiding PROTECTing. (Luke put
in temporary suspension of GC in some places, but I don't recall the exact
details of where that was used.)

As a side note to the above, you'll need to call INTEGER(x) less often than
you did before. There will be new INTEGER_ELT and INTEGER_GET_REGION
macros which (I think) will be guaranteed not to cause SEXP allocation.

In terms of why: at least in the ALTREP case, it's so that these things can
be passed directly to the R internals and be treated like whatever
low-level type of thing they are (e.g., numeric vector, string vector,
list, etc.). This seamless backwards compatibility requires that code which
doesn't use the INTEGER_ELT and INTEGER_GET_REGION (or analogous) macros
can still call INTEGER(x) and get the pointer it expects, which
won't necessarily exist before the first time it is required.
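A hedged C sketch of the distinction (this compiles only against R's C headers, and the element-accessor names were still provisional at the time of this thread, so treat the exact API as an assumption):

```c
#include <R.h>
#include <Rinternals.h>

/* Sum an integer vector two ways.  Under ALTREP, INTEGER(x) may have to
 * allocate and materialize a full contiguous payload the first time it
 * is called, whereas element-wise accessors can read through the ALTREP
 * methods table without materializing anything. */
SEXP sum_int(SEXP x)
{
    R_xlen_t n = XLENGTH(x);
    double total = 0;

    /* Old style: grab the raw pointer (may allocate under ALTREP):
     *   const int *p = INTEGER(x);
     *   for (R_xlen_t i = 0; i < n; i++) total += p[i];
     */

    /* New style: per-element access, no SEXP allocation expected. */
    for (R_xlen_t i = 0; i < n; i++)
        total += INTEGER_ELT(x, i);

    return ScalarReal(total);
}
```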

Best,
~G


>
> luke
>
>
>
>
>> Thanks,
>> H.
>>
>>
>>>  Best,
>>>
>>>  luke
>>>
>>> > > > > I also get a warning on almost every C++ function I've written,
>>> > > > > because I use the following code to handle exceptions:
>>> > > > >
>>> > > > >     SEXP output = PROTECT(allocVector(...));
>>> > > > >     try {
>>> > > > >         // do something that might raise an exception
>>> > > > >     } catch (std::exception& e) {
>>> > > > >         UNPROTECT(1);
>>> > > > >         throw; // break out of this part of the function
>>> > > > >     }
>>> > > > >     UNPROTECT(1);
>>> > > > >     return output;
>>> > > > >
>>> > > > > Presumably the check doesn't account for transfer of control to
>>> > > > > the catch block. I find that R itself is pretty good at complaining
>>> > > > > about stack imbalances during execution of tests, examples, etc.
>>> > > > >
>>> > > > > > 'My' packages (Rsamtools, DirichletMultinomial) had several
>>> > > > > > false positives (all associated with use of an attribute of a
>>> > > > > > protected SEXP), one subtle problem (a symbol from a PROTECT'ed
>>> > > > > > package name space; the symbol could in theory be an active
>>> > > > > > binding and the value obtained not PROTECTed by the name space),
>>> > > > > > and a genuine bug
>>> > > > > >
>>> > > > > >     tag = NEW_CHARACTER(n);
>>> > > > > >     for (int j = 0; j < n; ++j)
>>> > > > > >         SET_STRING_ELT(tag, j, NA_STRING);
>>> > > > > >     if ('A' == aux[0]) {
>>> > > > > >         buf_A = R_alloc(2, sizeof(char));  # <<- bug
>>> > > > > >         buf_A[1] = '\0';
>>> > > > > >     }
>>> > > > > >     ...

Re: [Bioc-devel] Citation of an accompanying paper

2017-03-22 Thread Gabe Becker
Alina,

Typically in cases like the one you describe, people want users to use the
paper citation when citing use of the package. Whether this is what they
"should" want is somewhat debatable, but at least it seems reasonable, as
by using the package users are, presumably, applying the method your package
implements.

That said, a package can (from a mechanical perspective) list more than
one citation in its CITATION file. Xie's knitr does this, for example
(see https://github.com/yihui/knitr/blob/master/inst/CITATION and what is
returned from citation(package = "knitr")).
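As a sketch of what such a multi-entry CITATION file contains (all titles, authors, and journal details below are placeholders, not a real package's metadata):

```r
# inst/CITATION: every top-level bibentry() is collected, and all of
# them are returned by citation(package = "yourpkg").
pkg_cite <- bibentry(
  bibtype = "Manual",
  title   = "yourpkg: An Example Package",
  author  = person("Jane", "Doe"),
  year    = "2017",
  note    = "R package version 1.0.0")

paper_cite <- bibentry(
  bibtype = "Article",
  title   = "The method behind yourpkg",
  author  = person("Jane", "Doe"),
  journal = "Some Journal",
  year    = "2016")

cites <- c(pkg_cite, paper_cite)
length(cites)  # 2 entries
```

In an actual CITATION file the two `bibentry()` calls can simply appear one after the other; the assignment and `c()` here are just to make the combined object inspectable.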

Whether it's good to do this, and more generally whether package users
should be expected to cite both a (theory-based) methods paper and the
software which implements the method, particularly when they are by the same
author, is again debatable. I have my thoughts on that, but it's somewhat
tangential to your question.

It might be valuable for the Bioconductor team to have guidelines/an
official view of how to navigate these issues.

Best,
~G

On Wed, Mar 22, 2017 at 9:36 AM, Kasper Daniel Hansen <
kasperdanielhan...@gmail.com> wrote:

> If you do not have a CITATION file, a citation is automatically generated.
> So yes, effectively it would overwrite.  See details in R-exts.
>
> On Wed, Mar 22, 2017 at 6:30 AM, Alina Selega 
> wrote:
>
> > Hi Monther,
> >
> > Thank you for your reply!
> >
> > Would that overwrite the package citation that currently shows up on the
> > page though? My paper doesn't cite the Bioconductor package (as it was
> > accepted before I had a valid link to include), it just refers to the
> same
> > name of the computational method. Is there a way to include both
> citations?
> > And if not, which one should I keep?
> >
> > Thanks,
> > Alina
> >
> > On 21 March 2017 at 22:13, Monther Alhamdoosh 
> > wrote:
> >
> > > Hi Alina,
> > >
> > > I think you need to add a file named CITATION in your package (usually
> > > under the inst folder) and use bibentry as follows
> > >
> > > bibentry(bibtype = "Article",
> > >
> > >  title = "Combining multiple tools outperforms individual
> methods
> > > in gene set enrichment analyses",
> > >
> > >  author = c(person("Monther", "Alhamdoosh"),
> > >
> > > person("Milica", "Ng"),
> > >
> > > person("Nicholas", "Wilson"),
> > >
> > > person("Julie", "Sheridan"),
> > >
> > > person("Huy", "Huynh"),
> > >
> > > person("Michael", "Wilson"),
> > >
> > > person("Matthew", "Ritchie")),
> > >
> > >  journal = "Bioinformatics",
> > >
> > >  page = "414-424",
> > >
> > >  volume = 33,
> > >
> > >  number = 3,
> > >
> > >  year = 2017,
> > >
> > >  doi = "10.1093/bioinformatics/btw623")
> > >
> > >
> > >
> > > Cheers,
> > >
> > > Monther
> > >
> > > On Wed, Mar 22, 2017 at 8:11 AM, Alina Selega 
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> My methods paper (doi:10.1038/nmeth.4068) associated with my package
> > >> BUMHMM
> > >> was accepted before I submitted the package for revision at
> > Bioconductor.
> > >> I
> > >> would like the package page to also hold the citation to the paper.
> What
> > >> is
> > >> the best way to add this paper citation? (I cite it in the vignette,
> but
> > >> it
> > >> would be nice to also have it on the main page.)
> > >>
> > >> Thank you,
> > >> Alina Selega
> > >>
> > >> [[alternative HTML version deleted]]
> > >>
> > >> ___
> > >> Bioc-devel@r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >>
> > >
> > >
> >
> > [[alternative HTML version deleted]]
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



-- 
Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] any interest in a BiocMatrix core package?

2017-02-24 Thread Gabe Becker
On Fri, Feb 24, 2017 at 1:26 PM, Aaron Lun  wrote:

> Hi everyone,
>
>
> I was thinking of something that you could supply any supported matrix
> representation to a registered function via .Call; the C++ constructor
> would recognise the type of matrix during class instantiation; and
> operations (row/column/random read access, also possibly various ways of
> writing a matrix) would be overloaded and behave as required for the class.
> Only the implementation of the API would need to care about the nitty
> gritty of each representation, and we would all be free to write code that
> actually does the interesting analytical stuff.
>

This seems (at least moderately) related to the alternative atomic-vector
representation work I have been doing with R-core. See
https://www.r-project.org/dsc/2016/slides/customvectors.html
and ALTREP.md in https://svn.r-project.org/R/branches/ALTREP/ (not
necessarily fully up to date; you can also look at src/main/altrep.c for
the implementation).

I'd also say there may be a pretty strong impedance mismatch if you want
something customizable at both the R and C levels. That's just a suspicion
at this point, though; I'm writing very quickly and will send out a more
reasoned response later, as I don't have the time to do so right this
second.

Best,
~G



> Anyway, just throwing some thoughts out there. Any comments appreciated.
>
> Cheers,
>
> Aaron
>
> [[alternative HTML version deleted]]
>
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



-- 
Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Feedback wanted on design of fixed-width Ranges class

2016-11-23 Thread Gabe Becker
Hey all,

I just wanted to chime in on this as it relates to some work I'm doing with
Luke Tierney and Tomas Kalibera. There's another approach to this that will
be available in the near future (we hope).

Alternative internal representations of atomic vectors, including compact
representations, are coming to R, hopefully (though not guaranteed) in the
2017 release.

See https://svn.r-project.org/R/branches/ALTREP/ALTREP.html for more
details.

With this approach, we could simply have a length-N integer vector for
width that only took one integer in memory for its payload (so long as its
data was accessed properly). There would likely be some gotchas, but it
would let a "normal" GRanges/IRanges object exhibit the behavior you want.
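The kind of saving at stake can be previewed with the compact sequences that ALTREP provides in R >= 3.5 (so this sketch assumes a sufficiently new R): a compact vector stores only its endpoints, regardless of length, yet behaves as an ordinary integer vector to all R-level code.

```r
# R >= 3.5: seq_len() returns a compact ALTREP sequence, while rep()
# materializes a full payload.
n <- 1e6L
compact <- seq_len(n)   # stored internally as (1, n): O(1) memory
dense   <- rep(1L, n)   # full n-element payload: O(n) memory

object.size(compact)    # a few hundred bytes
object.size(dense)      # roughly 4 MB

# Same length, same class, same element-wise behavior at the R level.
identical(length(compact), length(dense))  # TRUE
```

A constant-width slot for a billion-locus GRanges would be the analogous trick: one stored value, vector semantics.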

It could very well be worth doing separately in the interim, but it might
also be good to leverage the machinery for this that R will be getting soon.

I'm pretty excited about exploring applications of this stuff. I'm pretty
confident we'll be able to find more ways for it to synergize with the
Bioconductor infrastructure.

Best,
~G

On Wed, Nov 23, 2016 at 5:01 PM, Ryan  wrote:

> Is it possible to allow the width slot of IRanges to be either a normal
> vector or an Rle?
>
>
>
> On 11/23/16 6:18 PM, Peter Hickey wrote:
>
>> I've been toying with the idea of a fixed/constant width Ranges
>> subclass. The motivation comes from storing DNA methylation data at CH
>> loci (non-CpG methylation): there are 1.1 billion CH loci in the human
>> genome, so to store these as a GRanges object requires 2 x 1.1 billion
>> integer vectors, one for the @start and one for the @width slots of
>> the IRanges object in the @ranges slot. But in this case, and perhaps
>> others, such as storing SNP data, we have a situation where all loci
>> have the same width, namely 1. Of course, you might argue such a
>> 2-fold reduction in size is purely academic, but I think it could be a
>> nice efficiency that's worth pursuing.
>>
>> I've sketched out two different prototypes, neither of which I've
>> worked up to a complete implementation; I'd like to get some feedback
>> on these two designs, along with a variation that I've not yet even
>> tried implementing, before I decide how/whether to proceed.
>>
>> The two approaches are:
>>
>> 1. A new Ranges subclass, FWRanges (fixed-width Ranges, open to better
>> name suggestions).
>> a. The @width slot would be an integer vector of length 1
>> b. [variation not yet implemented] The @width slot would be an Rle
>> vector parallel to @start
>> 2. Modifying the IRanges class. The @width slot may be a integer
>> vector of length 1 or a vector parallel to @start
>>
>> [Upon reflection, I suppose there could be a '2b' where the @width
>> slot is an Rle, but I'm going to ignore this for now since in general
>> it would be inefficient when the ranges have (random) variable widths]
>>
>> # Pros of 1
>>
>> - It seems the proper thing is to create a new Ranges subclass
>> - No dangers associated with stuffing around with internals of the
>> IRanges class and clean code separation
>>
>> # Pros of 1b compared to 1a
>>
>> - Like for IRanges, the @width slot would remain parallel to the @start
>> slot
>>
>> # Cons of 1
>>
>> - Can't immediately use in a GRanges object because the @ranges slot
>> is classed as an IRanges object
>> - Perhaps this could be changed to allow a Ranges object in the
>> @ranges slot of a GRanges object?
>> - Otherwise, would also need to implement a subclass of GenomicRanges
>> (say, FWGRanges) that used a FWRanges object in the @ranges slot. This
>> would necessitate a fair bit of code duplicated from GRanges methods.
>> - Methods like start<-, end<-, width<- would either have to
>> - (A) return an error if the new object no longer has fixed/constant
>> widths
>> - (B) coerce it to an IRanges object (with or without warning) thus
>> meaning these operations would not be strict endomorphisms
>> - Users would only get the space-savings of the FWRanges class if they
>> explicitly construct a FWRanges object or coerce a compatible IRanges
>> object to an FWRanges object
>> - Clean code separation from the IRanges class may also lead to
>> duplicated code
>>
>> # Cons of 1b compared to 1a
>>
>> - Endomorphic versions of methods like start<-, end<-, width<- could
>> create a @width slot that is twice the 'necessary' size (e.g., an Rle
>> representation of a vector that contains no 'runs').
>>
>> # Pros of 2
>>
>> - If properly implemented, the user wouldn't need to think about
>> whether the ranges were fixed or variable width, they'd just get the
>> most efficient representation
>>
>> # Cons of 2
>>
>> - This is fairly obvious, 2 would be a major (internal) change to a
>> core Bioconductor class
>> - The @width slot would no longer necessarily be parallel to @start
>> slot, e.g., code that does direct slot access via @width could easily
>> break (of course, the width() getter would be modified to return a
>> parallel vector to the 

Re: [Bioc-devel] GitHub and svn

2016-10-15 Thread Gabe Becker
Mani,

Related to what Kasper said, one thing you can do is commit directly to the
canonical repo for your package (which, again, is not on GitHub once the
package is accepted) from RStudio. It supports svn.

~G

On Oct 15, 2016 11:38 AM, "Kasper Daniel Hansen" <
kasperdanielhan...@gmail.com> wrote:

Not at the moment.  We are in the (long) process of changing this, but
there is no ETA for it.

The complications we currently have, as soon as a package is accepted in
Bioconductor, is that the "true" repository then becomes Bioconductor SVN
and your Github repository is just a way for you to develop.  This is not
the case during package submission.

Best,
Kasper


On Sat, Oct 15, 2016 at 12:19 PM, S Manimaran 
wrote:

> Hi,
>
> I never understood the github mirror setup and the instructions below look
> unnecessarily complicated to me. I see that the current package submission
> process with the automatic hook added to github is the most easiest of all
> with every commit to github automatically triggering a build at
> Bioconductor. Now, my question is: Can't this same procedure be carried
> over once the Bioconductor 3.4 is released as well i.e commits to github
> automatically resulting in triggering a build at Bioconductor? The main
use
> case that I am looking for is an easy way to commit directly from inside
> R-Studio. With R-Studio setup for GitHub project, it directly commits to
> GitHub, but now for having to commit to BioConductor, if the automatic
> trigger works well as is the case with the new package submission process,
> all is well and good. But if I have to do as what the page in git-mirror
> says, then it looks like that I have to get out of R-Studio to do some
> overly complicated process to achieve the s!
>  ame. It will be really helpful if I can continue to use the automatic
> trigger to automatically build after Bioconductor 3.4 release as well.
>
>
> http://bioconductor.org/developers/how-to/git-mirror/
>
> Scenario 2: Set Up Your Own GitHub Repository
> If you do not already have a public git repository for package REPO the
> simplest thing to do is navigate to https://github.com/
> Bioconductor-mirror/REPO and click the Fork button in the upper right.
> This will create a copy of the repository on your personal account. You
may
> want to re-enable issue tracking in your repository (it's disabled in the
> read-only mirrors and forks inherit this setting). To do this, go to
> Settings and then click the Issues checkbox. Then perform the following
> steps in your terminal.
>
>   1.  git clone https://github.com/USER/REPO to clone the repository to
> your machine.
>   2.  cd REPO to switch to the REPO directory.
>   3.  bash /path/to/update_remotes.sh to setup the git remotes.
>   4.  Commit to git and push to GitHub as you normally would.
>   5.  Each time you want to push git commits to svn:
>  *   git checkout devel to switch to the devel branch. (use
> release-X.X for release branches)
>  *   git svn rebase to get the latest SVN changes.
>  *   git merge master --log to merge your changes from the master
> branch or skip this step and work directly on the current branch.
>  *   git svn dcommit --add-author-from to sync and commit your changes
> to svn. You may be prompted here for your SVN username and password.
> When you're done, be sure and merge any changes from svn back into the git
> master branch:
> git checkout master
> git merge devel
>
> Thanks,
> Mani
>
>
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] S4 overwrite inspector of virtual class

2016-08-15 Thread Gabe Becker
Ah, yeah, we usually call them validator functions or (more commonly)
validity functions/validity methods.

The issue with what you want is that if A is a superclass of B (B inherits
from, or "contains", A), then objects of class B must be valid objects of
class A. From ?validObject (emphasis mine):

 Validity testing takes place 'bottom up': Optionally, if
 'complete=TRUE', the validity of the object's slots, if any, is
 tested.  Then, *in all cases, for each of the classes that this*
* class extends (the 'superclasses'), the explicit validity method*
* of that class is called, if one exists.*  Finally, the validity
 method of 'object''s class is called, if there is one.

So the system is specifically designed not to allow what you're asking, as
it violates one of the tenets of R's version of OOP. The virtual class
should contain only slots, validity checks, etc. that apply to literally all
subclasses.

There are ways to get around this: for example, if a slot indicates which
version it is (in your chromosome example), the validity check in the virtual
class could behave differently based on the value of that slot. This blurs
the line between differentiating behavior on objects by class versus by
contents, though, so it should (I think) be used sparingly and with care.
Generally you want to use method dispatch for class-specific differences in
behavior.

Basically, if a check should be different for different subclasses, it
shouldn't go in the validity method for the virtual class; it should be
explicit in the validity methods for each subclass. Keep in mind you can
assign an existing R closure (function) there, so you don't have to repeat
code; just do
.lcaseCheck = function(object) {blabla}
.ucaseCheck = function(object) {BlaBla}


and use .lcaseCheck and .ucaseCheck either as the validity functions for
your classes, if that is the only check:

setClass("CoolThing", ..., validity=.lcaseCheck)
setClass("EvenCoolerThing", ..., validity = .lcaseCheck)
setClass("DifferentCoolThing", ..., validity = .ucaseCheck)


or within them as necessary. You can dispatch on the class within a
validity method, from what I can see, but that is probably overkill for a
case like this.


HTH,
~G

On Mon, Aug 15, 2016 at 9:56 AM, Zach Skidmore <zskid...@wustl.edu> wrote:

> maybe validator is the proper term in R? i've highlighted what I refer to
> as the inspector below as an example:
>
> setClass("file",
> contains="file_virtual",
> validity=function(object){
> # Check that object is as expected
>
> }
> )
>
> On 8/15/16 11:46 AM, Gabe Becker wrote:
>
> Zach,
>
> Is an inspector a method you define on your classes? I'm not quite
> following what you mean by your question. AFAIK inspectors are not
> generally a thing in R (at least that go by that name).
>
> ~G
>
On Mon, Aug 15, 2016 at 9:42 AM, Zach Skidmore <zskid...@wustl.edu> wrote:
>
>
> Hi All,
>
> I'm currently transforming the GenVisR package into an Object Oriented
> system. Currently I have a virtual class and several child-classes. I am
> wondering if there is a way to tell R to use the inspector of the virtual
> class only if the inspector of a child class in not defined.
>
> For example say I had a class to store versions of a file-type and I have
> a slot in the class to store the position. Between different versions of
> the file-type there may be small differences (for example Chromosome may be
> capitalized in version 2,3,4 but not version 1). Ideally the child classes
> for 2,3,4 would be able to inherit the inspector from the virtual class to
> check the chromosome name and I would define a separate inspector for
> version 1 which is different.
>
> Any thoughts? Currently both inspectors are called (virtual and the
> appropriate sub-class), meaning if i added more versions in the future I
> would have to re-write the virtual and child class. Whereas if I could say
> ignore the virtual (i.e. default) inspector if another is defined I would
> only have to write the child class inspector in the future.
>
> Hopefully this makes sense, let me know if it doesn't or if i'm violating
> a core OO principle, i'm relatively new to object oriented programming.
>
> Thanks, Zach!
>
> 
> The materials in this message are private and may contain Protected
> Healthcare Information or other information of a sensitive nature. If you
> are not the intended recipient, be advised that any unauthorized use,
> disclosure, copying or the taking of any action in reliance on the contents
> of this information is strictly prohibited. If you have received this email
> in error, ple

Re: [Bioc-devel] S4 overwrite inspector of virtual class

2016-08-15 Thread Gabe Becker
Zach,

Is an inspector a method you define on your classes? I'm not quite
following what you mean by your question. AFAIK inspectors are not
generally a thing in R (at least that go by that name).

~G

On Mon, Aug 15, 2016 at 9:42 AM, Zach Skidmore  wrote:

> Hi All,
>
> I'm currently transforming the GenVisR package into an Object Oriented
> system. Currently I have a virtual class and several child-classes. I am
> wondering if there is a way to tell R to use the inspector of the virtual
> class only if the inspector of a child class in not defined.
>
> For example say I had a class to store versions of a file-type and I have
> a slot in the class to store the position. Between different versions of
> the file-type there may be small differences (for example Chromosome may be
> capitalized in version 2,3,4 but not version 1). Ideally the child classes
> for 2,3,4 would be able to inherit the inspector from the virtual class to
> check the chromosome name and I would define a separate inspector for
> version 1 which is different.
>
> Any thoughts? Currently both inspectors are called (virtual and the
> appropriate sub-class), meaning if i added more versions in the future I
> would have to re-write the virtual and child class. Whereas if I could say
> ignore the virtual (i.e. default) inspector if another is defined I would
> only have to write the child class inspector in the future.
>
> Hopefully this makes sense, let me know if it doesn't or if i'm violating
> a core OO principle, i'm relatively new to object oriented programming.
>
> Thanks, Zach!
>
> 
> The materials in this message are private and may contain Protected
> Healthcare Information or other information of a sensitive nature. If you
> are not the intended recipient, be advised that any unauthorized use,
> disclosure, copying or the taking of any action in reliance on the contents
> of this information is strictly prohibited. If you have received this email
> in error, please immediately notify the sender via telephone or return mail.
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>


-- 
Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] memory inefficiency problem of building MSPC packages

2016-08-08 Thread Gabe Becker
Jurat,

Have you tried posting on the bioconductor support site (
support.bioconductor.org)? That is the appropriate venue for usage
questions such as yours, and I suspect you may get a better response there.

The bioc-devel mailing list is intended for a different type of question.

Best,
~G

On Mon, Aug 1, 2016 at 11:53 PM, Jurat Shayidin  wrote:

> Bioc-devel:
> I haven been developing Bioconductor Package for multiple sample peak
> calling, and all unit test for my packages is done efficiently. However, I
> have one minor problem that cause memory inefficiency when building the
> packages in my machines. To get straight, I am going to find overlap for
> multiple GRanges objects simultaneously and proceed joint analysis for
> multiple ChIP-Seq sample to rescue weak enriched region by helping with
> co-localized evidence of multiple GRanges . After I reviewed all my source
> code, indeed some paired overlap repeated many times that cause unnecessary
> memory usage.
> This is my custom function that I developed, it works perfectly in my
> current workflow, but cause memory inefficiency problem.
>
> grs <- GRangeslist(gr1, gr2, gr3, gr4, ...)
>
> overlap <- function(grs, idx=1L, FUN=which.min) {
>   chosen <- grs[[idx]]
>   que.hit <- as(findOverlaps(chosen), "List")
>   sup.hit <- lapply(grs[-idx], function(ele_) {
> ans <- as(findOverlaps(chosen, ele_), "List")
> out.idx0 <- as(FUN(extractList(ele_$p.value, ans)), "List")
> out.idx0 <- out.idx0[!is.na(out.idx0)]
> ans <- ans[out.idx0]
>   })
>   res <- c(list(que.hit), sup.hit)
>   return(res)
> }
>
> How can I optimize my custom function without memory inefficiency? How can
> I get rid of repeated overlapped paired GRanges? How can I efficiently
> solve this issue? Can anyone propose possible ideas to get through this
> problem? Thanks a lot
>
>
>
> --
> Jurat Shahidin
> Ph.D. candidate
> Dipartimento di Elettronica, Informazione e Bioingegneria
> Politecnico di Milano
>
> [[alternative HTML version deleted]]
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>


-- 
Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Imports version that's only available on github?

2016-08-08 Thread Gabe Becker
Alex (et al),

Yihui is currently working towards a CRAN release for a modern version of
DT, so sometime "soon" (I have no insight into what definition of soon is
in use here) things should work in your particular case.

~G

On Fri, Aug 5, 2016 at 11:02 AM, Martin Morgan <
martin.mor...@roswellpark.org> wrote:

> On 08/05/2016 01:55 PM, Alex Pickering wrote:
>
>> My package requires a version of 'DT' that's only available on github. I
>> tried following the answer to this SO
>> > package-that-depends-on-another-r-package-located-on-github>
>> (specifying 'Remotes' in the DESCRIPTION in addition to the version needed
>> in 'Imports'). Build failed with "Package required and available but
>> unsuitable version: 'DT'". How should this be handled? Thank you,
>>
>
> Packages must be on CRAN or Bioconductor; the rationale is that these
> represent stable, tested, and somehow mature packages, rather than an
> arbitrary package of unknown stability. The StackOverflow question is a
> solution for devtools, but biocLite() uses install.packages().
>
> Martin
>
>
>> [[alternative HTML version deleted]]
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>
> This email message may contain legally privileged and/or...{{dropped:2}}
>
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



-- 
Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] [devteam-bioc] Segfault in Rsamtools when R built with static libz

2016-07-27 Thread Gabe Becker
Martin and Michael,

It looks like I have been conflating two issues when trying to deal with
these things. The problem of grabbing the system libz seems to have gone
away in the Rsamtools case earlier than in the libR.so case during my
process. I will check back in if something raises its head again, but
Michael appears to be correct that just -L is sufficient (it didn't seem to
be for a long time, but I think that's because libR.so wasn't getting my -L
flags when I thought it was).

So let's just leave it at: I'm sorry for the noise, and I'll be really happy
when I'm not elbow-deep in linker calls for days on end anymore...

Best,
~G



On Wed, Jul 27, 2016 at 8:05 AM, Michael Lawrence  wrote:

> Maybe Gabe could share the linker line. I think (from the man page) as
> long as the directory with the static lib comes first with -L, it
> should find the static lib, not the shared object.
>
> On Wed, Jul 27, 2016 at 6:48 AM, Martin Morgan
>  wrote:
> > Hi Gabe --
> >
> > On 07/21/2016 12:08 PM, Maintainer wrote:
> >>
> >> Hi all,
> >>
> >> I build the R installations on our research cluster. Unfortunately we
> >> are running an older OS so the system versions of various libraries
> >> (libz, bz2, pcre and libcurl, specifically) are not modern enough to
> >> build R with.
> >>
> >> For protection from ABI incompatibility when R is interacting with other
> >> programs on the system, I have built static versions of those libraries
> >> and linked them directly into R. This works fine once a few gotchas are
> >> taken care of.
> >>
> >> After an inordinate amount of work, I have tracked an intermittent
> >> segfault we have been getting to Rsamtools, and specifically the version
> >> of libz that it grabs during linking.
> >>
> >> The problem is that Rsamtools is hardcoded to have -lz in its PKG_LIBS
> >> variable by Makevars (I believe this is because the embedded version of
> >> samtools needs libz). Because there is no way (that I know of) to take
> >> the system libz out of the path, and it is an .so, it will ALWAYS be used
> >> instead of the static one I want. Furthermore, AFAICS
> >> there is no way to override the PKG_LIBS construction with an
> >> environment variable.
> >>
> >> Can someone please make Rsamtools' Makevars a bit more polite for those
> >> of us stuck in old OSes?
> >> Barring that (and until that lands) I am stuck downloading and modifying
> >> the package locally, which I really don't like doing.
> >
> >
> > Sorry to be slow at this. I guess this could be done 'elegantly' via
> > configure.ac, but that introduces some complexity. I was wondering...
> >
> > Other packages hard-code -lz, including e.g., rtracklayer,
> > VariantAnnotation, Rsubread, and R itself (including grDevices). So I
> guess
> > this is a general problem?
> >
> > Can you build R with static linkage, and set LDFLAGS to include a
> 'custom'
> > location before the system-wide location?
> >
> > Martin
> >
> >>
> >> Thanks,
> >> ~G
> >>
> >>
> >> --
> >> Gabriel Becker, Ph.D
> >> Associate Scientist
> >> Bioinformatics and Computational Biology
> >> Genentech Research
> >>
> >>
> >> 
> >> devteam-bioc mailing list
> >> To unsubscribe from this mailing list send a blank email to
> >> devteam-bioc-le...@lists.fhcrc.org
> >> You can also unsubscribe or change your personal options at
> >> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
> >>
> >
> >
> > This email message may contain legally privileged and/or...{{dropped:2}}
> >
> >
> > ___
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
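[Martin's LDFLAGS suggestion above might look like the following when configuring R — a sketch only, assuming the static libz/bz2/pcre/libcurl were installed under a hypothetical /opt/static prefix, and that no shared copies live in that directory:

```sh
## Hypothetical prefix for the locally built static libraries.
## Listing -L/opt/static/lib first makes the linker search it before
## the system directories, so -lz resolves to libz.a there.
./configure \
  CPPFLAGS="-I/opt/static/include" \
  LDFLAGS="-L/opt/static/lib"
make
```

Note this only helps if the directory contains no libz.so: given both in the same directory, ld prefers the shared object unless forced with -Wl,-Bstatic or the archive's full path.]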



-- 
Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Segfault in Rsamtools when R built with static libz

2016-07-21 Thread Gabe Becker
Hi all,

I build the R installations on our research cluster. Unfortunately we are
running an older OS so the system versions of various libraries (libz, bz2,
pcre and libcurl, specifically) are not modern enough to build R with.

For protection from ABI incompatibility when R is interacting with other
programs on the system, I have built static versions of those libraries and
linked them directly into R. This works fine once a few gotchas are taken
care of.

After an inordinate amount of work, I have tracked an intermittent segfault
we have been getting to Rsamtools, and specifically the version of libz
that it grabs during linking.

The problem is that Rsamtools is hardcoded to have -lz in its PKG_LIBS
variable by Makevars (I believe this is because the embedded version of
samtools needs libz). Because there is no way (that I know of) to take the
system libz out of the path, and it is an .so, it will ALWAYS be used
instead of the static one I want. Furthermore, AFAICS
there is no way to override the PKG_LIBS construction with an environment
variable.

Can someone please make Rsamtools' Makevars a bit more polite for those of
us stuck in old OSes?
Barring that (and until that lands) I am stuck downloading and modifying
the package locally, which I really don't like doing.
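[The local modification described amounts to replacing the bare -lz in the package's src/Makevars with an explicit path to a static archive. A sketch, with /opt/static/lib as an assumed (hypothetical) location; the real Makevars carries more than shown here:

```make
## Hypothetical local edit to src/Makevars: link zlib by full archive
## path so the linker cannot substitute the system libz.so.
## OTHER_LIBS stands in for whatever else the real PKG_LIBS contains.
PKG_LIBS = $(OTHER_LIBS) /opt/static/lib/libz.a
```
]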

Thanks,
~G


-- 
Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] C library or C package API for regular expressions

2016-01-26 Thread Gabe Becker
Jirka,

Do you mean with millions of different patterns (motifs)? If not, the
R-level regular expression functions are vectorized, and so the looping
will already happen for you in C.
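[As a quick illustration of the vectorization point — a generic sketch, not code from the thread:

```r
## One pattern matched against many subject strings in a single call;
## the loop over `subjects` happens in C inside grepl().
subjects <- c("ACGTGGGA", "TTTACGT", "GGGG")
grepl("ACGT", subjects)   # c(TRUE, TRUE, FALSE)
```

So an explicit R-level loop is only needed when the *patterns* number in the millions, not the subjects.]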

Also, have you confirmed that the R evaluation overhead will actually
dominate the pattern matching here if you just do it in R? That very well
may be, but it's not obvious to me that it would depending on details about
what you're doing that I'm not privy to.

Best,
~G

On Tue, Jan 26, 2016 at 3:25 AM, Jiří Hon 
wrote:

> Hi Dan,
>
> Nice to hear, I didn't notice. The only problem could be missing header
> files, but bundling them would solve it, I hope.
>
> Jirka
>
> On 25.1.2016 at 23:38, Dan Tenenbaum wrote:
>
> R requires PCRE to build, therefore perhaps it is available for use within
>> packages?
>> Dan
>>
>>
>> - Original Message -
>>
>>> From: "Jiří Hon" 
>>> To: "bioc-devel" 
>>> Sent: Saturday, January 23, 2016 1:56:52 AM
>>> Subject: [Bioc-devel] C library or C package API for regular expressions
>>>
>>
>> Dear package developers,
>>>
>>> I would like to ask you for advice. Please, what is the most seamless
>>> way to use regular expressions in C/C++ code of R/Bioconductor package?
>>> Is it allowed to bundle some C/C++ library for that (like PCRE or
>>> Boost.Regex)? Or is there existing C API of some package I can depend on
>>> and import?
>>>
>>> Thank you a lot for your attention and please have a nice day :)
>>>
>>> Jiri Hon
>>>
>>> ___
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>



-- 
Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

[Bioc-devel] A bioconductor package archive?

2015-11-18 Thread Gabe Becker
Hi all,

Unless I'm missing something, there is no archive of prior source packages
for Bioconductor, in the vein of CRAN's Web Archive.  It has generally been
accepted, I think, that if you need a version of a package - e.g., for
precise reproducibility - you can go to the still-existing release
repository for that Bioc release, or to the svn.

The difficulty arises in determining where in the svn to look when the
old-release repo has a later version of the package. In my switchr package
I take the first commit with the requested version. This is generally
correct, but it isn't always (for example, I recently found it does not
work with IRanges version 1.20.5). Without a rule to automate the search
process that will work generally, the svn repository, while in principle
guaranteeing that a particular previous package can be re-created, does not
provide this in practice.

I know that going backwards and backfilling an archive of previous
Bioconductor package versions is not feasible, but would it be possible to
start populating an archive now moving forward - with the expectation that
they would be preserved indefinitely - to prevent this in the future?
Automated tagging of package releases within the SVN would probably also
work.

Best,
~G




Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] remove history vignettes

2015-11-17 Thread Gabe Becker
Well, you shouldn't expect the release branches to change at all based on
anything you do on trunk. They never will.

The release branch is (almost completely) frozen; you shouldn't be making
changes there generally. Changes in documentation are a grey area, I
suppose, but I try to avoid those as well. Basically, once released the
package should be "as-is", and exactly the same for all people using it at
any point in that release cycle.

I'm not sure why the vignettes are still there on the devel landing page if
you have in fact bumped the package version in svn though. That likely
comes down to exactly how those splash pages are generated, which I don't
know.

Best,
~G

On Tue, Nov 17, 2015 at 8:23 AM, 顾祖光 <joker...@gmail.com> wrote:

> Thanks for your reply! I have already updated the package (also with
> bumping version numbers) for many times but these old vignettes are still
> there. They even exist in the release branch:
> http://www.bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html
>
>
> On Tue, Nov 17, 2015 at 3:44 PM, Gabe Becker <becker.g...@gene.com> wrote:
>
>> Zuguang,
>>
>> From what I can see, you have not updated the version number of your
>> package, so I suspect you'll find that none of your changes have propagated
>> at all, though I can't entirely swear to that.
>>
>> I believe Bioc packages only update in the repo (which I assume is also
>> when the splash-pages are updated) upon version bump.
>>
>> ~G
>>
>> On Tue, Nov 17, 2015 at 4:35 AM, 顾祖光 <joker...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am developing a package called ComplexHeatmap and re-format the
>>> vignettes
>>> recently. I deleted the old vignettes and created several new vignettes.
>>> But when I look at the web page of the package (
>>> http://bioconductor.org/packages/devel/bioc/html/ComplexHeatmap.html),
>>> the
>>> old vignette files (the bottom three vignettes) are still there but they
>>> are already removed in svn
>>> (
>>>
>>> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/ComplexHeatmap/vignettes/
>>> )
>>>
>>> I guess it may because some rules of svn but I am not experienced for
>>> this
>>> and I commit changes through github mirror. I want to remove the bottom
>>> three vignettes which start without numbers ("Making Complex Heatmap",
>>> and
>>> the two "Quick Examples of Making Complex Heatmaps"). Can anyone give
>>> some
>>> clue how to do this?
>>>
>>> Thanks!
>>> Zuguang
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ___
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>>
>>
>> --
>> Gabriel Becker, Ph.D
>> Associate Scientist
>> Bioinformatics and Computational Biology
>> Genentech Research
>>
>
>


-- 
Gabriel Becker, Ph.D
Associate Scientist
Bioinformatics and Computational Biology
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] is.unsorted method for GRanges objects

2015-11-02 Thread Gabe Becker
Pete,

What does sorted mean for GRanges? If the starts are sorted but the ends
aren't, does that count? What if only the ends are, but the ranges are on the
negative strand?

Do we consider seqlevels to be ordinal in the order the levels are returned
from seqlevels()? That usually makes sense, but does it always?

In essence I'm asking if sortedness is a well enough defined term for an
is.sorted method to make sense.

Best,
~G
On Nov 2, 2015 4:27 PM, "Peter Hickey"  wrote:

> Hi all,
>
> I sometimes want to test whether a GRanges object (or some object with
> a GRanges slot, e.g., a SummarizedExperiment object) is (un)sorted.
> There is no is.unsorted,GRanges-method or, rather, it defers to
> is.unsorted,ANY-method. I'm unsure that deferring to the
> is.unsorted,ANY-method is what is really desired when a user calls
> is.unsorted on a GRanges object, and it will certainly return a
> (possibly unrelated) warning - "In is.na(x) : is.na() applied to
> non-(list or vector) of type 'S4'".
>
>
> For this reason, I tend to use is.unsorted(order(x)) when x is a
> GRanges object. This workaround is also used, for example, by minfi
> (https://github.com/kasperdanielhansen/minfi/blob/master/R/blocks.R#L121).
> However, this is slow because it essentially sorts the object to test
> whether it is already sorted.
>
>
> So, to my questions:
>
> 1. Have I overlooked a fast way to test whether a GRanges object is sorted?
> 2a. Could a is.unsorted,GenomicRanges-method be added to the
> GenomicRanges package? Side note, I'm unsure at which level to define
> this method, e.g., GRanges vs. GenomicRanges.
> 2b. Is it possible to have a sensible definition and implementation
> for is.unsorted,GRangesList-method?
> 2c. Could a is.unsorted,RangedSummarizedExperiment-method be added to
> the SummarizedExperiment package?
>
> I started working on a patch for 2a/2c, but wanted to ensure I hadn't
> overlooked something obvious. Also, I'm sure 2a/2b/2c could be written
> much more efficiently at the C-level but I'm afraid this might be a
> bit beyond my abilities to integrate nicely with the existing code.
>
> Thanks,
> Pete
>
> ___
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
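[The is.unsorted(order(x)) workaround quoted above does a full sort just to test sortedness. A linear-scan sketch — untested, assuming the simple ordering (seqnames, then start, then width) and ignoring strand:

```r
## Sketch: O(n) sortedness check for a GRanges `gr`, comparing each
## element to its successor under (seqnames, start, width) ordering.
## Hypothetical helper, not an existing GenomicRanges method.
isSortedGR <- function(gr) {
  if (length(gr) < 2L) return(TRUE)
  s <- as.integer(seqnames(gr)); st <- start(gr); w <- width(gr)
  i <- seq_len(length(gr) - 1L); j <- i + 1L
  all(s[i] < s[j] |
      (s[i] == s[j] & (st[i] < st[j] |
                       (st[i] == st[j] & w[i] <= w[j]))))
}
```

Whether this matches order()'s exact tie-breaking (e.g. strand handling) would need checking against the GenomicRanges documentation.]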

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Shouldn't we distinguish between package-specific and dependency errors?]

2015-09-24 Thread Gabe Becker
I agree with Michael. External dependencies are not avoidable in our field,
but they do put the user in a bad situation with respect to trusting their
software (and performing reproducible analyses).

If I have KEGGREST version x.y.z installed, and it was passing all its
tests when it was deployed (and, we hope, later when I installed it), but
KEGG changes and KEGGREST no longer works, that is very much a problem for
me, as the user - it means the package is broken. Note that the package
being broken does not - directly - speak to the talent, dedication, etc of
the developer. Ludwig, you're correct in your (implied?) assertion that the
breakage isn't the *fault* of the package author, or any other package
developer, but I'm skeptical that that matters at all to the user.
Ultimately, the package is either working (and thus safe to use) or
"broken" (and thus in a use-at-your-own-risk sort of state).

As for reviewers, a good reviewer should assess the exact version of your
software that the paper is about (assuming you provided one). If *that
version* of the package works, the review shouldn't be negatively impacted.
If it doesn't, honestly I think the review *should* be negatively impacted,
even if the breakage is not really the author's fault. The reviewer's
allegiance should lie with the journal, and by extension the future
readers, who would expect the software to work as described.

~G

On Thu, Sep 24, 2015 at 11:50 AM, Michael Lawrence <
lawrence.mich...@gene.com> wrote:

> The important question is whether the package actually works, as
> distributed. if not, it's a user matter. If a build is failing because
> there is a problem with the "next" version of the package, or something
> specific to the build machine, it's a developer/admin matter. I'm guessing
> we don't routinely test packages without version bumps, but perhaps we
> should, at least when their deps change. Maybe certain packages that depend
> on external resources could be tested on a regular but less frequent basis,
> regardless.
>
>
> On Thu, Sep 24, 2015 at 11:19 AM, Ludwig Geistlinger <
> ludwig.geistlin...@bio.ifi.lmu.de> wrote:
>
> > Dan, thanks for clarifying.
> > With 'we can hardly do much about it', I meant that we cannot prevent
> that
> > for external dependencies in the way we can prevent it for
> dependendencies
> > within Bioc.
> >
> > Question remains whether the landing page for the USER of the package is
> > the right place to alert the DEVELOPER of the package.
> >
> > Best,
> > Ludwig
> >
> >
> > - Original Message -
> > > From: "Ludwig Geistlinger" 
> > > To: "Dan Tenenbaum" 
> > > Sent: Thursday, September 24, 2015 10:52:29 AM
> > > Subject: Re: [Bioc-devel] Shouldn't we distinguish between
> > package-specific and dependency errors?
> > >
> > >
> > > Well, I guess, Dan, that basically means that breaking cannot happen
> > > within Bioc (as broken packages do not propagate to the repository)
> > > and such cases are exclusively due to breaking of external
> > > dependencies such as observed with KEGGREST and KEGG (where we can
> > > hardly do much about it).
> > >
> > >
> > > Thus, it remains to clarify the purpose of the 'build' shield as
> > > Wolfgang pointed out.
> > > While it is surely helpful for the developer to grasp what is going
> > > on at a glance, this might be misleading for users and reviewers as
> > > described earlier.
> > >
> > >
> >
> > The purpose of the build shield is to alert you to the fact that the
> build
> > is broken. If the build is broken due to a dependency, it's not true that
> > there is nothing you can do about it; as Michael points out, you can
> alert
> > the maintainer of the broken package or you can (as I did) contact KEGG
> > who promptly fixed their issue. This benefits the community as a whole.
> >
> > There are other types of dependency-related errors, for example if a
> > package you depend on changes its API and you do not adapt to those
> > changes, your package will break, but YOU need to fix your package,
> nobody
> > else's package needs to change.
> >
> > I think it is exceedingly difficult to determine programmatically whether
> > a given failure was caused by a dependency or by the package itself, and
> > I'm not sure it's a good idea to try.
> >
> > I recognize that it can be bad for a reviewer to see the red build
> shield.
> > But the purpose is to alert the DEVELOPER to problems and I would
> > reiterate that there is always something you as the package author can
> do,
> > whether it's alerting the upstream developer to the problem, or if that
> > doesn't work, removing the dependency.
> >
> > Dan
> >
> >
> > > Ludwig
> > >
> > >
> > >
> > >
> > >
> > >
> > > On 24.09.2015 at 19:31, Dan Tenenbaum < dtene...@fredhutch.org
> > > > wrote:
> > >
> > >
> > >
> > > - Original Message -
> > >
> > >
> > > From: "Andrzej Oleś" < andrzej.o...@gmail.com >
> > > To: "Dan Tenenbaum" < 

Re: [Bioc-devel] version increments for unchanged packages

2015-06-11 Thread Gabe Becker
Stephanie,

As far as I know, it is so that package versions are unique to specific
releases of bioconductor. This has the benefits of

   1. providing assurances that that particular version of a package is
   tested and confirmed to work within that release, and
   2. enforces that the source code/data for a particular version of a
   package appears only and exactly once within the Bioconductor SVN
   structure.


Together, these prevent there being ambiguity when a package needs to be
updated/fixed in the context of a particular release, both in terms of what
version needs to be fixed and where that fix needs to be applied.

Best,
~G

On Thu, Jun 11, 2015 at 10:40 AM, Stephanie M. Gogarten 
sdmor...@u.washington.edu wrote:

 Why is it that packages with no changes still get new version numbers at
 every release? For example, my experiment data package GWASdata has not
 changed since the last release, but the version was bumped from 1.4.0 to
 1.6.0. I imagine most users expect that a change in version number
 indicates some change in content.

 Stephanie

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel




-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-05 Thread Gabe Becker
Herve,

This is probably a naive question, but what use cases are there for creating
an object with the wrong seqinfo for its genome?

~G

On Fri, Jun 5, 2015 at 11:43 AM, Michael Lawrence lawrence.mich...@gene.com
 wrote:

 On Thu, Jun 4, 2015 at 11:48 PM, Hervé Pagès hpa...@fredhutch.org wrote:
  I also think that we're heading towards something like that.
 
  So genome(gr) <- "hg19" would:
 
(a) Add any missing information to the seqinfo.
(b) Sort the seqlevels in canonical order.
(c) Change the seqlevels style to UCSC style if they are not.
 
  The 3 tasks are orthogonal. I guess most of the times people want
  an easy way to perform them all at once.
 
  We could easily support (a) and (b). This assumes that the current
  seqlevels are already valid hg19 seqlevels:
 
   si1 <- Seqinfo(c("chrX", "chrUn_gl000249", "chr2", "chr6_cox_hap2"))
   gr1 <- GRanges(seqinfo=si1)
   hg19_si <- Seqinfo(genome="hg19")
 
## (a):
   seqinfo(gr1) <- merge(seqinfo(gr1), hg19_si)[seqlevels(gr1)]
   seqinfo(gr1)
   # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
   #   seqnames       seqlengths isCircular genome
   #   chrX            155270560      FALSE   hg19
   #   chrUn_gl000249      38502      FALSE   hg19
   #   chr2            243199373      FALSE   hg19
   #   chr6_cox_hap2     4795371      FALSE   hg19
 
## (b):
   seqlevels(gr1) <- intersect(seqlevels(hg19_si), seqlevels(gr1))
   seqinfo(gr1)
   # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
   #   seqnames       seqlengths isCircular genome
   #   chr2            243199373      FALSE   hg19
   #   chrX            155270560      FALSE   hg19
   #   chr6_cox_hap2     4795371      FALSE   hg19
   #   chrUn_gl000249      38502      FALSE   hg19
 
  (c) is harder because seqlevelsStyle() doesn't know how to rename
  scaffolds yet:
 
   si2 <- Seqinfo(c("X", "HSCHRUN_RANDOM_CTG42", "2",
                    "HSCHR6_MHC_COX_CTG1"))
   gr2 <- GRanges(seqinfo=si2)
 
   seqlevelsStyle(gr2)
   # [1] "NCBI"

   seqlevelsStyle(gr2) <- "UCSC"
   seqlevels(gr2)
   # [1] "chrX"                 "HSCHRUN_RANDOM_CTG42" "chr2"
   # [4] "HSCHR6_MHC_COX_CTG1"
 
  So we need to work on this.
 
   I'm not sure about using genome(gr) <- "hg19" for this. Right now
  it sets the genome column of the seqinfo with the supplied string
  and nothing else. Aren't there valid use cases for this?

 Not sure. People would almost always want the seqname style and order
 to be consistent with the given genome.

  What about
   using seqinfo(gr) <- "hg19" instead? It kind of suggests that the
  whole seqinfo component actually gets filled.
 

 Yea, but genome is so intuitive compared to seqinfo.



  H.
 
  On 06/04/2015 06:30 PM, Tim Triche, Jr. wrote:
 
  that's kind of always been my goal...
 
 
  Statistics is the grammar of science.
  Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science
 
  On Thu, Jun 4, 2015 at 6:29 PM, Michael Lawrence
   lawrence.mich...@gene.com wrote:
 
  Maybe this could eventually support setting the seqinfo with:
 
   genome(gr) <- "hg19"
 
  Or is that being too clever?
 
   On Thu, Jun 4, 2015 at 4:28 PM, Hervé Pagès hpa...@fredhutch.org wrote:
Hi,
   
FWIW I started to work on supporting quick generation of a
  standalone
 Seqinfo object via Seqinfo(genome="hg38") in GenomeInfoDb.
   
It already supports hg38, hg19, hg18, panTro4, panTro3, panTro2,
bosTau8, bosTau7, bosTau6, canFam3, canFam2, canFam1, musFur1,
  mm10,
mm9, mm8, susScr3, susScr2, rn6, rheMac3, rheMac2, galGal4,
  galGal3,
gasAcu1, danRer7, apiMel2, dm6, dm3, ce10, ce6, ce4, ce2,
 sacCer3,
and sacCer2. I'll add more.
   
See ?Seqinfo for some examples.
   
 Right now it fetches the information from the internet every time you
 call it, but maybe we should just store that information in the
GenomeInfoDb package as Tim suggested?
   
H.
   
   
On 06/03/2015 12:54 PM, Tim Triche, Jr. wrote:
   
 That would be perfect actually.  And it would radically reduce &
 modularize maintenance.  Maybe that's the best way to go after
  all.  Quite
sensible.
   
--t
   
On Jun 3, 2015, at 12:46 PM, Vincent Carey
   st...@channing.harvard.edu wrote:
   
It really isn't hard to have multiple OrganismDb packages in
  place -- the
process of making new ones is documented and was given as an
  exercise in
the EdX course.  I don't know if we want to institutionalize it
  and
distribute such -- I think we might, so that there would be
  Hs19, Hs38,
mm9, etc. packages.  They have very little content, they just
  coordinate
interactions with packages that you'll already have.
   
On Wed, Jun 3, 2015 at 3:26 PM, Tim 

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-05 Thread Gabe Becker
I dunno, standardizeSeqInfo just seems really long for a function name
users are going to have to call.

At the risk of annoying Herve further, what about

gr <- castSeqInfo(gr, "hg19")

?

~G

On Fri, Jun 5, 2015 at 1:46 PM, Tim Triche, Jr. tim.tri...@gmail.com
wrote:

 maybe standardizeSeqinfo or fixSeqinfo is clearer after all

 Statistics is the grammar of science.
 Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science

 On Fri, Jun 5, 2015 at 1:41 PM, Gabe Becker becker.g...@gene.com wrote:



 On Fri, Jun 5, 2015 at 1:39 PM, Tim Triche, Jr. tim.tri...@gmail.com
 wrote:

 how about just

   gr <- addSeqinfo(gr, "hg19")


  Add sounds like it's, well, adding rather than replacing (which it
 sometimes would do).

   gr <- fixSeqInfo(gr, "hg19")

  instead?

  ~G


  --
   Gabriel Becker, Ph.D
 Computational Biologist
 Genentech Research





-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-05 Thread Gabe Becker
That sounds like it calls for a (class-style) inheritance/genome-union
model to me. I should probably stop talking now, before the people who
would have to implement that start throwing things at me, though.

~G

On Fri, Jun 5, 2015 at 12:54 PM, Kasper Daniel Hansen 
kasperdanielhan...@gmail.com wrote:

 In WGBS we frequently sequence a human with spikein from the lambda
 genome.  In this case, most of the chromosomes of the Granges are from
  human, except one.  This is a use case where genome(GR) is not constant.  I
 suggest, partly for compatibility, to keep genome, but perhaps do something
 like
   standardizeGenome()
 or something like this.

 I would indeed love, love, love a function which just cleans it up.

 Kasper

 On Fri, Jun 5, 2015 at 2:51 PM, Gabe Becker becker.g...@gene.com wrote:

 Herve,

  This is probably a naive question, but what use cases are there for
 creating
 an object with the wrong seqinfo for its genome?

 ~G

 On Fri, Jun 5, 2015 at 11:43 AM, Michael Lawrence 
 lawrence.mich...@gene.com
  wrote:

  On Thu, Jun 4, 2015 at 11:48 PM, Hervé Pagès hpa...@fredhutch.org
 wrote:
   I also think that we're heading towards something like that.
  
    So genome(gr) <- "hg19" would:
  
 (a) Add any missing information to the seqinfo.
 (b) Sort the seqlevels in canonical order.
 (c) Change the seqlevels style to UCSC style if they are not.
  
   The 3 tasks are orthogonal. I guess most of the times people want
   an easy way to perform them all at once.
  
   We could easily support (a) and (b). This assumes that the current
   seqlevels are already valid hg19 seqlevels:
  
  si1 <- Seqinfo(c("chrX", "chrUn_gl000249", "chr2", "chr6_cox_hap2"))
  gr1 <- GRanges(seqinfo=si1)
  hg19_si <- Seqinfo(genome="hg19")
  
 ## (a):
  seqinfo(gr1) <- merge(seqinfo(gr1), hg19_si)[seqlevels(gr1)]
  seqinfo(gr1)
  # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
  #   seqnames       seqlengths isCircular genome
  #   chrX            155270560      FALSE   hg19
  #   chrUn_gl000249      38502      FALSE   hg19
  #   chr2            243199373      FALSE   hg19
  #   chr6_cox_hap2     4795371      FALSE   hg19
  
 ## (b):
  seqlevels(gr1) <- intersect(seqlevels(hg19_si), seqlevels(gr1))
  seqinfo(gr1)
  # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
  #   seqnames       seqlengths isCircular genome
  #   chr2            243199373      FALSE   hg19
  #   chrX            155270560      FALSE   hg19
  #   chr6_cox_hap2     4795371      FALSE   hg19
  #   chrUn_gl000249      38502      FALSE   hg19
  
   (c) is harder because seqlevelsStyle() doesn't know how to rename
   scaffolds yet:
  
  si2 <- Seqinfo(c("X", "HSCHRUN_RANDOM_CTG42", "2",
                   "HSCHR6_MHC_COX_CTG1"))
  gr2 <- GRanges(seqinfo=si2)
  
  seqlevelsStyle(gr2)
  # [1] "NCBI"

  seqlevelsStyle(gr2) <- "UCSC"
  seqlevels(gr2)
  # [1] "chrX"                 "HSCHRUN_RANDOM_CTG42" "chr2"
  # [4] "HSCHR6_MHC_COX_CTG1"
  
   So we need to work on this.
  
    I'm not sure about using genome(gr) <- "hg19" for this. Right now
   it sets the genome column of the seqinfo with the supplied string
   and nothing else. Aren't there valid use cases for this?
 
  Not sure. People would almost always want the seqname style and order
  to be consistent with the given genome.
 
   What about
    using seqinfo(gr) <- "hg19" instead? It kind of suggests that the
   whole seqinfo component actually gets filled.
  
 
  Yea, but genome is so intuitive compared to seqinfo.
 
 
 
   H.
  
   On 06/04/2015 06:30 PM, Tim Triche, Jr. wrote:
  
   that's kind of always been my goal...
  
  
   Statistics is the grammar of science.
   Karl Pearson http://en.wikipedia.org/wiki/The_Grammar_of_Science
  
   On Thu, Jun 4, 2015 at 6:29 PM, Michael Lawrence
   lawrence.mich...@gene.com mailto:lawrence.mich...@gene.com
 wrote:
  
   Maybe this could eventually support setting the seqinfo with:
  
    genome(gr) <- "hg19"
  
   Or is that being too clever?
  
   On Thu, Jun 4, 2015 at 4:28 PM, Hervé Pagès 
 hpa...@fredhutch.org
   mailto:hpa...@fredhutch.org wrote:
 Hi,

 FWIW I started to work on supporting quick generation of a
   standalone
  Seqinfo object via Seqinfo(genome="hg38") in GenomeInfoDb.

 It already supports hg38, hg19, hg18, panTro4, panTro3,
 panTro2,
 bosTau8, bosTau7, bosTau6, canFam3, canFam2, canFam1, musFur1,
   mm10,
 mm9, mm8, susScr3, susScr2, rn6, rheMac3, rheMac2, galGal4,
   galGal3,
 gasAcu1, danRer7, apiMel2, dm6, dm3, ce10, ce6, ce4, ce2,
  sacCer3,
 and sacCer2. I'll add more.

 See ?Seqinfo for some examples.

 Right now it fetches the information from internet every time
 you
 call it but maybe we should just store that information in the
 GenomeInfoDb package as Tim suggested?

 H

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-05 Thread Gabe Becker
On Fri, Jun 5, 2015 at 1:19 PM, Michael Lawrence lawrence.mich...@gene.com
wrote:

 To support the multi-genome case, one could set the genome as a
 vector, one value for each seqname, and it would fix the
 style/seqlength per seqname. It could sort by the combination of
 seqname and species. Presumably it would do nothing for unknown
 genomes.


That is one way. Another would be that it takes a vector of the length of the
genomes being unioned, and each genome knows its seqinfo and fixes things
within its domain. The seqlevels, for example, would be
c(seqlevels(genome1), ..., seqlevels(genomeK)). There shouldn't be overlap,
because if there is, there was already an identifiability problem in the
data, which is basically guaranteed to be an error (I think).

If combinations of genomes were more formally modeled, though, you could do
fun things like

genome(x, strict=FALSE) <- "GRCh38"

which would do nothing if the genome on x was already a union containing
GRCh38, and otherwise would fix the human part of the seqinfo to be
GRCh38, but would leave anything that had been unioned on alone.
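
A minimal sketch of the disjoint-union idea, assuming merge() of two Seqinfo objects with non-overlapping seqnames simply concatenates them (which is my reading of its semantics; the seqlength values are illustrative):

```r
library(GenomeInfoDb)

si_human  <- Seqinfo(c("chr1", "chr2"),
                     seqlengths = c(249250621L, 243199373L),
                     genome = "hg19")
si_lambda <- Seqinfo("lambda", seqlengths = 48502L, genome = "lambda")

## seqnames are disjoint, so the merge just concatenates the two
si_both <- merge(si_human, si_lambda)
genome(si_both)    # per-seqname genome column: hg19, hg19, lambda
```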

~G



 But I agree that a standardizeSeqinfo() that amounts to genome(x) <-
 genome(x) would make sense.

 I don't think people sort too often by seqnames (except to the
 natural ordering), but I could be wrong. I do sympathize though with
 the need for a low-level accessor. At least one would want a parameter
 for disabling the standardization.

 On Fri, Jun 5, 2015 at 12:54 PM, Kasper Daniel Hansen
 kasperdanielhan...@gmail.com wrote:
  In WGBS we frequently sequence a human with spike-in from the lambda
 genome.
  In this case, most of the chromosomes of the GRanges are from human,
 except
  one.  This is a use case where genome(GR) is not constant.  I suggest,
 partly
  for compatibility, keeping genome, but perhaps adding something like
 standardizeGenome() or similar.
 
  I would indeed love, love, love a function which just cleans it up.
 
  Kasper
 
  On Fri, Jun 5, 2015 at 2:51 PM, Gabe Becker becker.g...@gene.com
 wrote:
 
  Herve,
 
  This is probably a naive question, but what use cases are there for
  creating
  an object with the wrong seqinfo for its genome?
 
  ~G
 
  On Fri, Jun 5, 2015 at 11:43 AM, Michael Lawrence
  lawrence.mich...@gene.com
   wrote:
 
   On Thu, Jun 4, 2015 at 11:48 PM, Hervé Pagès hpa...@fredhutch.org
   wrote:
I also think that we're heading towards something like that.
   
 So genome(gr) <- "hg19" would:
   
  (a) Add any missing information to the seqinfo.
  (b) Sort the seqlevels in canonical order.
  (c) Change the seqlevels style to UCSC style if they are not.
   
 The 3 tasks are orthogonal. I guess most of the time people want
 an easy way to perform them all at once.
   
We could easily support (a) and (b). This assumes that the current
seqlevels are already valid hg19 seqlevels:
   
  si1 <- Seqinfo(c("chrX", "chrUn_gl000249", "chr2",
 "chr6_cox_hap2"))
  gr1 <- GRanges(seqinfo=si1)
  hg19_si <- Seqinfo(genome="hg19")
   
  ## (a):
  seqinfo(gr1) <- merge(seqinfo(gr1), hg19_si)[seqlevels(gr1)]
  seqinfo(gr1)
  # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
  #   seqnames   seqlengths isCircular genome
  #   chrX155270560  FALSE   hg19
  #   chrUn_gl000249  38502  FALSE   hg19
  #   chr2243199373  FALSE   hg19
  #   chr6_cox_hap2 4795371  FALSE   hg19
   
  ## (b):
  seqlevels(gr1) <- intersect(seqlevels(hg19_si), seqlevels(gr1))
  seqinfo(gr1)
  # Seqinfo object with 4 sequences (1 circular) from hg19 genome:
  #   seqnames   seqlengths isCircular genome
  #   chr2243199373  FALSE   hg19
  #   chrX155270560  FALSE   hg19
  #   chr6_cox_hap2 4795371  FALSE   hg19
  #   chrUn_gl000249  38502  FALSE   hg19
   
(c) is harder because seqlevelsStyle() doesn't know how to rename
scaffolds yet:
   
  si2 <- Seqinfo(c("X", "HSCHRUN_RANDOM_CTG42", "2",
   "HSCHR6_MHC_COX_CTG1"))
  gr2 <- GRanges(seqinfo=si2)
   
  seqlevelsStyle(gr2)
  # [1] "NCBI"

  seqlevelsStyle(gr2) <- "UCSC"
  seqlevels(gr2)
  # [1] "chrX"                 "HSCHRUN_RANDOM_CTG42" "chr2"
  # [4] "HSCHR6_MHC_COX_CTG1"
   
So we need to work on this.
   
 I'm not sure about using genome(gr) <- "hg19" for this. Right now
it sets the genome column of the seqinfo with the supplied string
and nothing else. Aren't there valid use cases for this?
  
   Not sure. People would almost always want the seqname style and order
   to be consistent with the given genome.
  
What about
 using seqinfo(gr) <- "hg19" instead? It kind of suggests that the
whole seqinfo component actually gets filled.
   
  
   Yea, but genome is so intuitive compared to seqinfo.
  
  
  
H.
   
On 06/04/2015 06:30 PM, Tim Triche, Jr. wrote:
   
that's kind of always been my goal

Re: [Bioc-devel] chromosome lengths (seqinfo) for supported BSgenome builds into GenomeInfoDb?

2015-06-05 Thread Gabe Becker
On Fri, Jun 5, 2015 at 1:39 PM, Tim Triche, Jr. tim.tri...@gmail.com
wrote:

 how about just

 gr <- addSeqinfo(gr, "hg19")


"Add" sounds like it's, well, adding rather than replacing (which it
sometimes would do).

gr <- fixSeqInfo(gr, "hg19")

instead?
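
A hypothetical sketch of what such a fixSeqInfo() could do (this function does not exist in GenomeInfoDb; the body just stitches together steps (a) and (b) from Hervé's worked example earlier in the thread):

```r
library(GenomeInfoDb)

fixSeqInfo <- function(x, genome) {
    ref <- Seqinfo(genome = genome)
    ## (a) fill in lengths/circularity/genome from the reference
    seqinfo(x) <- merge(seqinfo(x), ref)[seqlevels(x)]
    ## (b) sort the seqlevels into the reference's canonical order
    seqlevels(x) <- intersect(seqlevels(ref), seqlevels(x))
    x
}
```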

~G


-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] Meta-information about bioc and bioc-package releases

2015-05-27 Thread Gabe Becker
Hey all,

I've been wondering if bioc could offer a way to query information about
itself, similar - though not necessarily identical - to what Gabor Csardi's
(cc'ed) crandb offers: http://www.r-pkg.org/services#api

Basically, I would really like two things, in increasing order of
difficulty:

#1: A webservice to easily determine the Bioc release version associated
with a particular version of R.

R knows how to do this itself, but it doesn't share the wealth (function
not exported) so I can't get at that without a NOTE which blocks submission
to CRAN.


Note that I explicitly do not want to require or depend on the presence of
BiocInstaller. Depending on BiocInstaller would prevent switchr from
allowing context switching between Bioc versions, which is a major use-case
for the software.

Something as simple as a web service which accepts an R version string and
spits out the Bioc release number would be sufficient, though a way to get
the full repo URLs would be a nice bonus.
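
Absent such a service, the client-side fallback is a hard-coded table like the sketch below. The mapping entries are illustrative only and would need to be kept current by hand, which is exactly the maintenance burden a web service would remove:

```r
## Hypothetical client-side mapping from R version to Bioc release.
## The entries are illustrative, not authoritative.
biocVersionFor <- function(r_version = getRversion()) {
    map <- c("3.2" = "3.1", "3.1" = "3.0", "3.0" = "2.13")
    key <- paste(unclass(r_version)[[1]][1:2], collapse = ".")
    unname(map[key])
}

biocVersionFor(package_version("3.2.0"))
```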

#2: A crandb-like database of Bioc package version releases.

The queries that I would specifically like, in the context of
reproducibility with my switchr package, are:


   - Which exact package versions were available on a specific date
   - Start and end dates for a given release of bioc
   - What date was a specific package version first released, and the span
   of dates it was available via a repo
   - Dependency and reverse dependency information for each release of each
   package
   - Given a package version, the smallest set of packages necessary to
   install the package, and the versions of those dependencies that were
   concurrent with the initial release, midpoint, and last moment before being
   superseded for the package in question. (crandb does not offer this; I can
   do it with many separate calls, but it seems like the computation should be
   server-side for this.) Preferably with or without including Suggests.
   Depending on implementation this could be completely or partially
   pre-computed for efficiency.
   - A mapping from package versions in repositories to SVN branch and
   commit, if possible.
   - Possibly other stuff crandb tracks.

Would one or both of these be possible?

I looked at using the crandb code directly, but it pulls from
cranmirror/src/contrib/Meta/archive.rds to get all of this information,
and that file either doesn't exist or isn't readable in the Bioc
repositories.  If we were to change that, I think a lot of what crandb
offers would come nearly for free, the only work being adding things that
crandb doesn't offer (the tree-shaking and version to svn mapping, for
example).

Best
~G

-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Meta-information about bioc and bioc-package releases

2015-05-27 Thread Gabe Becker
Dan,

Thanks for the quick response. Unfortunately I am hoping for something I
can access with base R, including older versions, which precludes https and
curl. Is there any way to have a public facing url that hosts that file, or
a webservice which returns it directly?
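
For concreteness, the base-R-only access Gabe describes could look roughly like this, assuming (hypothetically) the file were mirrored at a plain-http URL:

```r
## Hypothetical URL; base R only -- no curl, no yaml package.
cfg <- readLines(url("http://example.org/bioconductor/config.yaml"))

## Crude scrape of a top-level scalar line like:  release_version: "3.1"
rel <- sub('.*:\\s*"?([0-9.]+)"?.*', "\\1",
           grep("^release_version:", cfg, value = TRUE))
```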

Best,
~G

On Wed, May 27, 2015 at 10:47 AM, Dan Tenenbaum dtene...@fredhutch.org
wrote:



 - Original Message -
  From: Gabe Becker becker.g...@gene.com
  To: bioc-devel@r-project.org
  Cc: csardi gabor csardi.ga...@gmail.com
  Sent: Wednesday, May 27, 2015 10:31:43 AM
  Subject: [Bioc-devel] Meta-information about bioc and bioc-package
 releases
 
  Hey all,
 
  I've been wondering if bioc could offer a way to query information
  about
  itself similar - though not necessarily identical -  to what Gabor
  Csardi's
  (cc'ed) crandb offers http://www.r-pkg.org/services#api
 
  Basically, I would really like two things, in increasing order of
  difficulty:
 
  #1: A webservice to easily determine the Bioc release version
  associated
  with a particular version of R.
 
  R knows how to do this itself, but it doesn't share the wealth
  (function
  not exported) so I can't get at that without a NOTE which blocks
  submission
  to CRAN.
 

 Here's a web service to do this:

  curl -u readonly:readonly
 https://hedgehog.fhcrc.org/bioconductor/trunk/bioconductor.org/config.yaml

 This returns a yaml file containing a list/hash called
 r_ver_for_bioc_ver. It also contains the current release
 (release_version) and devel (devel_version) of Bioconductor. Putting
 this information together you should be able to
 determine what you want.

 
  Note that I explicitly do not want to require or depend on the
  presence of
  BiocInstaller. Depending on BiocInstaller would prevent switchr from
  allowing context switching between Bioc versions, which is a major
  use-case
  for the software.
 
  Something as simple as a web service which accepts an R version
  string and
  spits out the Bioc release number would be sufficient, though a way
  to get
  the full repo URLs would be a nice bonus
 
  #2: A crandb-like database of Bioc package version releases.
 
  The queries that I would specifically like, in the context of
  reproducibility with my switchr package, are:
 
 
 - Which exact package versions were available on a specific date
 - Start and end dates for a given release of bioc
 - What date was a specific package version first released, and the
 span
 dates it was available via (a) repo
 - Dependency and reverse dependency information for each release
 of each
 package
 - Given a package version, the smallest set of packages necessary
 to
 install the package, and the versions of those dependencies that
 were
 concurrent with the initial release, midpoint, and last moment
 before being
 superseded for the package in question. (crandb does not offer
 this; I can
 do it with many separate calls, but it seems like the computation
 should be
 server-side for this.) Preferably with or without including
 Suggests.
 Depending on implementation this could be completely or partially
 pre-computed for efficiency.
 - A mapping from package versions in repositories to SVN branch
 and
 commit, if possible.
 - Possibly other stuff crandb tracks.
 
  Would one or both of these be possible?
 
  I looked at using the crandb code directly, but it pulls from
  cranmirror/src/contrib/Meta/archive.rds to get all of this
  information,
  and that file either doesn't exist or isn't readable in the Bioc
  repositories.  If we were to change that, I think a lot of what
  crandb
  offers would come nearly for free, the only work being adding things
  that
  crandb doesn't offer (the tree-shaking and version to svn mapping,
  for
  example).

 I'm not sure how the Meta/archive.rds file is generated; do you know?

 As for the other requests, we'll need to discuss this stuff internally
 before responding further.

 Dan





  Best
  ~G
 
  --
  Gabriel Becker, Ph.D
  Computational Biologist
  Genentech Research
 
[[alternative HTML version deleted]]
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 




-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] building R and bioconductor

2015-04-28 Thread Gabe Becker
Hi all,

Laurent: Thanks for the shout-out! It's gratifying to see that people have
noticed what I'm doing.

Berri,

As Laurent suggested, my switchr package (currently at
http://github.com/gmbecker/switchr , on CRAN soon) is designed to do this.
It has a few features which I think are relevant to what you're asking for:

   - It is designed to easily install cohorts of packages - or specific
   versions thereof.
   - It uses install.packages to get dependency issues right, while adding
   an extra layer that understands versions, and can retrieve old versions
   from a variety of sources (github, CRAN Archive, Bioc previous releases and
   SVN, etc).
   - It provides a mechanism for describing cohorts of specific package
   versions (a SeedingManifest) in a form that can be
  - used to install the listed packages
  - deployed as a traditional repository (including integration
  testing) via my related GRANBase package
   - published or distributed in a light-weight manner, e.g. as a github
  gist (via http://github.com/gmbecker/switchrGist )


I would argue that whatever approach you choose should deal with package
dependencies and installation order automatically. There should be no
reason for you to need to specify those yourself, and lots of downsides of
doing so. AFAIK RStudio's packrat and my switchr systems both have this
property.


Some things switchr doesn't do:

   - Anything at the R-version level or above.
  - This is ripe for a combination with a docker-based approach. Carl
  Boettiger (cc'ed) is one of the people behind
  https://github.com/rocker-org/rocker and there was some discussion of
  incorporating switchr there for this purpose, but I don't know
if anything
  has happened with that yet.
   - Binary packages - Currently, you must have the ability to build all
   the packages you want from source on the destination system.

I'm happy to answer any further questions about switchr or to hear about
features you need that it doesn't seem to have yet.

Best,
~G

On Tue, Apr 28, 2015 at 10:36 AM, Laurent Gatto lg...@cam.ac.uk wrote:


 Dear Stefano,

 On 28 April 2015 16:50, Berri, Stefano wrote:

  Hi.
 
  I need a very reproducible way of creating a R builds with a series of
  CRAN and Bioconductor packages.
 
  I want to be able to download a specific version or R, a specific
  version of all packages and then install them in the right order (to
  make sure every package has the dependencies at installation time). I
  do not want to use install.packages or biocLite because they are not
  very transparent/reproducible. I have done it for R-3.0.2, but it has
  been rather slow and boring (try installing a package, see the
  complaint, go into a working installation, load the package and figure
  out the version of all the related packages).
 
  I am wondering if there is an automated way, or a repository, to
  recursively retrieve all the package versions required in a certain
  version of R.

 Using biocLite, you will get the specific package versions required for
 your R version. If you want more control, I think packrat [1,2] from
  RStudio or this paper [3] by Gabe Becker et al. might be helpful.

   [1] https://rstudio.github.io/packrat/
   [2] https://github.com/rstudio/packrat
   [3] http://arxiv.org/abs/1501.02284

 You might also consider rolling out your own docker image.

   [4] http://bioconductor.org/help/docker/

  I am also looking into easybuild
  (http://easybuild.readthedocs.org/en/latest/), and, also there,
   knowing the exact path to all the packages in the right order seems
  crucial.
 
  how do install.packages or biocLite know what version is required and
  where it is located?

  The locations are defined by getOption("repos") and biocinstallRepos()
 respectively. In the latter case, these will also depend on the version
 of R.

 Hope this helps.

 Best wishes,

 Laurent

  Thanks a lot
 
  Stefano
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel




-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] building R and bioconductor

2015-04-28 Thread Gabe Becker
Martin,


On Tue, Apr 28, 2015 at 10:52 AM, Martin Morgan mtmor...@fredhutch.org
wrote:


 Downloading a specific version of Bioconductor packages retrospectively is
 challenging, unless the version is the final version in a release cycle.
 This is because the Bioc repository only contains one version of a package
 for each release, and while it is might be possible to dissect the svn logs
 of individual packages to identify when a DESCRIPTION file had a particular
 version, there may be several svn commits associated with the version
 (packages are only pushed to the public repository when a version changes
 at the time the build starts each day, so the first svn revision would be a
 good [but not infallible] bet as to the revision that was made public).


This is true. switchr does this via the heuristic that you suggest. The
model here is that a version bump (in SVN/github/etc) represents a new
version of the package, and that all changes between version bumps simply
accumulate but do not arrive until the next version of the package.



 If you wish to take a snapshot 'now' and have it available in the future,
 then the tools Laurent mentions might be appropriate. I think I would
 rather (but maybe I'm just perverse in this respect) rsync (
 http://bioconductor.org/about/mirrors/mirror-how-to/ and similar for
 CRAN) or manually create a 'CRAN-style' repository of source packages, and
 simply use this as the 'repos' argument (including pointing to a local file
 system) in install.packages.


Again, switchr does this via a concept called lazy repositories, which
are built on demand with only the required packages, but that is how
dependencies between packages are supported, even when those packages don't
currently reside in any repository.
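
Martin's manual CRAN-style repository is only a few lines of base R; package names here are placeholders:

```r
## Snapshot chosen source tarballs into a local CRAN-style repository.
dir.create("myrepo/src/contrib", recursive = TRUE)
## ... copy the .tar.gz files you want to freeze into myrepo/src/contrib ...
tools::write_PACKAGES("myrepo/src/contrib", type = "source")

## Later, install from the snapshot; dependency resolution still works.
install.packages("somePkg",
                 repos = paste0("file://", normalizePath("myrepo")),
                 type = "source")
```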

Best,
~G



-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Short URLs for packages?

2015-03-24 Thread Gabe Becker
On Tue, Mar 24, 2015 at 7:28 AM, Wolfgang Huber whu...@embl.de wrote:

  5. At the end of the day I find myself casting my lot for landing pages
 with the form
   http://bioconductor.org/release/BiocGenerics/
  which leads to a little less typing but not the dynamic resolution that
 started this (version) of the thread.

 But we already have dynamic resolution. Even
 http://bioconductor.org/release/BiocGenerics will point to different
 package versions (e.g. after bugfixes) as time goes by.
 So the attribute “release” is dynamically resolved.
 All I am asking for is another attribute that means “the best that we
 currently have”, i.e. release if it exists and devel otherwise.

 I didn’t expect so much disagreement on so mundane an issue. And there are
 plenty of ways of doing this outside the Bioc webpage, any of the public
 ’tiny URL’ services, through my own webpage, or by just telling people to
 google the package name.


I just think there are a couple of subtleties here. I certainly don't
begrudge people wanting to type less and find packages easier. But if a
naive user with a default (read: release) Bioc installation goes to
http://bioconductor.org/CoolAwesomePkg and see's that it is available in
bioconductor but then can't install it because it is only in devel, are
they going to be less confused, or more? I don't know the answer to that,
but I think it's something to consider.

Also, as I have said elsewhere, though I acknowledge that you seem to
disagree, I think such urls are substantially less appropriate for
credit/citation in publications. A link that brought users to the version
in question, but which - if not current - had a prominent link to the
current version would be better imho.




  On 24 Mar 2015, at 12:14, Martin Morgan mtmor...@fredhutch.org wrote:
 
  4. In terms of best practices, it seems like articles are about
 particular versions and should cite the package as such, for instance if
 only in devel when the paper is being written as .../3.1/..., but that
 there is no substantive cost to also referencing 'current version available
 [after April, 2015] at .../release/….

 I don’t agree. This would mean that for each later version of the same
 package, even just after a trivial typo fix, there is either no article, or
 another one would have to be written. I don’t think this has an easily
 formalized solution, some good judgement is required.
 E.g. try to apply the above reasoning to
 http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003118


I agree that there can be a bit of a beard problem* here. If people
follow the Bioc development guidelines, though, I think a pretty good rule
of thumb can be had: bugfix version changes (in the major.minor-bugfix
nomenclature) are (relatively unambiguously) the same software from a
publication standpoint, while package versions with minor or major version
differences are not. This doesn't mean that a new article needs to be
written, imo, just that awareness that the article discussed a different
version of the software - and that users should see the NEWS file or
current documentation for fully up-to-date information - is important.

Not to harp on you personally, Wolfgang, because your paper with Simon
about DESeq was ahead of its time (and ours, sadly) on many of these
issues, but the API and default behavior of DESeq have changed
substantially (and for the better!) since its publication [1].

As a never-going-to-happen pipedream, this would be even more
straight-forward if Bioc package version numbers were of the form
(BiocVersion.PkgVersion-bugfix). Then the automatic incrementing of package
versions for bioc releases wouldn't muddy the waters here.


* The philosophical issue where some men obviously have beards, and some
men obviously don't, but there is no exact number of facial hairs at which
one unambiguously transitions from not having a beard to having one.

[1]
http://blog.revolutionanalytics.com/2014/08/gran-and-switchr-cant-send-you-back-in-time-but-they-can-send-r-sort-of.html

~G
-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Short URLs for packages?

2015-03-23 Thread Gabe Becker
On Mon, Mar 23, 2015 at 9:00 AM, Tim Triche, Jr. tim.tri...@gmail.com
wrote:

 .../release/... silently changes every six months or so, as does
 .../devel/..., so I don't see how this changes anything beyond that.  It
 does make finding the

packages a lot easier in general, and more mnemonic.


It makes finding whatever the package is at the time you read the
publication easier, yes. Finding the software discussed or used in the
publication ... not really.

Packages are (read: should be, IMHO) published, citable pieces of research,
though. Imagine if a paper you cite were silently updated without the
doi/citation changing. That wouldn't be good.




 If you want to document the versions of packages used in an analysis,
 there's always sessionInfo() and/or a dockerfile, rite?


I guess my problem is that there is even an "if" at the beginning of that
sentence.  That's not an attack on you, I know that the above reflects the
current state of affairs, I'm simply saying that perhaps Bioconductor, as a
project, can help/encourage people to do better.

~G



 --t

  On Mar 23, 2015, at 8:38 AM, Gabe Becker becker.g...@gene.com wrote:
 
  On Mon, Mar 23, 2015 at 8:05 AM, Fischer, Bernd 
  b.fisc...@dkfz-heidelberg.de wrote:
 
 
  During the production process of the paper we want to link to the
  accompanying
  BioC package that is in devel, but not yet in release. Before the first
  release, the
  link (e.g. www.bioconductor.org/packagename) should go to the devel
  version
  (maybe with an additional warning that it is only available in devel),
  before the
  first release of the package and should go to release afterwards.
 
  I understand the appeal of this, but decoupling publications from the
  actual, exact versions they discuss or use seems like a relatively large
  step backwards in terms of reproducibility. At the very least, I think
  there is some nuance here that warrants careful consideration before we
  adopt a single-silently-changing-link-per-package paradigm.
 
  ~G
 
 
 
 
 
  Bernd
 
 
  On 23.03.2015, at 11:45, Sean Davis seand...@gmail.com wrote:
 
  Just so we don't lose the thoughts that have come before, here is a
 link
  to
  a similar proposal from last year.
 
  https://stat.ethz.ch/pipermail/bioc-devel/2014-February/005292.html
 
  Sean
 
 
 
  On Mon, Mar 23, 2015 at 6:17 AM, Wolfgang Huber whu...@embl.de
 wrote:
 
  I wonder whether it’d possible to have the website understand URLs
 like
http://www.bioconductor.org/pkgname
 
  This could resolve to
  http://www.bioconductor.org/packages/release/bioc/html/pkgname.html
  or
  http://www.bioconductor.org/packages/devel/bioc/html/pkgname.html
  depending on whether the package was yet released.
 
  This could be handy in papers or grants that mention packages.
 
  Wolfgang
 
 
  
  Wolfgang Huber
  Principal Investigator, EMBL Senior Scientist
  Genome Biology Unit
  European Molecular Biology Laboratory (EMBL)
  Heidelberg, Germany
 
  T +49-6221-3878823
  wolfgang.hu...@embl.de
  http://www.huber.embl.de
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
   [[alternative HTML version deleted]]
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
 
 
  --
  Gabriel Becker, Ph.D
  Computational Biologist
  Genentech Research
 
 [[alternative HTML version deleted]]
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel




-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Short URLs for packages?

2015-03-23 Thread Gabe Becker
On Mon, Mar 23, 2015 at 9:25 AM, Tim Triche, Jr. tim.tri...@gmail.com
wrote:

 I guess my problem is that there is even an if at the beginning of that
 sentence.  That's not an attack on you, I know that the above reflects the
 current state of affairs, I'm simply saying that perhaps Bioconductor, as a
 project, can help/encourage people to do better.


 Quite true.  Perhaps that could be emphasized as part of adding the
 redirect rules.  I am always delighted when people cite the version number
 of a package, as it shows that they care about the quality of their work,
 and the stability of its conclusions.

 People rarely do what they know is right; they do what is convenient, then
 repent.  (Bob Dylan pointed this out a while ago...)


I agree. That is why I am somewhat leery of making it more convenient to do
the wrong thing (or to not do the right one).



 Thus it is more likely that a person will do the right thing if it happens
 to be the most convenient thing IMHO. Anything to advance this strategy
 would be a step in the right direction.


Bioc core team: This may be getting a bit off topic,  but has there been
any discussion of working with an organization like http://zenodo.org/ to
get DOIs assigned to Bioc packages on release? This could be for every
release or only for the initial inclusion in a Bioc release, but if they
are version-specific they would make citing, etc. easier and more rigorous.
We could have a biocCite() or figure out how to get citation() to do the
right thing.
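
As a baseline, base R already exposes the two pieces a version-specific citation needs, for any installed package (GenomicRanges here is just an example):

```r
citation("GenomicRanges")        # citation metadata from the package
packageVersion("GenomicRanges")  # the exact installed version to cite
```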

~G





 Best,

 --t


 ~G



 --t

  On Mar 23, 2015, at 8:38 AM, Gabe Becker becker.g...@gene.com wrote:
 
  On Mon, Mar 23, 2015 at 8:05 AM, Fischer, Bernd 
  b.fisc...@dkfz-heidelberg.de wrote:
 
 
  During the production process of the paper we want to link to the
  accompanying
  BioC package that is in devel, but not yet in release. Before the first
  release, the
  link (e.g. www.bioconductor.org/packagename) should go to the devel
  version
  (maybe with an additional warning that it is only available in devel),
  before the
  first release of the package and should go to release afterwards.
 
  I understand the appeal of this, but decoupling publications from the
  actual, exact versions they discuss or use seems like a relatively large
  step backwards in terms of reproducibility. At the very least, I think
  there is some nuance here that warrants careful consideration before we
  adopt a single-silently-changing-link-per-package paradigm.
 
  ~G
 
 
 
 
 
  Bernd
 
 
  On 23.03.2015, at 11:45, Sean Davis seand...@gmail.com wrote:
 
  Just so we don't lose the thoughts that have come before, here is a
 link
  to
  a similar proposal from last year.
 
  https://stat.ethz.ch/pipermail/bioc-devel/2014-February/005292.html
 
  Sean
 
 
 
  On Mon, Mar 23, 2015 at 6:17 AM, Wolfgang Huber whu...@embl.de
 wrote:
 
  I wonder whether it’d possible to have the website understand URLs
 like
http://www.bioconductor.org/pkgname
 
  This could resolve to
  http://www.bioconductor.org/packages/release/bioc/html/pkgname.html
  or
  http://www.bioconductor.org/packages/devel/bioc/html/pkgname.html
  depending on whether the package was yet released.
 
  This could be handy in papers or grants that mention packages.
 
  Wolfgang
 
 
  
  Wolfgang Huber
  Principal Investigator, EMBL Senior Scientist
  Genome Biology Unit
  European Molecular Biology Laboratory (EMBL)
  Heidelberg, Germany
 
  T +49-6221-3878823
  wolfgang.hu...@embl.de
  http://www.huber.embl.de
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
   [[alternative HTML version deleted]]
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel
 
 
 
  --
  Gabriel Becker, Ph.D
  Computational Biologist
  Genentech Research
 
 [[alternative HTML version deleted]]
 
  ___
  Bioc-devel@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/bioc-devel




 --
 Gabriel Becker, Ph.D
 Computational Biologist
 Genentech Research




-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Changes to the SummarizedExperiment Class

2015-03-03 Thread Gabe Becker
Jim et al.,

Why have two accessors (rowRanges, rowData), each of which is less
flexible than the underlying structure and thus will fail (return NULL? or
GRanges()/DataFrame()?) in some proportion of valid objects?

~G

On Tue, Mar 3, 2015 at 2:37 PM, Jim Hester james.f.hes...@gmail.com wrote:

 Motivated by the discussion thread from November (https://stat.ethz.ch/
 pipermail/bioc-devel/2014-November/006686.html) the Bioconductor core team
 is planning on making changes to the SummarizedExperiment class.  Our end
 goal is to allow the @rowData slot to become more flexible and hold either
 a DataFrame or GRanges type object.

 To this end we have currently deprecated the current rowData accessor in
 favor of a rowRanges accessor.  This change has resulted in a few broken
 builds in devel, which we are in the process of fixing now.  We will
 contact any package authors directly if needed for this migration.

 The rowData accessor will be deprecated in this release, however eventually
 the plan is to re-purpose this function to serve as an accessor for
 DataFrame data on the rows.

 Please let us know if you have any questions with the above and if you need
 any assistance with the transition.





-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research



Re: [Bioc-devel] Importing classes into NAMESPACE

2015-02-25 Thread Gabe Becker
Karolis,

Do you really not need any of the methods for GRanges and ExpressionSet
objects? import(GenomicRanges) might be better, even though the package
isn't exactly small.

~G

On Wed, Feb 25, 2015 at 6:27 AM, Thomas Sandmann sandmann.tho...@gene.com
wrote:

 Hi Karolis,

 These classes have constructor functions of the same name as the class. For
 example, the constructor function for GRanges is called GRanges().

 If you use the constructors you need to import them separately, e.g.

 importFrom(GenomicRanges, GRanges)

 Best,
 Thomas
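
As a sketch, the two options discussed above look like this in a NAMESPACE file (`importClassesFrom` shown as well, which is the directive for importing the S4 class itself, separately from the constructor function):

```r
## import everything GenomicRanges exports (functions, classes, methods):
import(GenomicRanges)

## or, more selectively:
importClassesFrom(GenomicRanges, GRanges)   # the GRanges class
importFrom(GenomicRanges, GRanges)          # the GRanges() constructor
```

The selective form keeps the namespace small, but as noted above you usually also need the methods that operate on those classes, which is what makes `import(GenomicRanges)` attractive despite the package's size.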





-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research



Re: [Bioc-devel] BamTallyParam argument 'which'

2015-02-25 Thread Gabe Becker
I think we need to be a little careful about trying to know the user's
intentions better than they do here.  Reduce is a (very) easy operation to
perform on a GRanges, so if the user didn't, are we really safe assuming they
meant to when passing the GRanges as a 'which'?

I would argue for the samtools way, not because samtools does it (though
consistency is good) but because it allows the user to do more things,
while not making it that painful to do the thing that they might want most
often.

I agree with Michael that an additional argument might be a good middle
ground.

~G

On Tue, Feb 24, 2015 at 7:40 PM, Leonardo Collado Torres lcoll...@jhu.edu
wrote:

 Related to my post on a separate thread
 (https://stat.ethz.ch/pipermail/bioc-devel/2015-February/006978.html),
 I think that if 'which' is not being reduced by default, a simple
 example showing the effects of this could be included in the functions
 that have such an argument. Also note that 'reducing' could lead to
 unintended results.

 For example, in the help page for GenomicAlignments::readGAlignments,
 after the 'gal4' example it would be nice to add something like this:


 ## Note that if overlapping ranges are provided in 'which'
 ## reads could be selected more than once. This would artificially
 ## increase the coverage or affect other downstream results.
 ## If you 'reduce' the ranges, reads that originally overlapped
 ## two disjoint segments will be included.

 which_dups <- RangesList(seq1=rep(IRanges(1000, 2000), 2),
 seq2=IRanges(c(100, 1000), c(1000, 2000)))
 param_dups <- ScanBamParam(which=which_dups)
 param_reduced <- ScanBamParam(which=reduce(which_dups))
 gal4_dups <- readGAlignments(bamfile, param=param_dups)
 gal4_reduced <- readGAlignments(bamfile, param=param_reduced)


 length(gal4)

 ## Duplicates some reads. In this case, all the ones between
 ## bases 1000 and 2000 on seq1.
 length(gal4_dups)

 ## Includes some reads that mapped around base 1000 in seq2
 ## that were excluded in gal4.
 length(gal4_reduced)








 Here's the output:



  > library('GenomicAlignments')
 
  > ## Code already included in ?readGAlignments
  > bamfile <- system.file("extdata", "ex1.bam", package="Rsamtools",
 +mustWork=TRUE)
  > which <- RangesList(seq1=IRanges(1000, 2000),
 + seq2=IRanges(c(100, 1000), c(1000, 2000)))
  > param <- ScanBamParam(which=which)
  > gal4 <- readGAlignments(bamfile, param=param)
  > gal4
 GAlignments object with 2404 alignments and 0 metadata columns:
  seqnames strand   cigarqwidth start   end
 width njunc
 Rle  Rle character integer integer integer
 integer integer
  [1] seq1  + 35M35   970  1004
35 0
  [2] seq1  + 35M35   971  1005
35 0
  [3] seq1  + 35M35   972  1006
35 0
  [4] seq1  + 35M35   973  1007
35 0
  [5] seq1  + 35M35   974  1008
35 0
  ...  ...... ...   ...   ...   ...
   ...   ...
   [2400] seq2  + 35M35  1524  1558
35 0
   [2401] seq2  + 35M35  1524  1558
35 0
   [2402] seq2  - 35M35  1528  1562
35 0
   [2403] seq2  - 35M35  1532  1566
35 0
   [2404] seq2  - 35M35  1533  1567
35 0
   ---
   seqinfo: 2 sequences from an unspecified genome
 
  ## Note that if overlapping ranges are provided in 'which'
  ## reads could be selected more than once. This would artificially
  ## increase the coverage or affect other downstream results.
  ## If you 'reduce' the ranges, reads that originally overlapped
  ## two disjoint segments will be included.
 
  which_dups <- RangesList(seq1=rep(IRanges(1000, 2000), 2),
 + seq2=IRanges(c(100, 1000), c(1000, 2000)))
  param_dups <- ScanBamParam(which=which_dups)
  param_reduced <- ScanBamParam(which=reduce(which_dups))
  gal4_dups <- readGAlignments(bamfile, param=param_dups)
  gal4_reduced <- readGAlignments(bamfile, param=param_reduced)
 
 
  length(gal4)
 [1] 2404
 
  ## Duplicates some reads. In this case, all the ones between
  ## bases 1000 and 2000 on seq1.
  length(gal4_dups)
 [1] 3014
 
  ## Includes some reads that mapped around base 1000 in seq2
  ## that were excluded in gal4.
  length(gal4_reduced)
 [1] 2343
 
 
 
 
 
  options(width = 120)
  devtools::session_info()
 Session
 info---
  setting  value
  version  R Under development (unstable) (2014-11-01 r66923)
  system   x86_64, darwin10.8.0
  ui   AQUA
  language (EN)
  collate  en_US.UTF-8
  tz   America/New_York


 

Re: [Bioc-devel] A quick questions on writing R functions

2015-02-20 Thread Gabe Becker
Dongmei,

This isn't really the right list for this type of question. That said, I
can give you a few pointers.

The actual section of the manual you quoted is about documenting your
function, not the code of the function itself.

That said, eval is not a good name for your function. It doesn't tell the
caller what the function is going to do, and it will mask the eval function
in base (this is unlikely to actually cause any problems, but it's bad
practice).

This function could also be vectorized if you turn it into an elementwise
comparison of two matrices followed by a call to colSums. Even if not, I
suspect that repetition is simply the length of the vectors you pass in? If
so, just call mean on the logical vectors; there is no need to add them and
divide as a separate step (though admittedly that isn't really going to save
you much).
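
A minimal sketch of that vectorized rewrite, using made-up data in place of the real arguments. One variation on the suggestion above: since the matrix has `rowsdim` rows, comparing it against the length-`rowsdim` vector lets R's recycling do the elementwise work, and `rowMeans()` replaces the explicit sum-and-divide:

```r
## hypothetical data mimicking the shapes used in eval()
set.seed(1)
rowsdim <- 5L; repetition <- 100L
ord_vec_pvalue <- runif(rowsdim)
eval_matrix_pvalue <- matrix(runif(rowsdim * repetition), nrow = rowsdim)

## loop version, as in the original eval() function
p_loop <- numeric(rowsdim)
for (g in seq_len(rowsdim))
    p_loop[g] <- sum(eval_matrix_pvalue[g, ] <= ord_vec_pvalue[g]) / repetition

## vectorized: the comparison recycles ord_vec_pvalue down the rows,
## and rowMeans() computes the per-row proportion directly
p_vec <- rowMeans(eval_matrix_pvalue <= ord_vec_pvalue)

stopifnot(all.equal(p_loop, p_vec))
```

The same pattern applies to the t-statistic branch with `abs()` around both sides of the comparison.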

~G


On Fri, Feb 20, 2015 at 6:10 AM, Li, Dongmei dongmei...@urmc.rochester.edu
wrote:

 Hi,

 I'm developing an R package and got the following suggestions for revising
 the functions:

 When a function returns a named list, a good practice is to start the
   \value section with the following:

 A named list with the following components:

   and then to itemize the components

 This is an example of my current function:

 eval <-
   function (ord_vec_t, ord_vec_pvalue, eval_matrix_t, eval_matrix_pvalue,
 repetition)
   {
 rowsdim <- length(ord_vec_t)
 p_pvalue <- rep(NA, rowsdim)
 p_tstat <- rep(NA, rowsdim)
 for (g in 1:rowsdim) {
   p_pvalue[g] <- sum(eval_matrix_pvalue[g, ] <=
 ord_vec_pvalue[g])/repetition
   p_tstat[g] <- sum(abs(eval_matrix_t[g, ]) >=
 abs(ord_vec_t[g]))/repetition
 }
 mylist <- list(p_pvalue = p_pvalue, p_tstat = p_tstat)
 return(mylist)
   }

 Anyone could offer some suggestions on revising the function based on the
 comments? Thanks so much for all your help!

 Best,
 Dongmei





 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel




-- 
Gabriel Becker, Ph.D
Computational Biologist
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] R-devel update schedule/strategy for Bioc devel build machines

2014-11-03 Thread Gabe Becker
Dan,

Thanks for the quick and thorough response. This info will be instrumental
as we try to match your build systems locally.

~G

On Fri, Oct 31, 2014 at 1:28 PM, Dan Tenenbaum dtene...@fredhutch.org
wrote:



 - Original Message -
  From: Dan Tenenbaum dtene...@fredhutch.org
  To: Gabe Becker becker.g...@gene.com
  Cc: bioc-devel@r-project.org, Jim Fitzgerald 
 fitzgerald.ja...@gene.com
  Sent: Thursday, October 30, 2014 10:23:58 AM
  Subject: Re: [Bioc-devel] R-devel update schedule/strategy for Bioc
 devel build   machines
 
 
 
  - Original Message -
   From: Gabe Becker becker.g...@gene.com
   To: bioc-devel@r-project.org, Jim Fitzgerald
   fitzgerald.ja...@gene.com
   Sent: Thursday, October 30, 2014 10:21:17 AM
   Subject: [Bioc-devel] R-devel update schedule/strategy for Bioc
   devel build machines
  
   Bioc admins,
  
   We have an automatic build/test mechanism for our internal packages
   which
   we'd like to match the Bioc build machines as closely as possible.
   AFAIU,
   the exact commit-specific version of R-devel used on your devel
   build
   server changes occasionally within Bioc releases, and we'd like to
   mirror
   those changes on our system.
  
   Is there a set schedule for when the exact commit of R-devel on
   your
   build
   server changes?
 
  No. We do it in response to the following events:
 
  - significant changes in the R code
  - seems like a while since we last updated R
  - milestones leading up to release (release candidates, alpha, beta,
  etc).
 
  You could scrape the build report daily to see if the version of R
  has changed.
 
   Also, do you guys grab them as tarballs, or
   checkout/update
   from the svn?
 
  Tarballs.

 BTW, I don't know if you are concerned about platforms other than Linux,
 but for Mac and Windows we install the binaries provided by CRAN, we don't
 build R from source on these platforms any more.

 Dan


  Dan
 
 
  
   Thanks,
   ~G
  
   --
   Computational Biologist
   Genentech Research
  
 




-- 
Computational Biologist
Genentech Research



[Bioc-devel] R-devel update schedule/strategy for Bioc devel build machines

2014-10-30 Thread Gabe Becker
Bioc admins,

We have an automatic build/test mechanism for our internal packages which
we'd like to match the Bioc build machines as closely as possible. AFAIU,
the exact commit-specific version of R-devel used on your devel build
server changes occasionally within Bioc releases, and we'd like to mirror
those changes on our system.

Is there a set schedule for when the exact commit of R-devel on your build
server changes? Also, do you guys grab them as tarballs, or checkout/update
from the svn?

Thanks,
~G

-- 
Computational Biologist
Genentech Research



Re: [Bioc-devel] writeVcf performance

2014-09-30 Thread Gabe Becker
Valerie,

Apologies for this taking much longer than it should have. The changes in
Bioc-devel have wreaked havoc on the code we use to generate and process
the data we need to write out, but the fault is mine for not getting on top
of it sooner.

I'm not seeing the speed you mentioned above in the latest devel version
(1.11.35). It took ~1.5 hrs to write an expanded vcf with > 56M rows
(print output and sessionInfo() follow). I'll try reading in the Illumina
platinum file and writing it back out to see if it is something about our
specific vcf object (could ExpandedVCF vs VCF be an issue?).

> vcfgeno
class: ExpandedVCF 
dim: 50307989 1 
rowData(vcf):
  GRanges with 4 metadata columns: REF, ALT, QUAL, FILTER
info(vcf):
  DataFrame with 1 column: END
  Fields with no header: END
geno(vcf):
  SimpleList of length 7: AD, DP, FT, GT, GQ, PL, MIN_DP
geno(header(vcf)):
  Number TypeDescription

   AD 2  Integer Allelic depths (number of reads in each observed
al...
   DP 1  Integer Total read depth

   FT 1  String  Variant filters

   GT 1  String  Genotype

   GQ 1  Integer Genotype quality

   PL 3  Integer Normalized, Phred-scaled likelihoods for genotypes

   MIN_DP 1  Integer Minimum DP observed within the GVCF block

 sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4parallel  stats graphics  grDevices utils datasets
[8] methods   base

other attached packages:
 [1] VariantCallingPaper_0.0.3 GenomicFeatures_1.17.17
 [3] AnnotationDbi_1.27.16 Biobase_2.25.0
 [5] gmapR_1.7.8   VTGenotyping_0.0.1
 [7] BiocParallel_0.99.22  futile.logger_1.3.7
 [9] VariantTools_1.7.5        VariantAnnotation_1.11.35
[11] Rsamtools_1.17.34 Biostrings_2.33.14
[13] XVector_0.5.8 rtracklayer_1.25.16
[15] GenomicRanges_1.17.42 GenomeInfoDb_1.1.23
[17] IRanges_1.99.28   S4Vectors_0.2.4
[19] BiocGenerics_0.11.5   switchr_0.2.1

loaded via a namespace (and not attached):
 [1] annotate_1.43.5base64enc_0.1-2
 [3] BatchJobs_1.4  BBmisc_1.7
 [5] biomaRt_2.21.1 bitops_1.0-6
 [7] brew_1.0-6 BSgenome_1.33.9
 [9] CGPtools_2.2.0 checkmate_1.4
[11] codetools_0.2-9DBI_0.3.1
[13] DESeq_1.17.0   digest_0.6.4
[15] fail_1.2   foreach_1.4.2
[17] futile.options_1.0.0   genefilter_1.47.6
[19] geneplotter_1.43.0 GenomicAlignments_1.1.29
[21] genoset_1.19.32gneDB_0.4.18
[23] grid_3.1.1 iterators_1.0.7
[25] lambda.r_1.1.6 lattice_0.20-29
[27] Matrix_1.1-4   RColorBrewer_1.0-5
[29] RCurl_1.95-4.3 rjson_0.2.14
[31] RSQLite_0.11.4 sendmailR_1.2-1
[33] splines_3.1.1  stringr_0.6.2
[35] survival_2.37-7tools_3.1.1
[37] TxDb.Hsapiens.BioMart.igis_2.3 XML_3.98-1.1
[39] xtable_1.7-4   zlibbioc_1.11.1

On Wed, Sep 17, 2014 at 2:08 PM, Valerie Obenchain voben...@fhcrc.org
wrote:

 Hi Gabe,

 Have you had a chance to test writeVcf? The changes made over the past
 week have shaved off more time. It now takes ~ 9 minutes to write the
 NA12877 example.

 > dim(vcf)

 [1] 51612762        1

 gc()

  used   (Mb) gc trigger(Mb)   max used(Mb)
 Ncells  157818565 8428.5  298615851 15947.9  261235336 13951.5
 Vcells 1109849222 8467.5 1778386307 13568.1 1693553890 12920.8

 print(system.time(writeVcf(vcf, tempfile())))

user  system elapsed
 555.282   6.700 565.700

 gc()

  used   (Mb) gc trigger(Mb)   max used(Mb)
 Ncells  157821990 8428.7  329305975 17586.9  261482807 13964.7
 Vcells 1176960717 8979.5 2183277445 16657.1 2171401955 16566.5



 In the most recent version (1.11.35) I've added chunking for files with >
 1e5 records. Right now the choice of # records per chunk is simple, based
 on total records only. We are still experimenting with this. You can
 override default chunking with 'nchunk'. Examples on the man page.

 Valerie


 On 09/08/14 08:43, Gabe Becker wrote:

 Val,

 That is great. I'll check this out and test it on our end.

 ~G

 On Mon, Sep 8, 2014 at 8:38 AM, Valerie Obenchain voben...@fhcrc.org wrote:

 The new writeVcf code is in 1.11.28.

 Using the illumina file you suggested, geno fields only, writing now
 takes about 17 minutes.

   hdr
 class: VCFHeader
 samples(1): NA12877
 meta(6): fileformat ApplyRecalibration ... reference source
 fixed(1

Re: [Bioc-devel] 'semantically rich' subsetting of SummarizedExperiments

2014-09-20 Thread Gabe Becker
Hey all,

We are in the (very) early stages of experimenting with something that
seems relevant here: classed identifiers. We are using them for
database/mart queries, but the same concept could be useful for the cases
you're describing I think.

E.g.

 > mysyms = GeneSymbol(c("BRAF", "BRCA1"))
 > mysyms
An object of class "GeneSymbol"
[1] "BRAF"  "BRCA1"
 > yourSE[mysyms, ]
...


This approach has the benefit of being declarative instead of heuristic
(people won't be able to accidentally invoke it), while still giving most
of the convenience I believe you are looking for.

The object classes inherit directly from character, so should just work
most of the time, but as I said it's early days; lots more testing for
functionality and usefulness is needed.
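
A minimal sketch of such a classed identifier (hypothetical, not the actual implementation described above; the method registration at the end is indicative only):

```r
library(methods)

## a character subclass that carries the identifier type in its class
setClass("GeneSymbol", contains = "character")
GeneSymbol <- function(x) new("GeneSymbol", as.character(x))

mysyms <- GeneSymbol(c("BRAF", "BRCA1"))

## behaves like a character vector in most contexts, so existing code keeps
## working...
stopifnot(is(mysyms, "character"), length(mysyms) == 2L,
          toupper(mysyms) == mysyms)

## ...while container classes can dispatch on it declaratively, e.g.
## setMethod("[", c("SummarizedExperiment", "GeneSymbol"),
##           function(x, i, ...) { ... look up rows by symbol ... })
```

Because the class inherits directly from character, a method written for `GeneSymbol` can only ever be invoked deliberately, which is what makes this approach declarative rather than heuristic.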

~G


On Sat, Sep 20, 2014 at 11:38 AM, Vincent Carey st...@channing.harvard.edu
wrote:

 OK by me to leave [ alone.  We could start with subsetByEntrez,
 subsetByKEGG, subsetBySymbol, subsetByGOTERM, subsetByGOID.

 Utilities to generate GRanges for queries in each of these vocabularies
 should, perhaps, be in the OrganismDb space?  Once those are in place
 no additional infrastructure is necessary?

 On Sat, Sep 20, 2014 at 12:49 PM, Tim Triche, Jr. tim.tri...@gmail.com
 wrote:

  Agreed with Sean, having tried implementing the magical alternative
 
  --t
 
   On Sep 20, 2014, at 9:31 AM, Sean Davis sdav...@mail.nih.gov wrote:
  
   Hi, Vince.
  
   I'm coming a little late to the party, but I agree with Kasper's
  sentiment
   that the less magical approach of using subsetByXXX might be the
  cleaner
   way to go for the time being.
  
   Sean
  
  
   On Sat, Sep 20, 2014 at 10:42 AM, Vincent Carey 
  st...@channing.harvard.edu
   wrote:
  
  
  
 
 https://github.com/vjcitn/biocMultiAssay/blob/master/vignettes/SEresolver.Rnw
  
   shows some modifications to [ that allow subsetting of SE by
   gene or pathway name
  
   it may be premature to work at the [ level.  Kasper suggested defining
   a suite of subsetBy operations that would accomplish this
  
   i think we could get something along these lines into the release
  without
   too much more work.  votes?
  
 





-- 
Computational Biologist
Genentech Research



Re: [Bioc-devel] Please bump version number when committing changes

2014-09-08 Thread Gabe Becker
Michael,

Tags could work. Another approach would be to update the repository and
then look in the log to see if the version number was changed in the most
recent commit. In a sense this is the converse of what our GRANBase package
does when locating historical package versions within the Bioc SVN.

This has the benefit of not requiring the authors do anything they didn't
already need to do in order to flag a release (i.e., bump the version
number).

I should note also, to Dan's later point, that while the Bioc repository
currently always builds whatever is in the latest commit, that need not be
the case. Our internal package repository only builds on version bumps (of
a package or any of its recursive dependencies).

~
G

On Fri, Sep 5, 2014 at 6:48 PM, Michael Lawrence lawrence.mich...@gene.com
wrote:

 As Pete and Ryan have pointed out, it seems that the version control system
 should somehow ease the burden of the developer here.

 Let's look at this from the github perspective, since it is likely to be
 the primary hosting mechanism for the foreseeable future. Just thinking out
 loud, if R could somehow dynamically ascertain the version of a package at
 build time, it could query the git checkout for a version. A simple
 algorithm that I have found effective in non-R projects is to consider git
 tags, which on github equate to releases. If the repository state is *at*
 the tag, then use the tag as the version. If the state is ahead of the most
 recent tag, then use the tag + latest commit hash. I wonder if R could
 support this by allowing a path to an R script in the version field?
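
The tag-based scheme described above can be seen with `git describe` in a throwaway repository (a sketch of the workflow, not anything R or Bioconductor currently does; requires git):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "work for 1.2.3"
git tag v1.2.3

# repository state is *at* the tag: the version is just the tag
git describe --tags            # prints: v1.2.3

git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "post-release work"

# state is ahead of the most recent tag: tag + commits-since + short hash
git describe --tags            # prints something like: v1.2.3-1-g1a2b3c4
```

A build tool could then map the first form to a release version and the second to a development snapshot, which is exactly the tag vs. tag-plus-hash distinction proposed above.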



 On Fri, Sep 5, 2014 at 6:27 PM, Vincent Carey st...@channing.harvard.edu
 wrote:

  On Fri, Sep 5, 2014 at 7:50 PM, Peter Haverty haverty.pe...@gene.com
  wrote:
 
   Hi all,
  
   I respectfully disagree.  One should certainly check in each discrete
  unit
   of work.  These will often not result in something that is ready to be
  used
   by someone else.  Bumping the version number constitutes a new release
  and
   carries the implicit promise that the package works again.  This is why
  
 
  Here I would respectfully disagree.  Code in the devel branch carries no
  guarantees.
  I think we have been pretty loose with respect to package version number
  bumping in devel
  branch; the svn tracking can be used to deal with isolation of code for
  rollbacks.
 
  In this informal regime the package version number is a simple marker of
  package state.
  I think it has served us pretty well in past years but the developer
  community was smaller
  and had fairly homogeneous habits.
 
  Clearly there is room for more regimentation in this area but at the
 moment
  I agree with
  Dan that version numbers are cheap and should be bumped when new code is
  committed.
  And the recognition by all that a devel image may not work and may change
  fairly dramatically
  while in devel should be general; whether we need to alter that is open
 to
  question but I would
  think not.
 
 
   continuous integration systems do a build when the version number
  changes.
  
    One should expect working software when installing a pre-built package
  (the
   tests passed, right?).  Checking out from SVN is for developers of that
   package and nothing should be assumed about the current state of the
  code.
  
   To keep everyone happy, one could add a commit hook to our SVN setup
 that
   would add the SVN revision number to the version string.  This would be
  for
   dev only and hopefully not sufficient to trigger a build.
  
   That's my two cents.  Happy weekend all.
  
   Regards,
  
  
  
   Pete
  
   
   Peter M. Haverty, Ph.D.
   Genentech, Inc.
   phave...@gene.com
  
  
   On Fri, Sep 5, 2014 at 4:30 PM, Dan Tenenbaum dtene...@fhcrc.org
  wrote:
  
   
   
- Original Message -
 From: Stephanie M. Gogarten sdmor...@u.washington.edu
 To: Dan Tenenbaum dtene...@fhcrc.org, bioc-devel 
bioc-devel@r-project.org
 Sent: Friday, September 5, 2014 4:27:13 PM
 Subject: Re: [Bioc-devel] Please bump version number when
 committing
changes

 I am guilty of doing this today, but I have (I think) a good
 reason.
 I'm making a bunch of changes that are all related to each other,
 but
 are being implemented and tested in stages.  I'd like to use svn to
 commit when I've made a set of changes that works, so I can roll
 back
 if
 I break something in the next step, but I'd like the users to see
 them
 all at once as a single version update.  Perhaps others are doing
 something similar?

   
I understand the motivation but this still results in an ambiguous
  state
if two different people check out your package from svn at different
   times
today (before and after your changes).
   
Version numbers are cheap, so if version 1.2.3 exists for a day
 before
version 1.2.4 (which contains all the changes you want to push to
 your

Re: [Bioc-devel] writeVcf performance

2014-09-08 Thread Gabe Becker
Val,

That is great. I'll check this out and test it on our end.

~G

On Mon, Sep 8, 2014 at 8:38 AM, Valerie Obenchain voben...@fhcrc.org
wrote:

 The new writeVcf code is in 1.11.28.

 Using the illumina file you suggested, geno fields only, writing now takes
 about 17 minutes.

  hdr
 class: VCFHeader
 samples(1): NA12877
 meta(6): fileformat ApplyRecalibration ... reference source
 fixed(1): FILTER
 info(22): AC AF ... culprit set
 geno(8): GT GQX ... PL VF

  param = ScanVcfParam(info=NA)
  vcf = readVcf(fl, "", param=param)
  dim(vcf)
 [1] 51612762        1

  system.time(writeVcf(vcf, "out.vcf"))
 user   system  elapsed
  971.032    6.568 1004.593

 In 1.11.28, parsing of geno data was moved to C. If this didn't speed
 things up enough we were planning to implement 'chunking' through the VCF
 and/or move the parsing of info to C, however, it looks like geno was the
 bottleneck.

 I've tested a number of samples/fields combinations in files with <= .5
 million rows and the improvement over writeVcf() in release is ~ 90%.

 Valerie




 On 09/04/14 15:28, Valerie Obenchain wrote:

 Thanks Gabe. I should have something for you on Monday.

 Val


 On 09/04/2014 01:56 PM, Gabe Becker wrote:

 Val and Martin,

 Apologies for the delay.

 We realized that the Illumina platinum genome vcf files make a good test
 case, assuming you strip out all the info (info=NA when reading it into
 R) stuff.

 ftp://platgene:g3n3s...@ussd-ftp.illumina.com/NA12877_S1.genome.vcf.gz
 took about ~4.2 hrs to write out, and is about 1.5x the size of the
 files we are actually dealing with (~50M ranges vs our ~30M).

 Looking forward to a new, vastly improved writeVcf :).

 ~G


 On Tue, Sep 2, 2014 at 1:53 PM, Michael Lawrence
 lawrence.mich...@gene.com mailto:lawrence.mich...@gene.com wrote:

 Yes, it's very clear that the scaling is non-linear, and Gabe has
 been experimenting with a chunk-wise + parallel algorithm.
 Unfortunately there is some frustrating overhead with the
 parallelism. But I'm glad Val is arriving at something quicker.

 Michael


  On Tue, Sep 2, 2014 at 1:33 PM, Martin Morgan mtmor...@fhcrc.org wrote:

 On 08/27/2014 11:56 AM, Gabe Becker wrote:

 The profiling I attached in my previous email is for 24 geno
 fields, as I said,
  but our typical use case involves only ~4-6 fields, and is
 faster but still on
 the order of dozens of minutes.


 I think Val is arriving at a (much) more efficient
 implementation, but...

 I wanted to share my guess that the poor _scaling_ is because
 the garbage collector runs multiple times as the different
 strings are pasted together, and has to traverse, in linear
 time, increasing numbers of allocated SEXPs. So times scale
 approximately quadratically with the number of rows in the VCF

 An efficiency is to reduce the number of SEXPs in play by
 writing out in chunks -- as each chunk is written, the SEXPs
 become available for collection and are re-used. Here's my toy
 example

 time.R
 ==
  splitIndices <- function (nx, ncl)
  {
   i <- seq_len(nx)
   if (ncl == 0L)
   list()
   else if (ncl == 1L || nx == 1L)
   list(i)
   else {
   fuzz <- min((nx - 1L)/1000, 0.4 * nx/ncl)
   breaks <- seq(1 - fuzz, nx + fuzz, length = ncl + 1L)
   structure(split(i, cut(i, breaks, labels=FALSE)), names = NULL)
   }
  }

  x = as.character(seq_len(1e7)); y = sample(x)
  if (!is.na(Sys.getenv("SPLIT", NA))) {
   idx <- splitIndices(length(x), 20)
   system.time(for (i in idx) paste(x[i], y[i], sep=":"))
  } else {
   system.time(paste(x, y, sep=":"))
  }


 running under R-devel with $ SPLIT=TRUE R --no-save --quiet -f
 time.R the relevant time is

 user  system elapsed
   15.320   0.064  15.381

 versus with $ R --no-save --quiet -f time.R it is

 user  system elapsed
   95.360   0.164  95.511

 I think this is likely an overall strategy when dealing with
 character data -- processing in independent chunks of moderate
 (1M?) size (enabling as a consequence parallel evaluation in
 modest memory) that are sufficient to benefit from
 vectorization, but that do not entail allocation of large
 numbers of in-use SEXPs.

 Martin


 Sorry for the confusion.
 ~G


  On Wed, Aug 27, 2014 at 11:45 AM, Gabe Becker becke...@gene.com
  wrote:

  Martin and Val.

  I re-ran

Re: [Bioc-devel] Please bump version number when committing changes

2014-09-05 Thread Gabe Becker
Dan,

If that is a hard BioC policy I'll endeavor to follow it (I do already
in the vast majority of cases), but I must say it makes the Bioc repository
much less useful from a development standpoint.

There are lots of reasons to commit code that doesn't work and shouldn't yet
be deployed, from portability between machines to simple preservation of
work in progress. What is the suggested behavior in the "under heavy
development and not safe, but I don't want to lose days of work" case?

~G


On Fri, Sep 5, 2014 at 4:30 PM, Dan Tenenbaum dtene...@fhcrc.org wrote:



 - Original Message -
  From: Stephanie M. Gogarten sdmor...@u.washington.edu
  To: Dan Tenenbaum dtene...@fhcrc.org, bioc-devel 
 bioc-devel@r-project.org
  Sent: Friday, September 5, 2014 4:27:13 PM
  Subject: Re: [Bioc-devel] Please bump version number when committing
 changes
 
  I am guilty of doing this today, but I have (I think) a good reason.
  I'm making a bunch of changes that are all related to each other, but
  are being implemented and tested in stages.  I'd like to use svn to
  commit when I've made a set of changes that works, so I can roll back
  if
  I break something in the next step, but I'd like the users to see
  them
  all at once as a single version update.  Perhaps others are doing
  something similar?
 

 I understand the motivation but this still results in an ambiguous state
 if two different people check out your package from svn at different times
 today (before and after your changes).

 Version numbers are cheap, so if version 1.2.3 exists for a day before
 version 1.2.4 (which contains all the changes you want to push to your
 users) then that's ok, IMO.

 Including a version bump doesn't impact whether or not you can rollback a
 commit with svn.

 Dan



  Stephanie
 
  On 9/4/14, 12:04 PM, Dan Tenenbaum wrote:
   Hello,
  
   Looking through our svn logs, I see that there are many commits
   that are not accompanied by version bumps.
   All svn commits (or, if you are using the git-svn bridge, every
   group of commits included in a push) should include a version bump
   (that is, incrementing the z segment of the x.y.z version
   number). This practice is documented at
   http://www.bioconductor.org/developers/how-to/version-numbering/ .
  
   Failure to bump the version has two consequences:
  
   1) Your changes will not propagate to our package repository or web
   site, so users installing your package via biocLite() will not
   receive the latest changes unless you bump the version.
  
   2) Users *can* always get the current files of your package using
   Subversion, but if you've made changes without bumping the version
   number, it can be difficult to troubleshoot problems. If two
   people are looking at what appears to be the same version of a
   package, but it's behaving differently, it can be really
   frustrating to realize that the packages actually differ (but not
   by version number).
  
   So if you're not already, please get in the habit of bumping the
   version number with each set of changes you commit.
  
   Let us know on bioc-devel if you have any questions about this.
  
   Thanks,
   Dan
  




-- 
Computational Biologist
Genentech Research



Re: [Bioc-devel] writeVcf performance

2014-09-04 Thread Gabe Becker
Val and Martin,

Apologies for the delay.

We realized that the Illumina platinum genome vcf files make a good test
case, assuming you strip out all the info (info=NA when reading it into R)
stuff.

ftp://platgene:g3n3s...@ussd-ftp.illumina.com/NA12877_S1.genome.vcf.gz took
about 4.2 hrs to write out, and is about 1.5x the size of the files we are
actually dealing with (~50M ranges vs. our ~30M).

Looking forward to a new, vastly improved writeVcf :).

~G


On Tue, Sep 2, 2014 at 1:53 PM, Michael Lawrence lawrence.mich...@gene.com
wrote:

 Yes, it's very clear that the scaling is non-linear, and Gabe has been
 experimenting with a chunk-wise + parallel algorithm. Unfortunately there
 is some frustrating overhead with the parallelism. But I'm glad Val is
 arriving at something quicker.

 Michael


 On Tue, Sep 2, 2014 at 1:33 PM, Martin Morgan mtmor...@fhcrc.org wrote:

 On 08/27/2014 11:56 AM, Gabe Becker wrote:

 The profiling I attached in my previous email is for 24 geno fields, as
 I said,
 but our typical use case involves only ~4-6 fields, and is faster but
 still on
 the order of dozens of minutes.


 I think Val is arriving at a (much) more efficient implementation, but...

 I wanted to share my guess that the poor _scaling_ is because the garbage
 collector runs multiple times as the different strings are pasted together,
 and has to traverse, in linear time, increasing numbers of allocated SEXPs.
 So times scale approximately quadratically with the number of rows in the
 VCF.

 One efficiency gain is to reduce the number of SEXPs in play by writing out
 in chunks -- as each chunk is written, the SEXPs become available for
 collection and are re-used. Here's my toy example

 time.R
 ==
 splitIndices <- function (nx, ncl)
 {
 i <- seq_len(nx)
 if (ncl == 0L)
 list()
 else if (ncl == 1L || nx == 1L)
 list(i)
 else {
 fuzz <- min((nx - 1L)/1000, 0.4 * nx/ncl)
 breaks <- seq(1 - fuzz, nx + fuzz, length = ncl + 1L)
 structure(split(i, cut(i, breaks, labels=FALSE)), names = NULL)
 }
 }

 x = as.character(seq_len(1e7)); y = sample(x)
 if (!is.na(Sys.getenv("SPLIT", NA))) {
 idx <- splitIndices(length(x), 20)
 system.time(for (i in idx) paste(x[i], y[i], sep=":"))
 } else {
 system.time(paste(x, y, sep=":"))
 }


 running under R-devel with $ SPLIT=TRUE R --no-save --quiet -f time.R the
 relevant time is

user  system elapsed
  15.320   0.064  15.381

 versus with $ R --no-save --quiet -f time.R it is

user  system elapsed
  95.360   0.164  95.511

 I think this is likely a good overall strategy when dealing with character
 data -- processing in independent chunks of moderate (1M?) size (enabling
 as a consequence parallel evaluation in modest memory) that are sufficient
 to benefit from vectorization, but that do not entail allocation of large
 numbers of in-use SEXPs.
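 That chunking also composes with parallel evaluation; a rough sketch
 along those lines (just the toy example above parallelized, not
 writeVcf itself):

```r
## Sketch: chunked pasting as above, optionally in parallel.
## Not writeVcf itself -- just the toy example, using the parallel
## package's own splitIndices(). On Windows use cores = 1L.
library(parallel)

chunked_paste <- function(x, y, nchunks = 20L, cores = 2L) {
    idx <- parallel::splitIndices(length(x), nchunks)
    res <- mclapply(idx, function(i) paste(x[i], y[i], sep = ":"),
                    mc.cores = cores)
    unlist(res, use.names = FALSE)
}

x <- as.character(seq_len(1e6)); y <- sample(x)
system.time(out <- chunked_paste(x, y))
stopifnot(identical(out, paste(x, y, sep = ":")))
```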

 Martin


 Sorry for the confusion.
 ~G


 On Wed, Aug 27, 2014 at 11:45 AM, Gabe Becker becke...@gene.com
 mailto:becke...@gene.com wrote:

 Martin and Val.

 I re-ran writeVcf on our (G)VCF data (34790518 ranges, 24 geno
 fields) with
 profiling enabled. The results of summaryRprof for that run are
 attached,
 though for a variety of reasons they are pretty misleading.

 It took over an hour to write (3700+ seconds), so it's definitely a
 bottleneck when the data get very large, even if it isn't for
 smaller data.

 Michael and I both think the culprit is all the pasting and cbinding
 that is
 going on, and more to the point, that memory for an internal
 representation
 to be written out is allocated at all.  Streaming across the object,
 looping
 by rows and writing directly to file (e.g. from C) should be
 blisteringly
 fast in comparison.

 ~G


 On Tue, Aug 26, 2014 at 11:57 AM, Michael Lawrence 
 micha...@gene.com
 mailto:micha...@gene.com wrote:

 Gabe is still testing/profiling, but we'll send something
 randomized
 along eventually.


 On Tue, Aug 26, 2014 at 11:15 AM, Martin Morgan 
 mtmor...@fhcrc.org
 mailto:mtmor...@fhcrc.org wrote:

 I didn't see in the original thread a reproducible
 (simulated, I
 guess) example, to be explicit about what the problem is??

 Martin


 On 08/26/2014 10:47 AM, Michael Lawrence wrote:

 My understanding is that the heap optimization provided
 marginal
 gains, and
  that we need to think harder about how to optimize
 all of
 the string
 manipulation in writeVcf. We either need to reduce it or
 reduce its
 overhead (i.e., the CHARSXP allocation). Gabe is doing
 more tests.


 On Tue, Aug 26, 2014 at 9:43 AM, Valerie Obenchain
 voben...@fhcrc.org mailto:voben...@fhcrc.org

 wrote:

 Hi Gabe,

 Martin responded, and so did

Re: [Bioc-devel] writeVcf performance

2014-08-27 Thread Gabe Becker
Martin and Val.

I re-ran writeVcf on our (G)VCF data (34790518 ranges, 24 geno fields) with
profiling enabled. The results of summaryRprof for that run are attached,
though for a variety of reasons they are pretty misleading.

It took over an hour to write (3700+ seconds), so it's definitely a
bottleneck when the data get very large, even if it isn't for smaller data.

Michael and I both think the culprit is all the pasting and cbinding that
is going on, and more to the point, that memory for an internal
representation to be written out is allocated at all.  Streaming across the
object, looping by rows and writing directly to file (e.g. from C) should
be blisteringly fast in comparison.

~G


On Tue, Aug 26, 2014 at 11:57 AM, Michael Lawrence micha...@gene.com
wrote:

 Gabe is still testing/profiling, but we'll send something randomized along
 eventually.


 On Tue, Aug 26, 2014 at 11:15 AM, Martin Morgan mtmor...@fhcrc.org
 wrote:

 I didn't see in the original thread a reproducible (simulated, I guess)
 example, to be explicit about what the problem is??

 Martin


 On 08/26/2014 10:47 AM, Michael Lawrence wrote:

 My understanding is that the heap optimization provided marginal gains,
 and
 that we need to think harder about how to optimize all of the string
 manipulation in writeVcf. We either need to reduce it or reduce its
 overhead (i.e., the CHARSXP allocation). Gabe is doing more tests.


 On Tue, Aug 26, 2014 at 9:43 AM, Valerie Obenchain voben...@fhcrc.org
 wrote:

  Hi Gabe,

 Martin responded, and so did Michael,

 https://stat.ethz.ch/pipermail/bioc-devel/2014-August/006082.html

 It sounded like Michael was ok with working with/around heap
 initialization.

 Michael, is that right or should we still consider this on the table?


 Val


 On 08/26/2014 09:34 AM, Gabe Becker wrote:

  Val,

 Has there been any movement on this? This remains a substantial
 bottleneck for us when writing very large VCF files (e.g.
 variants+genotypes for whole genome NGS samples).

 I was able to see a ~25% speedup with 4 cores and an optimal speedup
 of ~2x with 10-12 cores for a VCF with 500k rows using a very naive
 parallelization strategy and no other changes. I suspect this could be
 improved on quite a bit, or possibly made irrelevant with judicious use
 of serial C code.

 Did you and Martin make any plans regarding optimizing writeVcf?

 Best
 ~G


 On Tue, Aug 5, 2014 at 2:33 PM, Valerie Obenchain voben...@fhcrc.org
 mailto:voben...@fhcrc.org wrote:

  Hi Michael,

  I'm interested in working on this. I'll discuss with Martin next
  week when we're both back in the office.

  Val





  On 08/05/14 07:46, Michael Lawrence wrote:

  Hi guys (Val, Martin, Herve):

  Anyone have an itch for optimization? The writeVcf function is
  currently a
  bottleneck in our WGS genotyping pipeline. For a typical 50
  million row
  gVCF, it was taking 2.25 hours prior to yesterday's
 improvements
  (pasteCollapseRows) that brought it down to about 1 hour,
 which
  is still
   too long by my standards (> 0). Only takes 3 minutes to call
 the
  genotypes
  (and associated likelihoods etc) from the variant calls (using
  80 cores and
  450 GB RAM on one node), so the output is an issue. Profiling
  suggests that
  the running time scales non-linearly in the number of rows.

  Digging a little deeper, it seems to be something with R's
  string/memory
  allocation. Below, pasting 1 million strings takes 6 seconds,
 but
 10
  million strings takes over 2 minutes. It gets way worse with
 50
  million. I
  suspect it has something to do with R's string hash table.

   set.seed(1000)
   end <- sample(1e8, 1e6)
   system.time(paste0("END", "=", end))
    user  system elapsed
   6.396   0.028   6.420

   end <- sample(1e8, 1e7)
   system.time(paste0("END", "=", end))
    user  system elapsed
   134.714   0.352 134.978

  Indeed, even this takes a long time (in a fresh session):

   set.seed(1000)
   end <- sample(1e8, 1e6)
   end <- sample(1e8, 1e7)
   system.time(as.character(end))
    user  system elapsed
  57.224   0.156  57.366

   But running it a second time is faster (about what one would
   expect?):

   system.time(levels <- as.character(end))
    user  system elapsed
  23.582   0.021  23.589

  I did some simple profiling of R to find that the resizing of
  the string
  hash table is not a significant component of the time. So
 maybe
  something
  to do with the R heap/gc? No time right now to go deeper. But
 I
  know Martin
  likes this sort of thing ;)

  Michael

   [[alternative HTML

Re: [Bioc-devel] writeVcf performance

2014-08-27 Thread Gabe Becker
The profiling I attached in my previous email is for 24 geno fields, as I
said, but our typical use case involves only ~4-6 fields, and is faster but
still on the order of dozens of minutes.

Sorry for the confusion.
~G


On Wed, Aug 27, 2014 at 11:45 AM, Gabe Becker becke...@gene.com wrote:

 Martin and Val.

 I re-ran writeVcf on our (G)VCF data (34790518 ranges, 24 geno fields)
 with profiling enabled. The results of summaryRprof for that run are
 attached, though for a variety of reasons they are pretty misleading.

 It took over an hour to write (3700+ seconds), so it's definitely a
 bottleneck when the data get very large, even if it isn't for smaller data.

 Michael and I both think the culprit is all the pasting and cbinding that
 is going on, and more to the point, that memory for an internal
 representation to be written out is allocated at all.  Streaming across the
 object, looping by rows and writing directly to file (e.g. from C) should
 be blisteringly fast in comparison.

 ~G


 On Tue, Aug 26, 2014 at 11:57 AM, Michael Lawrence micha...@gene.com
 wrote:

 Gabe is still testing/profiling, but we'll send something randomized
 along eventually.


 On Tue, Aug 26, 2014 at 11:15 AM, Martin Morgan mtmor...@fhcrc.org
 wrote:

 I didn't see in the original thread a reproducible (simulated, I guess)
 example, to be explicit about what the problem is??

 Martin


 On 08/26/2014 10:47 AM, Michael Lawrence wrote:

 My understanding is that the heap optimization provided marginal gains,
 and
 that we need to think harder about how to optimize all of the string
 manipulation in writeVcf. We either need to reduce it or reduce its
 overhead (i.e., the CHARSXP allocation). Gabe is doing more tests.


 On Tue, Aug 26, 2014 at 9:43 AM, Valerie Obenchain voben...@fhcrc.org
 wrote:

  Hi Gabe,

 Martin responded, and so did Michael,

 https://stat.ethz.ch/pipermail/bioc-devel/2014-August/006082.html

 It sounded like Michael was ok with working with/around heap
 initialization.

 Michael, is that right or should we still consider this on the table?


 Val


 On 08/26/2014 09:34 AM, Gabe Becker wrote:

  Val,

 Has there been any movement on this? This remains a substantial
 bottleneck for us when writing very large VCF files (e.g.
 variants+genotypes for whole genome NGS samples).

 I was able to see a ~25% speedup with 4 cores and an optimal
 speedup
 of ~2x with 10-12 cores for a VCF with 500k rows using a very naive
 parallelization strategy and no other changes. I suspect this could be
 improved on quite a bit, or possibly made irrelevant with judicious
 use
 of serial C code.

 Did you and Martin make any plans regarding optimizing writeVcf?

 Best
 ~G


 On Tue, Aug 5, 2014 at 2:33 PM, Valerie Obenchain voben...@fhcrc.org
 mailto:voben...@fhcrc.org wrote:

  Hi Michael,

  I'm interested in working on this. I'll discuss with Martin next
  week when we're both back in the office.

  Val





  On 08/05/14 07:46, Michael Lawrence wrote:

  Hi guys (Val, Martin, Herve):

  Anyone have an itch for optimization? The writeVcf function
 is
  currently a
  bottleneck in our WGS genotyping pipeline. For a typical 50
  million row
  gVCF, it was taking 2.25 hours prior to yesterday's
 improvements
  (pasteCollapseRows) that brought it down to about 1 hour,
 which
  is still
  too long by my standards (> 0). Only takes 3 minutes to call
 the
  genotypes
  (and associated likelihoods etc) from the variant calls
 (using
  80 cores and
  450 GB RAM on one node), so the output is an issue. Profiling
  suggests that
  the running time scales non-linearly in the number of rows.

  Digging a little deeper, it seems to be something with R's
  string/memory
  allocation. Below, pasting 1 million strings takes 6
 seconds, but
 10
  million strings takes over 2 minutes. It gets way worse with
 50
  million. I
  suspect it has something to do with R's string hash table.

   set.seed(1000)
   end <- sample(1e8, 1e6)
   system.time(paste0("END", "=", end))
    user  system elapsed
   6.396   0.028   6.420

   end <- sample(1e8, 1e7)
   system.time(paste0("END", "=", end))
    user  system elapsed
   134.714   0.352 134.978

  Indeed, even this takes a long time (in a fresh session):

   set.seed(1000)
   end <- sample(1e8, 1e6)
   end <- sample(1e8, 1e7)
   system.time(as.character(end))
    user  system elapsed
  57.224   0.156  57.366

   But running it a second time is faster (about what one would
   expect?):

   system.time(levels <- as.character(end))
    user  system elapsed
  23.582   0.021  23.589

  I did some simple profiling of R to find that the resizing

Re: [Bioc-devel] a S4 dispatching question

2014-08-06 Thread Gabe Becker
Mike,

This can be done. I would argue, though, that the convenience your users get
from this is far outweighed by the damage it does to the ability to read and
easily understand the code they are writing. Users, maintainers, etc. now
need to know the object's class, what columns it has, and what variables are
in their environment in order to predict a '[' expression's behavior.
(This is why most people I know won't touch the attach function with a
10-foot pole, regardless of how convenient it seems at first.)

But anyway, I'm not the design police and my dislike of non-standard
evaluation is far from universal, so here is how to do it:

 setClass("toydf", representation(x = "data.frame"))
 setMethod("[", "toydf", function(x, i, j, ...) {key = substitute(i) ;
eval(key, x@x, parent.frame())})
[1] "["
 x = data.frame(holycrapnothere = 1:6)
 y = new("toydf", x = x)
 y
An object of class "toydf"
Slot x:
  holycrapnothere
1   1
2   2
3   3
4   4
5   5
6   6

 holycrapnothere
Error: object 'holycrapnothere' not found
No suitable frames for recover()

 y[holycrapnothere]
[1] 1 2 3 4 5 6


~G



On Wed, Aug 6, 2014 at 10:43 AM, Mike wjia...@fhcrc.org wrote:

 I'd like to do 'data.table'-like subsetting on an S4 class by using an 'i
 expression'. However, the '[' generic function has trouble dispatching the S4
 method because of its early evaluation of the i argument.
 e.g.

  gslist[Visit == 1, ]
 Error in gslist[Visit == 1, ] :
   error in evaluating the argument 'i' in selecting a method for function
 '[': Error: object 'Visit' not found

 Here 'gslist' is a S4 object `GatingSet` .

 I wasn't able to bypass this even after I defined my own S3 method (e.g.
 [.GatingSet).
 I guess it is because 'GatingSet' is an S4 class and there are already some
 S4 methods defined by other packages:

  showMethods("[")
 Function: [ (package base)
 x=AnnotatedDataFrame, i=ANY
 x=flowFrame, i=ANY
 ...

 So it will always try these S4 methods before any S3 gets its chance.

 Is there a better way other than these two?
 1. change 'GatingSet' to S3 class
 2. use a different generic function that is not associated with any S4
 methods (e.g. subset)

 Mike

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel




-- 
Computational Biologist
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] a S4 dispatching question

2014-08-06 Thread Gabe Becker
Mike,

This makes sense (I was actually surprised I was able to get my example to
work as easily as I did).

The thing is, if you are dispatching on i (which you must if any methods
do), you HAVE to know what class i is in order to identify the method.
AFAIK there is no way in R of doing that without evaluating the object.

So I guess we're back to no, you can't do that.

Sorry for the noise and for getting your hopes up. You can always write a
subset S3 method.

~G
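
E.g., something along these lines (a sketch only, reusing the toy class
from my earlier message; subset() dispatches only on its first argument,
so the bare expression is never forced during dispatch):

```r
## Sketch: NSE-style subsetting via subset(), which dispatches only on
## its first argument, so the bare expression survives dispatch.
## Reuses the "toydf" toy class from earlier in the thread.
setClass("toydf", representation(x = "data.frame"))

setMethod("subset", "toydf", function(x, subset, ...) {
    expr <- substitute(subset)
    ## Names are looked up in the data first; note parent.frame() here
    ## may be the dispatch frame rather than the user's frame.
    keep <- eval(expr, x@x, parent.frame())
    x@x[keep, , drop = FALSE]
})

y <- new("toydf", x = data.frame(v = 1:6))
subset(y, v > 3)   # rows of y@x where v > 3
```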


On Wed, Aug 6, 2014 at 1:45 PM, Mike wjia...@fhcrc.org wrote:

 Gabe,

 Your suggestion only works in an environment where no formal argument 'i'
 is defined in any existing '[' method, e.g.


  showMethods("[")
 Function: [ (package base)
 x=nonStructure

 Once we load the package that exports '[' methods with 'i' (e.g.
 'flowCore' ), then method dispatch still tries to evaluate 'i'  in order to
 match the call to the available methods.

  library(flowCore)

  showMethods("[")
 Function: [ (package base)
 x=AnnotatedDataFrame, i=ANY
 x=container, i=ANY
 x=eSet, i=ANY
 x=filterResultList, i=ANY
 x=filterSet, i=character

 x=flowFrame, i=ANY
 ...

  y[holycrapnothere]

 Error: object 'holycrapnothere' not found


 Mike

 On 08/06/2014 11:45 AM, Gabe Becker wrote:

 Mike,

 This can be done. I would argue, though, that the convenience your users get
 from this is far outweighed by the damage it does to the ability to read and
 easily understand the code they are writing. Users, maintainers, etc. now
 need to know the object's class, what columns it has, and what variables are
 in their environment in order to predict a '[' expression's behavior.
 (This is why most people I know won't touch the attach function with a
 10-foot pole, regardless of how convenient it seems at first.)

  But anyway, I'm not the design police and my dislike of non-standard
 evaluation is far from universal, so here is how to do it:

   setClass("toydf", representation(x = "data.frame"))
   setMethod("[", "toydf", function(x, i, j, ...) {key = substitute(i) ;
 eval(key, x@x, parent.frame())})
  [1] "["
   x = data.frame(holycrapnothere = 1:6)
   y = new("toydf", x = x)
  y
 An object of class "toydf"
 Slot x:
   holycrapnothere
 1   1
 2   2
 3   3
 4   4
 5   5
 6   6

   holycrapnothere
 Error: object 'holycrapnothere' not found
 No suitable frames for recover()

   y[holycrapnothere]
  [1] 1 2 3 4 5 6


  ~G



 On Wed, Aug 6, 2014 at 10:43 AM, Mike wjia...@fhcrc.org wrote:

 I'd like to do 'data.table'-like subsetting on an S4 class by using an 'i
 expression'. However, the '[' generic function has trouble dispatching the S4
 method because of its early evaluation of the i argument.
 e.g.

  gslist[Visit == 1, ]
 Error in gslist[Visit == 1, ] :
   error in evaluating the argument 'i' in selecting a method for function
 '[': Error: object 'Visit' not found

 Here 'gslist' is a S4 object `GatingSet` .

 I wasn't able to bypass this even after I defined my own S3 method (e.g.
 [.GatingSet).
 I guess it is because 'GatingSet' is an S4 class and there are already some
 S4 methods defined by other packages:

  showMethods("[")
 Function: [ (package base)
 x=AnnotatedDataFrame, i=ANY
 x=flowFrame, i=ANY
 ...

 So it will always try these S4 methods before any S3 gets its chance.

 Is there a better way other than these two?
 1. change 'GatingSet' to S3 class
 2. use a different generic function that is not associated with any S4
 methods (e.g. subset)

 Mike

 ___
 Bioc-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/bioc-devel




  --
 Computational Biologist
 Genentech Research





-- 
Computational Biologist
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] A new way to manage side-by-side BioC release and Devel installs

2014-07-24 Thread Gabe Becker
Hey all,

One of the things that has come up from the recent Release/Devel
distinction thread on this list is that people don't consider there to be
an easy way of handling both at the same time.

I'd like to offer an alternative based on the switchr
https://github.com/gmbecker/switchr package I'm developing.

switchr is designed to allow seamless switching between distinct sets of
installed packages. It also has built-in support for the BioC release/devel
distinction, like so:

> library(switchr)
> switchTo("BiocDevel")
trying URL '
http://www.bioconductor.org/packages/3.0/bioc/bin/windows/contrib/3.1/BiocInstaller_1.15.5.zip
'
Content type 'application/zip' length 109769 bytes (107 Kb)
opened URL
downloaded 107 Kb

package 'BiocInstaller' successfully unpacked and MD5 sums checked
Switched to the 'BioC_3.0' computing environment. 31 packages are currently
available. Packages installed in your site library ARE suppressed.
 To switch back to your previous environment type switchBack()
> BiocInstaller::biocVersion()
[1] '3.0'
> switchBack()
Reverted to the 'original' computing environment. 40 packages are currently
available. Packages installed in your site library ARE NOT suppressed.
 To switch back to your previous environment type switchBack()
> BiocInstaller::biocVersion()
[1] '2.14'

Note that this only works if you have an R version able to install the
devel version of the BiocInstaller package, but that will be true of any
solution using BioC Devel. I'm working on making the failure more graceful
when this is not the case, and on writing/updating the documentation more
generally.
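
For comparison, the low-tech approach switchr automates is juggling
.libPaths() by hand, roughly like this (the library paths below are
hypothetical examples):

```r
## Sketch: manual side-by-side libraries, the approach switchr automates.
## The library paths below are hypothetical examples.
devel_lib <- "~/R/bioc-devel-lib"
dir.create(devel_lib, recursive = TRUE, showWarnings = FALSE)

old <- .libPaths()
.libPaths(c(devel_lib, old))   # devel library first for this session
## ... install and load devel packages here ...
.libPaths(old)                 # and switch back
```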

switchr is more widely useful than this, which I will be talking about
publicly soon, but since it is so on-topic I figured I'd give this list a
heads-up on this aspect of it.

Please feel free to try switchr out, any feedback is appreciated.
~G
-- 
Computational Biologist
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] A new way to manage side-by-side BioC release and Devel installs

2014-07-24 Thread Gabe Becker
Herve,

It doesn't handle that, so it is a time-saving approximation akin to
devtools, rather than an exact solution.

I recognize and agree with the reasons that BioC devel targets R devel
(sometimes); the question becomes how often BioC devel will actually fail
if you use a slightly outdated R. I'm hoping not often, but you know
more about that than I do, so maybe it will happen more often than I think.

~G


On Thu, Jul 24, 2014 at 9:51 AM, Hervé Pagès hpa...@fhcrc.org wrote:

 Hi Gabe,

 Thanks for the heads up, sounds promising.

 Note that one major difficulty of switching between BioC release
 and devel is that between October and April it also requires to switch
 between R release and devel. How is that handled in switchr?

 Thanks,
 H.



 On 07/24/2014 08:42 AM, Gabe Becker wrote:

 Hey all,

 One of the things that has come up from the recent Release/Devel
 distinction thread on this list is that people don't consider there to be
 an easy way of handling both at the same time.

 I'd like to offer an alternative based on the switchr
 https://github.com/gmbecker/switchr package I'm developing.

 switchr is designed to allow seamless switching between distinct sets of
 installed packages. It also has built-in support for the BioC
 release/devel
 distinction, like so:

  > library(switchr)
  > switchTo("BiocDevel")

 trying URL '
 http://www.bioconductor.org/packages/3.0/bioc/bin/windows/
 contrib/3.1/BiocInstaller_1.15.5.zip
 '
 Content type 'application/zip' length 109769 bytes (107 Kb)
 opened URL
 downloaded 107 Kb

 package 'BiocInstaller' successfully unpacked and MD5 sums checked
 Switched to the 'BioC_3.0' computing environment. 31 packages are
 currently
 available. Packages installed in your site library ARE suppressed.
   To switch back to your previous environment type switchBack()

  > BiocInstaller::biocVersion()

 [1] '3.0'

  > switchBack()

 Reverted to the 'original' computing environment. 40 packages are
 currently
 available. Packages installed in your site library ARE NOT suppressed.
   To switch back to your previous environment type switchBack()

  > BiocInstaller::biocVersion()

 [1] '2.14'

 Note that this only works if you have an R version able to install the
 devel version of the BiocInstaller package, but that will be true of any
 solution using BioC Devel. I'm working on making the failure more graceful
 when this is not the case, and on writing/updating the documentation more
 generally.

 switchr is more widely useful than this, which I will be talking about
 publicly soon, but since it is so on-topic I figured I'd give this list a
 heads-up on this aspect of it.

 Please feel free to try switchr out, any feedback is appreciated.
 ~G


 --
 Hervé Pagès

 Program in Computational Biology
 Division of Public Health Sciences
 Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N, M1-B514
 P.O. Box 19024
 Seattle, WA 98109-1024

 E-mail: hpa...@fhcrc.org
 Phone:  (206) 667-5791
 Fax:(206) 667-1319




-- 
Computational Biologist
Genentech Research

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Distinction between release and devel package websites

2014-07-22 Thread Gabe Becker
Andrzej,

If you have an important enough bugfix, the correct procedure is to
commit it to the release branch. Then users would simply update the package
via biocLite().

Note that this should be fixes ONLY. No new features. Regardless of how
much the users want them, those belong in dev.

~G


On Tue, Jul 22, 2014 at 2:06 PM, Andrzej Oleś andrzej.o...@gmail.com
wrote:

 Dear Dan, James, Michael, Matt,

 thank you, I see your point but I'm afraid I must disagree with you.
 I've had this situation numerous times: I have added/fixed
 something in the devel branch of a package and had to advise the users
 to use this latest version. Needless to say, they were typically using
 the release branch, and it was a relatively painless procedure for
 them to pick the tarball from the devel landing page and proceed with
 manual installation. Of course, this could also be achieved by
 installing from the svn; however, this is not very welcome from the
 user's perspective.

 Please correct me if I'm wrong, but to my knowledge there is no
 built-in mechanism in 'biocLite' facilitating the above-described
 scenario. Therefore, I think that it could be useful to have a
 'useDevel' argument to biocLite() allowing for the installation of
 specific package(s) from the devel rather than from the release branch
 without having to switch to devel completely. As this would be an
 optional argument defaulting to FALSE I wouldn't be worried by
 potential abuse, at least not more than by having the devel packages
 exposed on the website. As an additional precautionary measure we could
 issue a warning and ask the user to confirm that (s)he is aware of the
 risks and wants to proceed.

 As Matt pointed out, direct links to package source tarballs are
 very useful for quick and lightweight inspection of package code. This
 approach combined with opening the files directly with an archive
 browser is particularly appealing, as it saves one from dealing with
 manual svn checkout and the cleanup afterwards. Please note that
 replacing the prebuilt tarball with a link to the SVN has the caveat
 of getting potentially broken code. Tarballs which make it through to
 the website guarantee that the package at least builds.

 Best,
 Andrzej

 On Tue, Jul 22, 2014 at 7:57 PM, Dan Tenenbaum dtene...@fhcrc.org wrote:
 
 
  - Original Message -
  From: James W. MacDonald jmac...@uw.edu
  To: Andrzej Oleś andrzej.o...@gmail.com
  Cc: Dan Tenenbaum dtene...@fhcrc.org, Julian Gehring 
 julian.gehr...@embl.de, Michael Lawrence
  lawrence.mich...@gene.com, bioc-devel@r-project.org
  Sent: Tuesday, July 22, 2014 10:51:35 AM
  Subject: Re: [Bioc-devel] Distinction between release and devel package
 websites
 
  Hi Andrzej,
 
  On 7/22/2014 1:14 PM, Andrzej Oleś wrote:
   Hi all,
  
   I think having links is useful, e.g. for someone who uses BioC
   release
   but wants to install by hand a particular package from the devel
   branch.
 
  I'm not sure I think this is a compelling reason for keeping the
  links.
  If someone is sophisticated enough to install a devel version of a
  package into their release install, then surely they are
  sophisticated
  enough to get it from svn?
 
 
  Or to know how to find the link to the tarball.
 
  Dan
 
 
  It has always struck me as odd that we try time and again to get
  people
  to use biocLite() to install packages, yet make it so easy for people
  to
  ignore this advice.
 
  Best,
 
  Jim
 
 
 
 
  
   Distinct colors between release and devel make sense only if one
   understands their meaning, which in the end might prove not to be
   very
   useful. I would rather recommend emphasizing the distinction
   between
   release and devel in clear text across the package landing page,
   possibly in multiple places, e.g. somewhere close to the actual
   package version number; for instance, add the word devel after
   the
   version number with a tooltip which will give some
   explanation/warning
   that this is not the stable release version.
  
   The concept of a notification box is far from ideal because it
   tends
    to be annoying to the user and once dismissed 'forever' the user
   won't
   be warned in the future.
  
   I think that the actual problem arises from the fact that the
   release
   landing pages are not clearly prioritized over the devel ones.
   Maybe
   this could be  addressed by preventing the devel pages from being
   harvested by google? It could make also sense to emphasize (bold
   face,
   color, ...) the package release landing page on the result list
   returned by the search engine on the BioC website. Currently, the
   results for release and devel differ only in their relative path,
   which can be easily overlooked, and both say Package Home, see
   example below:
  
   Bioconductor - DESeq2 - /packages/release/bioc/html/DESeq2.html
  Bioconductor - DESeq2 Home
   Bioconductor - DESeq2 - /packages/devel/bioc/html/DESeq2.html
  

Re: [Bioc-devel] A question on IRanges package

2014-04-03 Thread Gabe Becker
If you want to step into the environments where the errors are happening,
you want options(error=recover), not traceback(). In that case, you can step
into each of the frames.
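
For instance (a minimal sketch; the frame list you get will vary):

```r
## Sketch: drop into an interactive frame browser at the point of error.
options(error = recover)
## Re-run the failing call; on error, R lists the call frames and lets
## you pick one to inspect its local variables:
## findOverlaps(varanges, rna_tree, type = "o")
options(error = NULL)   # restore the default afterwards
```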

The .local is actually the body of the method being called though, nothing
to do with an environment.

~G


On Thu, Apr 3, 2014 at 6:17 PM, Yuan Luo yuan.hypnos@gmail.com wrote:

 A side question, in the frame 3, what does .local mean? local environment?
 like if I go into frame 3, I'd be able to print out their values?


 On Thu, Apr 3, 2014 at 9:10 PM, Yuan Luo yuan.hypnos@gmail.com
 wrote:

  I think that line at 4 points to the generic definition, at least on my
  machine (and keep.source works, thanks!).
 
  Just found what was the culprit, there is a wrapper in
  findOverlaps-GIntervalTree-methods.R#12. After updating that, it works.
   traceback()
  5: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices),
 collapse = ", ")), domain = NA)
  4: match.arg(type) at findOverlaps-GIntervalTree-methods.R#12
  3: .local(query, subject, maxgap, minoverlap, type, select, ...)
  2: findOverlaps(varanges, rna_tree, type = "o") at
  findOverlaps-methods.R#14
  1: findOverlaps(varanges, rna_tree, type = "o")
 
  Again, thank you both!
 
  Best,
  Yuan
 
 
 
  On Thu, Apr 3, 2014 at 7:49 PM, Martin Morgan mtmor...@fhcrc.org
 wrote:
 
  On 04/03/2014 04:42 PM, Michael Lawrence wrote:
 
  I'll look at the code. As far as tracking line numbers, no, because the
  code is bundled into a package -- there are no files anymore. In
  principle,
  that could be improved, but as far as I know, it hasn't been. If you're
 
 
  I think there's an option, set in .Rprofile or as an environment
 variable
  described in ?options,
 
options(keep.source.pkgs=TRUE)
 
  that annotates the source of installed packages with line numbers, e.g.,
  after doing this and then installing GenomicRanges (on my own version)
 
   > library(GenomicRanges)
   > findOverlaps(GRanges(), GRanges(), type="o")
   Error in match.arg(type) :
 'arg' should be one of "any", "start", "end", "within", "equal"
   > traceback()
   5: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices),
  collapse = ", ")), domain = NA)
   4: match.arg(type) at findOverlaps-methods.R#63
  
   3: .local(query, subject, maxgap, minoverlap, type, select, ...)
   2: findOverlaps(GRanges(), GRanges(), type = "o")
   1: findOverlaps(GRanges(), GRanges(), type = "o")
 
 
  I'm not really sure which lines are annotated with source information.
 
  Martin
 
 
    trying to figure out dispatch behavior, things like
   selectMethod("findOverlaps", c("GRanges", "GRanges")) and
   trace("findOverlaps", browser, signature=c("GRanges", "GRanges"))
   are your friends.
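A hedged sketch of those two calls (gr1 and gr2 are assumed GRanges objects; trace() is interactive, so the dispatched call is left commented):

```r
library(GenomicRanges)  # assumed available

## Show which method S4 dispatch would select for this signature:
selectMethod("findOverlaps", c("GRanges", "GRanges"))

## Enter the browser at the top of that method on the next call:
trace("findOverlaps", browser, signature = c("GRanges", "GRanges"))
# findOverlaps(gr1, gr2)  # now stops inside the method body
untrace("findOverlaps", signature = c("GRanges", "GRanges"))
```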
 
  Michael
 
 
  On Thu, Apr 3, 2014 at 4:22 PM, Yuan Luo yuan.hypnos@gmail.com
  wrote:
 
    At the moment I am using the package to tweak some design of the interval
   tree algorithm, and much of my effort is hackish. So does the code suggest
   to you what I am doing wrong to get the match.arg failure?
   Also, when you were developing the package, how did you tell the
   traceback to show line numbers and file names? My googling seems to
   suggest it's hard to do so in R, but I figured gurus may see better.
 
  Best,
  Yuan
 
 
  On Thu, Apr 3, 2014 at 6:05 PM, Michael Lawrence 
  lawrence.mich...@gene.com wrote:
 
    It looks like the only hits this will filter out are cases where the
   start of the query (X) is equal to the end of the subject (Y), but it
   seems like the "o" operation is different -- it requires that X start
   before Y starts and end before Y ends.
 
   We could add these relations to IRanges, but maybe findOverlaps is not
   the right place. Instead, we could have an %o% operator, plus operators
   for the rest of the algebra. But maybe it would help to hear your use
   case.
 
  Michael
 
 
 
  On Thu, Apr 3, 2014 at 1:13 PM, Yuan Luo yuan.hypnos@gmail.com
  wrote:
 
   Hi Michael,
   Thanks for your reply! I covered setGeneric as well; attached is the
   modified code.
   My change is pretty simple: I want to support the "o" relation in
   Allen's interval algebra
   (http://en.wikipedia.org/wiki/Allen's_interval_algebra).
   So I added one more filter option:
    } else if (type == "o") {
m <- m[start(query)[m[,1L]] < end(subject)[m[,2L]], , drop=FALSE]
    }
 
    From the stack trace, I suspect the method definition
   setMethod("findOverlaps", c("RangesList", "IntervalForest"))
   is not being called, and the error happens before that. But since the
   stack trace doesn't tell me in which file and at which line each frame
   is, I am a bit clueless. Is there any way to reveal that information?
 
  Best,
  Yuan
 
 
  On Thu, Apr 3, 2014 at 3:57 PM, Michael Lawrence 
  lawrence.mich...@gene.com wrote:
 
 
 
 
  On Thu, Apr 3, 2014 at 11:33 AM, Yuan Luo 
 yuan.hypnos@gmail.com
  wrote:
 
    Hi All,
   Sorry for possible spam, but I am trying to customize the IRanges
   package
   locally. For what I am doing, I introduced another 

[Bioc-devel] Overflow in as(rlelist, "IntegerList")

2014-03-29 Thread Gabe Becker
Hi all,

Apologies if this gets duplicated. I was not subscribed when I originally
sent it.

We have a very large RleList, such that the sum of the lengths is larger
than INT.MAX, that we want to convert to an also very large
IntegerList (whole genome coverages by chromosome I believe, though I'm not
the author of the code that ran into this so I could be wrong about the
details there).

The IntegerList will fit in memory fine, but the coercion method is trying
to collapse our RleList into a single Rle (in compress_listData(), which is
called from coerceToCompressedList()) during the coercion step; that Rle is
too long and causes an integer overflow in the constructor. Specifically,
the Rle_constructor C function calls the _sum_non_neg_ints C function,
which throws an error.

I can see that there is quite a bit of machinery trying to make these
coercions go fast, but it seems they have introduced an unintended (?)
limitation on the size of the *List objects involved. Is there a slower but
more robust coercion machinery I don't know about, and if not, could one be
exposed? (Fast and more robust would also be acceptable ;-) )

Thanks,
~G


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Overflow in as(rlelist, "IntegerList")

2014-03-29 Thread Gabe Becker
Fair enough, though an indication of the length limit in the documentation
at ?IntegerList, or a more informative error message, would be nice.

Apologies for the noise.
~G


On Sat, Mar 29, 2014 at 2:07 PM, Michael Lawrence lawrence.mich...@gene.com
 wrote:

 Just coerce the RleList to SimpleIntegerList.
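A hedged sketch of that suggestion, with a tiny made-up RleList (SimpleIntegerList stores one integer vector per list element, so no single concatenated genome-length Rle is ever built):

```r
library(IRanges)  # assumed available

## Two short per-chromosome coverage-like Rle vectors:
rl <- RleList(chr1 = Rle(c(0L, 2L), c(3L, 2L)),
              chr2 = Rle(1L, 4L))

## Coerce element-by-element instead of via one compressed Rle:
il <- as(rl, "SimpleIntegerList")
il[["chr1"]]  # 0 0 0 2 2
```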






