[Bioc-devel] Package vignette and knitr

2014-04-22 Thread davide risso
Dear list,

I've modified the vignette of my EDASeq package to work with knitr.

Following the guidelines in the BiocStyle vignette, I've added

Suggests: BiocStyle, knitr
VignetteBuilder: knitr

to the DESCRIPTION file and

%\VignetteEngine{knitr::knitr}

to the top of the .Rnw file.

The package builds fine on my machine, but does not build in any of
the Bioconductor machines:
http://master.bioconductor.org/checkResults/devel/bioc-LATEST/EDASeq/zin1-buildsrc.html

Command output
* checking for file ‘EDASeq/DESCRIPTION’ ... OK
* preparing ‘EDASeq’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Quitting from lines 67-69 (EDASeq.Rnw)
Error: processing vignette 'EDASeq.Rnw' failed with diagnostics:
object 'opts_chunk' not found
Execution halted

Lines 67-69 are:

<<>>=
library(knitr)
opts_chunk$set(dev="pdf", fig.align="center", cache=FALSE,
message=FALSE, out.width=".55\\textwidth", echo=TRUE,
results="markup", fig.show="hold")
options(width=60)
@

Am I missing something?

Best,
davide

-- 
Davide Risso, PhD
Post Doctoral Scholar
Department of Statistics
University of California, Berkeley
344 Li Ka Shing Center, #3370
Berkeley, CA 94720-3370
E-mail: davide.ri...@berkeley.edu

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Package vignette and knitr

2014-04-22 Thread davide risso
Thank you Dan.

I should have looked at the time of my commit.
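
For reference, "object 'opts_chunk' not found" is what the setup chunk reports when it runs without knitr attached. Below is a minimal sketch of a chunk body that does not rely on library(knitr), assuming knitr is installed. The requireNamespace() guard is an addition for illustration and is not part of the original vignette.

```r
# Sketch only: namespace-qualify opts_chunk so the chunk works even if
# knitr has not been attached with library(knitr).
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(dev = "pdf", fig.align = "center", cache = FALSE,
                        message = FALSE, out.width = ".55\\textwidth",
                        echo = TRUE, results = "markup", fig.show = "hold")
}
options(width = 60)
```

Calling library(knitr) first, as in the committed fix, has the same effect. The qualified form only removes the dependence on the search path.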

Best,
davide

On Tue, Apr 22, 2014 at 12:15 PM, Dan Tenenbaum  wrote:
> Hi Davide,
>
> - Original Message -
>> From: "davide risso" 
>> To: bioc-devel@r-project.org
>> Sent: Tuesday, April 22, 2014 12:06:09 PM
>> Subject: [Bioc-devel] Package vignette and knitr
>>
>> Dear list,
>>
>> I've modified the vignette of my EDASeq package to work with knitr.
>>
>> Following the guidelines in the BiocStyle vignette, I've added
>>
>> Suggests: BiocStyle, knitr
>> VignetteBuilder: knitr
>>
>> to the DESCRIPTION file and
>>
>> %\VignetteEngine{knitr::knitr}
>>
>> to the top of the .Rnw file.
>>
>> The package builds fine on my machine, but does not build in any of
>> the Bioconductor machines:
>> http://master.bioconductor.org/checkResults/devel/bioc-LATEST/EDASeq/zin1-buildsrc.html
>>
>> Command output
>> * checking for file ‘EDASeq/DESCRIPTION’ ... OK
>> * preparing ‘EDASeq’:
>> * checking DESCRIPTION meta-information ... OK
>> * installing the package to build vignettes
>> * creating vignettes ... ERROR
>> Quitting from lines 67-69 (EDASeq.Rnw)
>> Error: processing vignette 'EDASeq.Rnw' failed with diagnostics:
>> object 'opts_chunk' not found
>> Execution halted
>>
>> Lines 67-69 are:
>>
>> <<>>=
>> library(knitr)
>> opts_chunk$set(dev="pdf", fig.align="center", cache=FALSE,
>> message=FALSE, out.width=".55\\textwidth", echo=TRUE,
>> results="markup", fig.show="hold")
>> options(width=60)
>> @
>>
>> Am I missing something?
>>
>
> You added the line
> library(knitr)
>
> after 5PM yesterday (Seattle time), too late to make it into today's build
> report. It should be OK tomorrow.
>
> Dan
>
>
>
>> Best,
>> davide
>>
>> --
>> Davide Risso, PhD
>> Post Doctoral Scholar
>> Department of Statistics
>> University of California, Berkeley
>> 344 Li Ka Shing Center, #3370
>> Berkeley, CA 94720-3370
>> E-mail: davide.ri...@berkeley.edu
>>
>> ___
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>





Re: [Bioc-devel] biocLite message "R package not available" is confusing

2014-09-10 Thread davide risso
I just wanted to add my support to Josef's request.

During the last few weeks I received several emails from users asking
me if I "plan to make a version of RUVSeq compatible with R 3.1." (My
RUVSeq package is in devel).

I understand the error comes directly from install.packages, but is
there a way for biocLite to catch this before passing it to
install.packages? Perhaps throwing a different error, like "The
package xxx is not available in the release version of Bioconductor.
Use the devel version."

The current error message is not just confusing, it's incorrect.

Best,
davide

On Tue, Aug 19, 2014 at 4:57 PM, Gabe Becker  wrote:
> Josef,
>
> The problems with reviewers you are describing sound very frustrating (for
> the author and the reviewer) but I suspect you think that biocLite is doing
> something that it is not (reimplementing the actual package installation
> machinery in R). Responses inline.
>
>
> On Tue, Aug 19, 2014 at 4:40 PM, Josef Spidlen  wrote:
>
>> Hi,
>> I believe that the "R package ... is not available for R ..." message as
>> produced by biocLite is a bit confusing for "new-ish" BioConductor users,
>> and I have a suggestion how things could be improved.
>>
>> Imagine that a brand new package is submitted to BioConductor and a related
>> manuscript to some journal. Your typical reviewer as well as most other
>> users that heard about the package will search for it and end up somewhere
>> under http://bioconductor.org/packages/devel/bioc/. From there, they
>> will simply copy&paste
>> source("http://bioconductor.org/biocLite.R")
>> biocLite("myFancyPackage")
>> into their R 3.1 console, which will tell them that the package is not
>> available for their version of R despite the fact that the actual package
>> "depends" on, say, R >= 2.10.0.
>>
>
> This message is from install.packages, which biocLite calls, not biocLite
> itself. The message is the generic "the repository you pointed at doesn't
> have a version of the package you wanted installable on your system" (types
> of packages notwithstanding).
>
>
>
>>
>> Your typical user may try several versions of R and then either give up, or
>> contact the maintainer. Your manuscript reviewer will reject the manuscript
>> as the "package is not available". Trust me, I have seen both happen, and I
>> have answered several questions explaining how a package that is still just
>> "a development version" can be installed.
>>
>> In order to make things less confusing, I would suggest that future
>> versions of biocLite check also the development section of BioConductor (if
>> a package cannot be found in the current release), and possibly produce a
>> message that is more informative, e.g.,
>> "R package ... is still in development; you can either try again after the
>> next BioConductor release in October|April 20xx, or you can follow these
>> steps to install the development version now: ..."
>>
>
> You can't (safely) mix package versions from Bioc-devel and Bioc-release,
> so the instructions there would be "use bioc devel". It could easily be put
> in the availability section of a paper "it will be available as a devel
> package until X/Y/, after which it will be a fully released bioc
> package"
>
>
>
>>
>> And (less important), if biocLite "knew" which packages are from CRAN
>> rather than BioConductor (cache the names of the ~6,000 CRAN packages?),
>> then it could also produce errors like "R package ... seems like a CRAN
>> package; you may want to try install.packages to install it"). That may
>> help some users as well.
>>
>
> biocLite does/can know where the packages come from, but again, it is just
> calling install.packages, and will happily install CRAN packages for you
> without any trouble.
>
> ~G
>
>
>>
>> That's just my 2c :-).
>>
>> Cheers,
>> Josef
>>
>>
>> --
>> Josef Spidlen, Ph.D.
>> Staff Scientist, Terry Fox Laboratory, BC Cancer Agency
>> 675 West 10th Avenue, Vancouver, BC, V5Z1L3, Canada
>> Tel. +1 604-675-8000, ex. 7755
>>
>>
>
>
>
> --
> Computational Biologist
> Genentech Research
>


Re: [Bioc-devel] biocLite message "R package not available" is confusing

2014-09-10 Thread davide risso
Thanks Martin,

yes, you're right about the "use the devel version," but perhaps
biocLite could check if the package is available in the devel, just to
distinguish between a package that is not in Bioconductor (mistyping?)
and one that is not yet available in release.
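
A rough sketch of the distinction I have in mind is below. The helper names (classify_package, describe_package) and the package listings are hypothetical and are not part of biocLite. In practice the two vectors of names might come from rownames(utils::available.packages()) pointed at the release and devel repositories, which is exactly the part that is hard to keep reliable across releases.

```r
# Hypothetical helper: classify a package name given the package names
# listed in the release and devel repositories.
classify_package <- function(pkg, release_pkgs, devel_pkgs) {
  if (pkg %in% release_pkgs) {
    "release"
  } else if (pkg %in% devel_pkgs) {
    "devel-only"
  } else {
    "unknown"
  }
}

# Hypothetical helper: turn the classification into a user-facing message.
describe_package <- function(pkg, release_pkgs, devel_pkgs) {
  switch(classify_package(pkg, release_pkgs, devel_pkgs),
    "release" = sprintf("Package '%s' is available in release.", pkg),
    "devel-only" = sprintf(
      "Package '%s' is not available in the release version of Bioconductor; it is currently only in devel.",
      pkg),
    "unknown" = sprintf(
      "Package '%s' was not found in Bioconductor (mistyped?).", pkg))
}

# Made-up listings for illustration only.
release_pkgs <- c("EDASeq", "DESeq2")
devel_pkgs <- c(release_pkgs, "RUVSeq")

describe_package("RUVSeq", release_pkgs, devel_pkgs)
describe_package("RUVseq", release_pkgs, devel_pkgs)
```

The second call shows the mistyped case, which falls through to the "not found" message rather than being reported as a problem with the version of R.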

Just my two cents.

Davide

On Wed, Sep 10, 2014 at 11:19 AM, Martin Morgan  wrote:
> On 09/10/2014 10:39 AM, davide risso wrote:
>>
>> I just wanted to add my support to Josef's request.
>>
>> During the last few weeks I received several emails from users asking
>> me if I "plan to make a version of RUVSeq compatible with R 3.1." (My
>> RUVSeq package is in devel).
>>
>> I understand the error comes directly from install.packages, but is
>> there a way for biocLite to catch this before passing it to
>> install.packages? Perhaps throwing a different error, like "The
>> package xxx is not available in the release version of Bioconductor.
>> Use the devel version."
>>
>> The current error message is not just confusing, it's incorrect.
>
>
> I don't think we'd say 'use the devel version' but we could say something
> about 'not available for this version of Bioconductor'; I'll also think
> about getting this 'fixed' upstream (it's not the version of R, but the
> repositories specified in the call to install.packages)
>
> Martin
>
>
>>
>> Best,
>> davide
>>

Re: [Bioc-devel] biocLite message "R package not available" is confusing

2014-09-10 Thread davide risso
Hi Martin,

> I wonder where the users are getting the notion that they _should_ be able
> to install RUVSeq in Bioc 2.14? I guess they follow the link in the Nature
> Methods paper to the devel landing page, then follow the 'Installation'
> instructions without paying attention to the various 'development version'
> flags on the page.

Yes. I also think they follow the link and just copy/paste the
biocLite() command without reading the rest of the page.

Best,
davide


On Wed, Sep 10, 2014 at 12:21 PM, Martin Morgan  wrote:
> On 09/10/2014 11:37 AM, davide risso wrote:
>>
>> Thanks Martin,
>>
>> yes, you're right about the "use the devel version," but perhaps
>> biocLite could check if the package is available in the devel, just to
>> distinguish between a package that is not in Bioconductor (mistyping?)
>> and one that is not yet available in release.
>
>
> biocLite() already scores high on the tangled code scale.
>
> The problem is that this month's 'devel' will actually be next month's
> 'release' and next year's 'previous version', so it's very hard to know
> where to look for the available version, and how to reliably tell the user
> what to do to get the package.
>
> If it's a simple typo of an available package then install.packages will
> already suggest an alternative.
>
> I wonder where the users are getting the notion that they _should_ be able
> to install RUVSeq in Bioc 2.14? I guess they follow the link in the Nature
> Methods paper to the devel landing page, then follow the 'Installation'
> instructions without paying attention to the various 'development version'
> flags on the page.
>
> Martin
>
>
>>
>> Just my two cents.
>>
>> Davide
>>

Re: [Bioc-devel] plotPCA for BiocGenerics

2014-10-20 Thread davide risso
Hi Kevin,

I don't agree. In the case of EDASeq (as I suppose it is the case for
DESeq/DESeq2) plotting the principal components of the count matrix is only
one of possible exploratory plots (RLE plots, MA plots, etc.).
So, in my opinion, it makes more sense from an object oriented point of
view to have multiple plotting methods for a single "RNA-seq experiment"
object.

In addition, this is the same strategy adopted elsewhere in Bioconductor,
e.g., for the plotMA method.
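
To make the point concrete, here is a minimal sketch of one object with several plotting generics. The class name RnaSeqExperiment is made up for illustration and is not the class used in EDASeq or DESeq2, and the log1p transformation is just one plausible choice.

```r
library(methods)

# Hypothetical container for a count matrix (genes in rows, samples in columns).
setClass("RnaSeqExperiment", representation(counts = "matrix"))

# One generic per kind of exploratory plot.
setGeneric("plotPCA", function(object, ...) standardGeneric("plotPCA"))
setGeneric("plotRLE", function(object, ...) standardGeneric("plotRLE"))

setMethod("plotPCA", "RnaSeqExperiment", function(object, ...) {
  # Principal components of the samples, computed on log counts.
  pc <- prcomp(t(log1p(object@counts)))
  plot(pc$x[, 1:2], ...)
  invisible(pc)
})

setMethod("plotRLE", "RnaSeqExperiment", function(object, ...) {
  # Relative log expression: log counts minus the gene-wise median.
  logc <- log1p(object@counts)
  rle <- logc - apply(logc, 1, median)
  boxplot(rle, ...)
  invisible(rle)
})

set.seed(1)
x <- new("RnaSeqExperiment", counts = matrix(rpois(60, 10), nrow = 10, ncol = 6))
```

With this layout, plotPCA(x) and plotRLE(x) are both available on the same object, which a single plot() method could not offer without extra arguments.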

Just my two cents.

Best,
davide

On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes 
wrote:

>  I understand that breaking code is a problem, and that is admittedly the
> main reason not to immediately adopt my suggestion.
>
> But as a purely logical exercise, creating a "PCA" object X or something
> similar and using either
> plot(X)
> or
> plot(as.PCA(mySpecialObject))
> is a much more sensible use of object-oriented programming/design. This
> requires no new generics (to write or to learn).
>
> And you could use it to transition away from the current system by
> convincing the various package maintainers to re-implement plotPCA as
> follows:
>
> plotPCA <- function(object, ...) {
>   plot(as.PCA(object), ...)
> }
>
> This would be relatively easy to eventually deprecate and teach users to
> switch to the alternative.
>
>
> On 10/20/2014 1:07 PM, Michael Love wrote:
>
>  hi Kevin,
>
>  that would imply there is only one way to plot an object of a given
> class. Additionally, it would break a lot of code.
>
>  best,
>
>  Mike
>
> On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes wrote:
>
>> But shouldn't they all really just be named "plot" for the appropriate
>> objects?  In which case, there would already be a perfectly good generic
>>  On Oct 20, 2014 10:27 AM, "Michael Love" 
>> wrote:
>>
>>>  I noticed that 'plotPCA' functions are defined in EDASeq, DESeq2,
>>> DESeq,
>>> affycoretools, Rcade, facopy, CopyNumber450k, netresponse, MAIT (maybe
>>> more).
>>>
>>> Sounds like a case for BiocGenerics.
>>>
>>> best,
>>>
>>> Mike
>>>
>>>
>>
>
>
>


-- 
Davide Risso, PhD
Post Doctoral Scholar
Division of Biostatistics
School of Public Health
University of California, Berkeley
344 Li Ka Shing Center, #3370
Berkeley, CA 94720-3370
E-mail: davide.ri...@berkeley.edu



Re: [Bioc-devel] plotPCA for BiocGenerics

2014-10-20 Thread davide risso
Hi Kevin,

I see your points and I agree (especially for the specific case of plotPCA
that involves some non trivial computations).

On the other hand, having a wrapper function that starting from the "raw"
data gives you a pretty picture (with virtually zero effort by the user)
using a sensible choice of parameters that are more or less OK for RNA-seq
data is useful for practitioners that just want to look for patterns in the
data.

I guess it would be the same to have a PCA method for each of the objects
and then using the plot method on those new objects, but that would just
create a lot more objects and functions than the current approach (like
Mike was saying).

Your "as.pca" or "performPCA" approach would be definitely better if all
the different methods would create objects of the *same* PCA class, but
since we are talking about different packages, I don't know how easy it
would be to coordinate. But perhaps this is the way we should go.
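
As a sketch of what a shared class could look like, the following uses plain S3 and prcomp(). The names performPCA and the "PCA" class are placeholders taken from this discussion, not an existing Bioconductor API, and the log1p transformation is only an assumption about what is sensible for counts.

```r
# Generic that each package could provide a method for.
performPCA <- function(x, ...) UseMethod("performPCA")

# Example method for a plain count matrix (genes in rows, samples in columns).
performPCA.matrix <- function(x, ...) {
  fit <- prcomp(t(log1p(x)), center = TRUE, scale. = FALSE)
  structure(
    list(scores = fit$x,
         var_explained = fit$sdev^2 / sum(fit$sdev^2)),
    class = "PCA"
  )
}

# A single plot method shared by everything that produces a "PCA" object.
plot.PCA <- function(x, ...) {
  plot(x$scores[, 1], x$scores[, 2], xlab = "PC1", ylab = "PC2", ...)
  invisible(x)
}

# The existing convenience wrapper then becomes a one-liner.
plotPCA <- function(object, ...) plot(performPCA(object), ...)

set.seed(1)
counts <- matrix(rpois(120, 20), nrow = 20, ncol = 6,
                 dimnames = list(NULL, paste0("s", 1:6)))
pca <- performPCA(counts)
```

Once pca exists, plot(pca, col = ...) can be repeated with different colors without recomputing the components, while plotPCA(counts) keeps the zero-effort path for practitioners.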

Best,
davide



On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes 
wrote:

>  Hi,
>
> It depends.
>
> The "traditional" R approach to these matters is that you (a) first
> perform some sort of an analysis and save the results as an object and then
> (b) show or plot what you got.  It is part (b) that tends to be really
> generic, and (in my opinion) should have really generic names -- like
> "show" or "plot" or "hist" or "image".
>
> With PCA in particular, you usually have to perform a bunch of
> computations in order to get the principal components from some part of the
> data.  As I understand it now, these computations are performed along the
> way as part of the various "plotPCA" functions.  The "R way" to do this
> would be something like
> pca <- performPCA(mySpecialObject)  # or as.PCA(mySpecialObject)
> plot(pca) # to get the scatter plot
> This approach has the user-friendly advantage that you can tweak the plot
> (in terms of colors, symbols, ranges, titles, etc) without having to
> recompute the principal components every time. (I often find myself
> re-plotting the same PCA several times, with different colors or symbols
> for different factors associated with the samples.) In addition, you could
> then also do something like
> screeplot(pca)
> to get a plot of the percentages of variance explained.
>
> My own feeling is that if the object doesn't know what to do when you tell
> it to "plot" itself, then you haven't got the right abstraction.
>
> You may still end up needing generics for each kind of computation you
> want to perform (PCA, RLE, MA, etc), which is why I suggested an "as.PCA"
> function.  After all, "as" is already pretty generic.  In the long run,
> this would help BioConductor developers, since they wouldn't all have to
> reimplement the visualization code; they would just have to figure out how
> to convert their own object into a PCA or RLE or MA object.
>
> And I know that this "plotWhatever" approach is used elsewhere in
> BioConductor, and it has always bothered me. It just seemed that a post
> suggesting a new generic function provided a reasonable opportunity to
> point out that there might be a better way.
>
> Best,
>   Kevin
>
> PS: My own "ClassDiscovery" package, which is available from R-Forge via
> install.packages("ClassDiscovery", repos="http://R-Forge.R-project.org")
> includes a "SamplePCA" class that does something roughly similar to this
> for microarrays.
>
> PPS (off-topic): The worst offender in base R -- because it doesn't use
> this "typical" approach -- is the "heatmap" function.  Having tried to teach
> this function in several different classes, I have come to the conclusion
> that it is basically unusable by mortals.  And I think the problem is that
> it tries to combine too many steps -- clustering rows, clustering columns,
> scaling, visualization -- all in a single function.
>
>

[Bioc-devel] Multiple colData in SummarizedExperiment

2015-06-17 Thread davide risso
Dear list,

I'm creating an R package to store RNA-seq data of a somewhat large project
in which I'm involved.

One of the initial goals is to compare different pre-processing pipelines,
hence I have multiple expression matrices corresponding to the same samples.
The SummarizedExperiment class seems a good candidate, since I have
multiple expression matrices with the same rowData and colData information.

I have several sample-specific variables that I want to store with the
object, namely, experimental information (e.g., batch, date, experimental
condition, ...) and sample quality (e.g., proportion of aligned reads,
total duplicate reads, etc...).

Of course, I can always create one big data frame concatenating the two
(experimental info + sample quality), but it seems that both conceptually
and practically, it might be useful to have two separate data frames.
Since this seems somewhat a reasonably standard type of information that
one would want to carry on, I was wondering if it would be possible /
useful to allow the user to have multiple data.frames in the colData slot
of SummarizedExperiment.

Best,
Davide



Re: [Bioc-devel] Multiple colData in SummarizedExperiment

2015-06-18 Thread davide risso
Thank you all for the responses.

I didn't think about the nested DataFrame solution.  It should work.
I agree that an extension might be cleaner, but I clearly need to give it
more thought.

One of the reasons I wanted to have quality and metadata as separate slots
is that one could enforce that all the qualities are numeric, and have a
quality() method to extract just the quality scores (e.g., for plotting /
quality control). Having them in the same slot makes it harder for the user
to extract just the scores (if the column order and/or names are not
standardized).
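
Roughly what I have in mind, sketched on a plain data.frame standing in for the colData: the quality() helper, the column names, and the values below are all made up for illustration, and a real implementation would presumably be a method on the class itself.

```r
# Hypothetical accessor: return only the quality columns of a sample table,
# insisting that every one of them is numeric.
quality <- function(sample_data, quality_cols) {
  missing_cols <- setdiff(quality_cols, names(sample_data))
  if (length(missing_cols) > 0) {
    stop("missing quality columns: ", paste(missing_cols, collapse = ", "))
  }
  q <- sample_data[, quality_cols, drop = FALSE]
  is_num <- vapply(q, is.numeric, logical(1))
  if (!all(is_num)) {
    stop("non-numeric quality columns: ",
         paste(names(q)[!is_num], collapse = ", "))
  }
  q
}

# Made-up sample table mixing experimental information and quality scores.
samples <- data.frame(
  batch = c("b1", "b1", "b2"),
  condition = c("ctrl", "trt", "ctrl"),
  prop_aligned = c(0.91, 0.88, 0.95),
  duplicate_reads = c(1200, 950, 1430),
  row.names = c("s1", "s2", "s3")
)

qc <- quality(samples, c("prop_aligned", "duplicate_reads"))
```

The caller has to know which columns hold quality scores, which is the part that separate slots (or some naming convention) would take care of.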

Best,
davide


On Thu, Jun 18, 2015 at 6:35 AM Vincent Carey 
wrote:

> yes, if a formal extension is warranted.  the metadata slot could also be
> used.
>
> On Thu, Jun 18, 2015 at 2:59 PM, Kasper Daniel Hansen <
> kasperdanielhan...@gmail.com> wrote:
>
> > I think the more clean solution for Davide (if he insists on having
> separate
> > objects; I decided against it in minfi) is to extend the class to allow
> > this.
> >
> > Kasper
> >
> > On Thu, Jun 18, 2015 at 12:25 AM, Ryan  wrote:
> >
> > > Oh wow, I didn't know you could put a DataFrame into a single column of
> > > another DataFrame. That actually solves a problem for me too (I don't
> > > intend to expose nested DataFrames to the users though).
> > >
> > >
> > > On 6/17/15 7:23 PM, Martin Morgan wrote:
> > >
> > >> On 06/17/2015 11:41 AM, davide risso wrote:
> > >>
> > >>> Dear list,
> > >>>
> > >>> I'm creating an R package to store RNA-seq data of a somewhat large
> > >>> project
> > >>> in which I'm involved.
> > >>>
> > >>> One of the initial goals is to compare different pre-processing
> > >>> pipelines,
> > >>> hence I have multiple expression matrices corresponding to the same
> > >>> samples.
> > >>> The SummarizedExperiment class seems a good candidate, since I have
> > >>> multiple expression matrices with the same rowData and colData
> > >>> information.
> > >>>
> > >>> I have several sample-specific variables that I want to store with
> the
> > >>> object, namely, experimental information (e.g., batch, date,
> > experimental
> > >>> condition, ...) and sample quality (e.g., proportion of aligned
> reads,
> > >>> total duplicate reads, etc...).
> > >>>
> > >>> Of course, I can always create one big data frame concatenating the
> two
> > >>> (experimental info + sample quality), but it seems that both
> > conceptually
> > >>> and practically, it might be useful to have two separate data frames.
> > >>> Since this seems somewhat a reasonably standard type of information
> > that
> > >>> one would want to carry on, I was wondering if it would be possible /
> > >>> useful to allow the user to have multiple data.frames in the colData
> > slot
> > >>>
> > >>
> > >> Actually, colData() is a DataFrame, and a DataFrame column can
> contain a
> > >> DataFrame. So after
> > >>
> > >>   example(SummarizedExperiment)
> > >>
> > >> we could make some faux sample quality data
> > >>
> > >>   quality = DataFrame(x=1:6, y=6:1, row.names=colnames(se1))
> > >>
> > >> add this as a column in the colData()
> > >>
> > >>   colData(se1)$quality = quality
> > >>
> > >> (or create the SummarizedExperiment from a similar DataFrame up-front)
> > >> and manage our grouped data
> > >>
> > > >> > colData(se1)
> > > >> DataFrame with 6 rows and 2 columns
> > > >>   Treatment quality
> > > >>
> > > >> A      ChIP
> > > >> B     Input
> > > >> C      ChIP
> > > >> D     Input
> > > >> E      ChIP
> > > >> F     Input
> > >> > colData(se1[,1:2])$quality
> > >> DataFrame with 2 rows and 2 columns
> > >>   x y
> > >>
> > >> A 1 6
> > >> B 2 5
> > >>
> > >> I'm not sure that this is any less confusing to the end user than
> having
> > >> to manage a DataFrameList(), but it does not require any new features.
> > >>
> > >> Martin
> > >>
> > >>  of SummarizedExperiment.
> > >>>
> > >>> Best,
> > >>> Davide
> > >>>


Re: [Bioc-devel] Multiple colData in SummarizedExperiment

2015-06-18 Thread davide risso
Thanks Kasper,

I think that's a good solution.

Best,
Davide
On Thu, Jun 18, 2015 at 11:51 AM Kasper Daniel Hansen <
kasperdanielhan...@gmail.com> wrote:

> you can just implement this by having reserved column names in the colData
> slot; that will work and will take appr. 23 seconds to implement.  I agree
> it is not as clean from a design perspective, but you get 100% of the
> functionality and you can write a separate checker for the colData argument.
>
> On Thu, Jun 18, 2015 at 2:00 PM, davide risso 
> wrote:
>
>> Thank you all for the responses.
>>
>> I didn't think about the nested DataFrame solution.  It should work.
>> I agree that an extension might be cleaner, but I clearly need to give it
>> more thought.
>>
>> One of the reasons I wanted to have quality and metadata as separate
>> slots is that one could enforce that all the qualities are numeric, and
>> have a quality() method to extract just the quality scores (e.g., for
>> plotting / quality control). Having them in the same slot makes it harder
>> for the user to extract just the scores (if the column order and/or names
>> are not standardized).
>>
>> Best,
>> davide
>>
>>
>> On Thu, Jun 18, 2015 at 6:35 AM Vincent Carey 
>> wrote:
>>
>>> yes, if a formal extension is warranted.  the metadata slot could also be
>>> used.
>>>
>>> On Thu, Jun 18, 2015 at 2:59 PM, Kasper Daniel Hansen <
>>> kasperdanielhan...@gmail.com> wrote:
>>>
>>> > I think the more clean solution for Davide (if he inists on having
>>> separate
>>> > objects; I decided against it in minfi) is to extend the class to allow
>>> > this.
>>> >
>>> > Kasper
>>> >
>>> > On Thu, Jun 18, 2015 at 12:25 AM, Ryan  wrote:
>>> >
>>> > > Oh wow, I didn't know you could put a DataFrame into a single column
>>> of
>>> > > another DataFrame. That actually solves a problem for me too (I don't
>>> > > intend to expose nested DataFrames to the users though).
>>> > >
>>> > >
>>> > > On 6/17/15 7:23 PM, Martin Morgan wrote:
>>> > >
>>> > >> On 06/17/2015 11:41 AM, davide risso wrote:
>>> > >>
>>> > >>> Dear list,
>>> > >>>
>>> > >>> I'm creating an R package to store RNA-seq data of a somewhat large
>>> > >>> project
>>> > >>> in which I'm involved.
>>> > >>>
>>> > >>> One of the initial goals is to compare different pre-processing
>>> > >>> pipelines,
>>> > >>> hence I have multiple expression matrices corresponding to the same
>>> > >>> samples.
>>> > >>> The SummarizedExperiment class seems a good candidate, since I have
>>> > >>> multiple expression matrices with the same rowData and colData
>>> > >>> information.
>>> > >>>
>>> > >>> I have several sample-specific variables that I want to store with
>>> the
>>> > >>> object, namely, experimental information (e.g., batch, date,
>>> > experimental
>>> > >>> condition, ...) and sample quality (e.g., proportion of aligned
>>> reads,
>>> > >>> total duplicate reads, etc...).
>>> > >>>
>>> > >>> Of course, I can always create one big data frame concatenating
>>> the two
>>> > >>> (experimental info + sample quality), but it seems that both
>>> > conceptually
>>> > >>> and practically, it might be useful to have two separate data
>>> frames.
>>> > >>> Since this seems somewhat a reasonably standard type of information
>>> > that
>>> > >>> one would want to carry on, I was wondering if it would be
>>> possible /
>>> > >>> useful to allow the user to have multiple data.frames in the
>>> colData
>>> > slot
>>> > >>>
>>> > >>
>>> > >> Actually, colData() is a DataFrame, and a DataFrame column can
>>> contain a
>>> > >> DataFrame. So after
>>> > >>
>>> > >>   example(SummarizedExperiment)
>>> > >>
>>> > >> we could make some faux sample quality data
>>> > >>

Re: [Bioc-devel] Bioconductor Git/GitHub Mirrors

2015-06-19 Thread davide risso
Hi all,

I don't know why but this is not working for me.

I deleted the bridge for my RUVSeq package.

I forked Bioconductor-mirror/RUVSeq into drisso/RUVSeq-mirror

I then ran:
$ git clone https://github.com/drisso/RUVSeq-mirror
$ bash ../update_remotes.sh
$ git checkout devel
error: pathspec 'devel' did not match any file(s) known to git.
$ git branch
* master

I tried the same with my other package EDASeq and again it does not work.
Am I doing something wrong? It seems that the issue is with the
update_remotes script, but I cannot figure out what's wrong.

Best,
Davide
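
For anyone hitting the same symptom, here is a minimal sketch of how one
might check that the setup step really succeeded before switching branches.
The helper names run_step and branch_exists are hypothetical (they are not
part of update_remotes.sh), and the path to the script is an assumption
based on the commands above.

```shell
#!/bin/sh
# Hypothetical helpers; not part of update_remotes.sh.

# Run a command, report a non-zero exit status on stderr, and return it.
run_step() {
  rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "step failed with exit status $rc: $*" >&2
  fi
  return "$rc"
}

# Succeed only if a local branch with the given name exists.
branch_exists() {
  git rev-parse --verify --quiet "refs/heads/$1" >/dev/null 2>&1
}

# Possible use, from inside the cloned repository:
#   run_step bash ../update_remotes.sh && branch_exists devel && git checkout devel
```

With a check like this, a silent failure in the setup script would show up
as a reported exit status rather than as a confusing pathspec error later.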

On Fri, Jun 19, 2015 at 1:52 PM Leonardo Collado Torres 
wrote:

> Hi,
>
> Dan previously said:
>
> Try starting over again. Remove your local repository and do a fresh clone:
>
> git clone https://github.com/leekgroup/derfinderHelper.git
> cd derfinderHelper
> bash /path/to/update_remotes.sh
> git checkout devel
> git svn rebase
> git merge master --no-edit
> git svn dcommit --add-author-from
>
>
>
> So I went and did that. Note that if you have local branches, like I
> did, make sure that you push them to GitHub first.
>
>
> 1) I first encountered an authentication error. I solved it below by
> deleting any previous SVN auth info I had (I only use SVN for Bioc).
>
> $ git svn dcommit --add-author-from --username=l.collado-torres
> Committing to
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/derfinderHelper
> ...
>
> ERROR from SVN:
> URL access forbidden for unknown reason: Access to
> '/bioconductor/!svn/me' forbidden
> No changes between ef1c7dc3236ba31adace8c2205d30cc76e913032 and
> refs/remotes/git-svn-devel
> Resetting to the latest refs/remotes/git-svn-devel
> ERROR: Not all changes have been committed into SVN, however the committed
> ones (if any) seem to be successfully integrated into the working tree.
> Please see the above messages for details.
>
> Solution:
>
> ## Check what auth files I had
> $ ls  ~/.subversion/auth/svn.simple/*
>
> ## Was my username in any of them? Nope
> $ grep -l l.collado-torres  ~/.subversion/auth/svn.simple/*
>
> ## Clear them. If you use SVN for multiple repos, you don't want to do
> this.
> $ rm  ~/.subversion/auth/svn.simple/*
>
> ## Yup, it's empty.
> $ ls  ~/.subversion/auth/svn.simple/*
>
> ## Start over and use this line on the first git svn rebase
> $ git svn rebase --username=l.collado-torres
>
> ## Now my SVN auth info is there
> $ ls  ~/.subversion/auth/svn.simple/*
> $ grep -l l.collado-torres  ~/.subversion/auth/svn.simple/*
>
>
>
>
> 2) The commands Dan posted before worked just like he described. That
> is, it didn't commit anything back to SVN (since there's nothing new).
> I'm re-posting them for clarity.
>
> ## Remove local repository and do a fresh clone
> git clone https://github.com/leekgroup/derfinderHelper.git
> cd derfinderHelper
> bash /path/to/update_remotes.sh
> git checkout devel
> git svn rebase
> git merge master --no-edit
> git svn dcommit --add-author-from
>
> ## Output from last command
> Committing to
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/derfinderHelper
> ...
> No changes
> 4a21725268e160061103dadb063d0f382235d7e6~1 ==
> 4a21725268e160061103dadb063d0f382235d7e6
> No changes between 4a21725268e160061103dadb063d0f382235d7e6 and
> refs/remotes/git-svn-devel
> Resetting to the latest refs/remotes/git-svn-devel
>
>
> As Dan pointed out earlier, I do not want to make my "master" commit
> history the one from SVN. What I want to achieve is to have my past
> commit history be the one from GitHub, and then after that, I don't
> mind the double commit messages like in
>
> https://github.com/rmflight/categoryCompare/commit/762bf7046931096b730e21d8a9c4d0d02c734602
> and
> https://github.com/rmflight/categoryCompare/commit/e132385d0ada77bb9a43e1f7e0027f6b6d59af19
>
>
> 3) With my fresh clone, if I follow the original instructions, I still
> end with the problem I described in my first email.
>
>
> $ git clone https://github.com/leekgroup/derfinderHelper.git
> Cloning into 'derfinderHelper'...
> remote: Counting objects: 387, done.
> remote: Total 387 (delta 0), reused 0 (delta 0), pack-reused 387
> Receiving objects: 100% (387/387), 81.50 KiB | 0 bytes/s, done.
> Resolving deltas: 100% (214/214), done.
> Checking connectivity... done.
> cd derfinderHelper
>
> $ bash ../update_remotes.sh
> Commit to git as normal, when you want to push your commits to svn
>   1. `git checkout devel` to switch to the devel branch. (use release-X.X
> for
> release branches)
>   2. `git svn rebase` to get the latest SVN changes.
>   3. `git merge master --no-edit` to merge your changes from the master
> branch
> or skip this step and work directly on the current branch.
>   4. `git svn rebase && git svn dcommit --add-author-from` to sync and
> commit
> your changes to svn.
>
> $ git status
> On branch master
> Your branch is up-to-date with 'origin/master'.
> nothing to commit, working directory clean
>
> $ git checkout devel
> Swi

Re: [Bioc-devel] Bioconductor Git/GitHub Mirrors

2015-06-19 Thread davide risso
Thanks Dan,

renaming the repository to RUVSeq solved the problem.

If you're still interested, there was no output from the script and the
value of $? was
$ echo $?
128

Best,
davide
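
A possible explanation for why the rename helped, sketched under the
assumption (not confirmed anywhere in this thread) that the setup script
derives the package name from the name of the clone and uses it to build
the SVN path. The function repo_name_from_url is hypothetical and only
illustrates the idea. As far as I know, 128 is the status git typically
returns on a fatal error, which would be consistent with a lookup failing.

```shell
#!/bin/sh
# Hypothetical illustration; not taken from update_remotes.sh.
# Derive a repository name from a clone URL: strip a trailing slash,
# a trailing ".git", and everything up to the last "/".
repo_name_from_url() {
  url=${1%/}
  url=${url%.git}
  printf '%s\n' "${url##*/}"
}

# A fork named RUVSeq-mirror yields "RUVSeq-mirror", which would not match
# a package directory called "RUVSeq" on the SVN side.
repo_name_from_url https://github.com/drisso/RUVSeq-mirror
repo_name_from_url https://github.com/drisso/RUVSeq.git
```

If the assumption holds, keeping the fork name identical to the package
name is the simplest way to avoid the mismatch.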


On Fri, Jun 19, 2015 at 8:18 PM Dan Tenenbaum 
wrote:

>
>
> ----- Original Message -----
> > From: "davide risso" 
> > To: "Leonardo Collado Torres" , "Dan Tenenbaum" <
> dtene...@fredhutch.org>, "Jim Hester"
> > 
> > Cc: bioc-devel@r-project.org
> > Sent: Friday, June 19, 2015 8:07:29 PM
> > Subject: Re: [Bioc-devel] Bioconductor Git/GitHub Mirrors
> >
> >
> > Hi all,
> >
> >
> > I don't know why but this is not working for me.
> >
> >
> > I deleted the bridge for my RUVSeq package.
> >
> >
> > I forked Bioconductor-mirror/RUVSeq into drisso/RUVSeq-mirror
> >
> >
> > I then ran:
> > $ git clone https://github.com/drisso/RUVSeq-mirror
>
> I assume that you then "cd RUVSeq-mirror"
>
> > $ bash ../update_remotes.sh
>
> Is there any output at all from this step?
> What is the value of $? after running the script? (echo $?). That tells us
> if it failed with an error.
>
>
>
> > $ git checkout devel
> > error: pathspec 'devel' did not match any file(s) known to git.
>
>
> It looks like the name of your forked repository has to be RUVSeq, not
> RUVSeq-mirror.
>
> That at least should work around your issue. We'll see if another fix is
> possible.
>
> Dan
>
>
> >
> > $ git branch
> > * master
> >
>
>
>
>
>
> >
> >
> > I tried the same with my other package EDASeq and again it does not
> > work. Am I doing something wrong? It seems that the issue is with
> > the update_remotes script, but I cannot figure out what's wrong.
> >
> >
> > Best,
> > Davide
> >
> >
> > On Fri, Jun 19, 2015 at 1:52 PM Leonardo Collado Torres <
> > lcoll...@jhu.edu > wrote:
> >
> >
> > Hi,
> >
> > Dan previously said:
> >
> > Try starting over again. Remove your local repository and do a fresh
> > clone:
> >
> > git clone https://github.com/leekgroup/derfinderHelper.git
> > cd derfinderHelper
> > bash /path/to/update_remotes.sh
> > git checkout devel
> > git svn rebase
> > git merge master --no-edit
> > git svn dcommit --add-author-from
> >
> >
> >
> > So I went and did that. Note that if you have local branches, like I
> > did, make sure that you push them to GitHub first.
> >
> >
> > 1) I first encountered an authentication error. I solved it below
> > by
> > deleting any previous SVN auth info I had (I only use SVN for Bioc).
> >
> > $ git svn dcommit --add-author-from --username=l.collado-torres
> > Committing to
> >
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/derfinderHelper
> > ...
> >
> > ERROR from SVN:
> > URL access forbidden for unknown reason: Access to
> > '/bioconductor/!svn/me' forbidden
> > No changes between ef1c7dc3236ba31adace8c2205d30cc76e913032 and
> > refs/remotes/git-svn-devel
> > Resetting to the latest refs/remotes/git-svn-devel
> > ERROR: Not all changes have been committed into SVN, however the
> > committed
> > ones (if any) seem to be successfully integrated into the working
> > tree.
> > Please see the above messages for details.
> >
> > Solution:
> >
> > ## Check what auth files I had
> > $ ls ~/.subversion/auth/svn.simple/*
> >
> > ## Was my username in any of them? Nope
> > $ grep -l l.collado-torres ~/.subversion/auth/svn.simple/*
> >
> > ## Clear them. If you use SVN for multiple repos, you don't want to
> > do this.
> > $ rm ~/.subversion/auth/svn.simple/*
> >
> > ## Yup, it's empty.
> > $ ls ~/.subversion/auth/svn.simple/*
> >
> > ## Start over and use this line on the first git svn rebase
> > $ git svn rebase --username=l.collado-torres
> >
> > ## Now my SVN auth info is there
> > $ ls ~/.subversion/auth/svn.simple/*
> > $ grep -l l.collado-torres ~/.subversion/auth/svn.simple/*
> >
> >
> >
> >
> > 2) The commands Dan posted before worked just like he described. That
> > is, it didn't commit anything back to SVN (since there's nothing
> > new).
> > I'm re-posting them for clarity.
> >
> > ## Remove local repository and do a fresh clone
> > git clone https://github.com/leekg

[Bioc-devel] License question for experimental data package

2016-03-01 Thread davide risso
Dear Bioc developers,

I recently downloaded three publicly available single-cell RNA-seq datasets
from the NCBI GEO/SRA repository and created an R package with some
gene-level summaries (read counts and FPKMs).

I'm currently using the package locally for my own tests, but I think it
may be a useful resource for the community, so I'm considering sharing it
on GitHub and eventually submitting it to Bioconductor.

I was not involved in any way with the original studies, and I'm wondering
what is the best practice in terms of license / data sharing. Since there
are many experimental data packages in Bioconductor, I'm guessing that I'm
not the first person wondering about this.

From the NCBI website, I read (quote from
https://www.ncbi.nlm.nih.gov/home/about/policies.shtml):
Databases of molecular data on the NCBI Web site include such examples as
nucleotide sequences (GenBank), protein sequences, macromolecular
structures, molecular variation, gene expression, and mapping data. They
are designed to provide and encourage access within the scientific
community to sources of current and comprehensive information. Therefore,
NCBI itself places no restrictions on the use or distribution of the data
contained therein. Nor do we accept data when the submitter has requested
restrictions on reuse or redistribution. However, some submitters of the
original data (or the country of origin of such data) may claim patent,
copyright, or other intellectual property rights in all or a portion of the
data (that has been submitted). NCBI is not in a position to assess the
validity of such claims and since there is no transfer of rights from
submitters to NCBI, NCBI has no rights to transfer to a third party.
Therefore, NCBI cannot provide comment or unrestricted permission
concerning the use, copying, or distribution of the information contained
in the molecular databases.

Should I contact the original authors for permission? Or is the fact that
the data were publicly shared enough to grant me permission to redistribute?
In that case, is there a standard license that I should use?

Thanks for any feedback / thought!

Best,
davide



Re: [Bioc-devel] License question for experimental data package

2016-03-03 Thread davide risso
Hi Hervé and Sean,

thanks for your help. It will indeed be interesting to hear how other
people chose the license, especially for those packages that redistribute a
dataset not from their lab.

I do have an experimental data package in Bioc, zebrafishRNASeq, but it's
an experiment from a collaborator and at the time I didn't pay much
attention to which license to use.
In this case, I'd like to redistribute data from different labs. I guess I
will contact the original authors at least as a courtesy.
But I'm still keen to hear opinions on which license(s) is appropriate for
experimental data sharing.

Best,
davide




On Thu, Mar 3, 2016 at 12:50 PM Hervé Pagès  wrote:

> Hi Davide,
>
> On 03/01/2016 02:25 PM, davide risso wrote:
> > Dear Bioc developers,
> >
> > I recently downloaded three publicly available single-cell RNA-seq
> datasets
> > from the NCBI GEO/SRA repository and created an R package with some
> > gene-level summaries (read counts and FPKMs).
> >
> > I'm currently using the package locally for my own tests, but I'm
> thinking
> > that this may be a useful resource for the community and thinking of
> > sharing it on github and eventually submit it to Bioconductor.
> >
> > I was not involved in any way with the original studies, and I'm
> wondering
> > what is the best practice in terms of license / data sharing. Since there
> > are many experimental data packages in Bioconductor, I'm guessing that
> I'm
> > not the first person wondering about this.
> >
> > From the NCBI website, I read (quote from
> > https://www.ncbi.nlm.nih.gov/home/about/policies.shtml):
> > Databases of molecular data on the NCBI Web site include such examples as
> > nucleotide sequences (GenBank), protein sequences, macromolecular
> > structures, molecular variation, gene expression, and mapping data. They
> > are designed to provide and encourage access within the scientific
> > community to sources of current and comprehensive information. Therefore,
> > NCBI itself places no restrictions on the use or distribution of the data
> > contained therein. Nor do we accept data when the submitter has requested
> > restrictions on reuse or redistribution. However, some submitters of the
> > original data (or the country of origin of such data) may claim patent,
> > copyright, or other intellectual property rights in all or a portion of
> the
> > data (that has been submitted). NCBI is not in a position to assess the
> > validity of such claims and since there is no transfer of rights from
> > submitters to NCBI, NCBI has no rights to transfer to a third party.
> > Therefore, NCBI cannot provide comment or unrestricted permission
> > concerning the use, copying, or distribution of the information contained
> > in the molecular databases.
> >
> > Should I contact the original authors for permission? Or is the fact that
> > the data were publicly shared enough to grant me permission to
> redistribute?
> > In that case, is there a standard license that I should use?
> >
> > Thanks for any feedback / thought!
>
> I don't have much to offer. AFAIK we don't really have guidelines or
> recommendations for what license to use for experimental data packages,
> except for the usual "make sure you use an appropriate license" advice.
> So far it has really been up to each author/maintainer to make sure
> they pick up a license that is compatible with the original
> license/copyright/patent of the original data they are packaging
> and with its redistribution thru the Bioconductor channel.
>
> FWIW here is a summary of the licenses used by the 276 experimental
> data packages currently in BioC devel:
>
>   License        Nb of packages
>   ------------   --------------
>   GPL                       135
>   Artistic-2.0               96
>   LGPL                       41
>   other                       4
>
> Would be interesting to hear from other developers about this. For
> example, how people choose between GPL vs Artistic-2.0? Is one
> license typically more appropriate for packaging and redistributing
> data that is already publicly available?
>
> H.
>
> >
> > Best,
> > davide
> >
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319
>


Re: [Bioc-devel] License question for experimental data package

2016-03-04 Thread davide risso
Thank you all for the useful suggestions and links.

I like the idea of using a CC0 license. That's likely what I will go for.

Best,
davide
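
For reference, a minimal sketch of where that choice ends up: the License
field of the package DESCRIPTION file. The helper description_license and
the package name exampleData are made up for illustration, and the sketch
assumes the License field fits on a single line. To my knowledge "CC0" is
one of the license abbreviations R recognizes, but please double-check
against the R documentation before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the License field from a
# DESCRIPTION file. Assumes the field is written on a single line.
description_license() {
  sed -n 's/^License:[[:space:]]*//p' "$1"
}

# Example with a throwaway DESCRIPTION; the package name is made up.
tmpdir=$(mktemp -d)
cat > "$tmpdir/DESCRIPTION" <<'EOF'
Package: exampleData
Version: 0.99.0
License: CC0
EOF
description_license "$tmpdir/DESCRIPTION"
rm -rf "$tmpdir"
```

The same one-liner can be pointed at an existing data package to see
which license its maintainer picked.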


On Fri, Mar 4, 2016 at 7:42 AM Tim Triche, Jr.  wrote:

> I was going to mention droit d'auteur under EU common law, but somehow
> that seemed more in Hervé's wheelhouse ;-).
>
> --t
>
> > On Mar 4, 2016, at 7:17 AM, Lyle Burgoon  wrote:
> >
> > Also keep in mind US copyright rules for data are different from
> > European. We ran into this recently when wanting to publish European data
> > from a web database.
> >
> >> On Mar 4, 2016 10:05 AM, "Tim Triche, Jr."  wrote:
> >> Data (facts) are not copyright worthy, but databases (collections of
> >> facts) can be.  See Feist v Rural for precedent; in short, there must be a
> >> non-obvious and creative aspect to the database for it to be elevated to
> >> copyrightable status.  I doubt that a collection of datasets would clear
> >> this bar, but it's still worth noting.
> >>
> >> --t
> >>
> >> > On Mar 4, 2016, at 6:22 AM, Robert M. Flight  wrote:
> >> >
> >> > I am pretty sure in general "data" is not copyrightable per se (
> >> > http://www.lib.umich.edu/copyright/facts-and-data), so while I might
> >> > contact the original authors as a courtesy, if the data has been released
> >> > into any public database, then you should be free to do with it as you
> >> > please. Providing the original accession numbers for the data and relevant
> >> > citations (if they exist) so that it is easy for you and others to be given
> >> > credit if the data is used would be a good thing to do.
> >> >
> >> > Also, I would personally go with the CC0 (waiver of copyright, see
> >> > https://wiki.creativecommons.org/wiki/CC0) for a data package, as the data
> >> > is already publicly available, you have just packaged it together into a
> >> > useful set.
> >> >
> >> > My 2 cents.
> >> >
> >> > -Robert
> >> >
> >> > Robert M Flight, PhD
> >> > Bioinformatics Research Associate
> >> > Resource Center for Stable Isotope Resolved Metabolomics
> >> > Manager, Systems Biology and Omics Integration Journal Club
> >> > Markey Cancer Center
> >> > CC434 Roach Building
> >> > University of Kentucky
> >> > Lexington, KY
> >> >
> >> > Twitter: @rmflight
> >> > Web: rmflight.github.io
> >> > ORCID: http://orcid.org/-0001-8141-7788
> >> > EM rfligh...@gmail.com
> >> > PH 502-509-1827
> >> >
> >> > To call in the statistician after the experiment is done may be no more
> >> > than asking him to perform a post-mortem examination: he may be able to say
> >> > what the experiment died of. - Ronald Fisher
> >> >
> >> >
> >> >
> >> > On Fri, Mar 4, 2016 at 8:52 AM Kasper Daniel Hansen <
> >> > kasperdanielhan...@gmail.com> wrote:
> >> >
> >> >> For data packages, which do not contain any code, it seems weird to use a
> >> >> software license such as GPL or GPL-2.  It seems better to use something
> >> >> like Artistic-2.0 or one of the CC licenses.
> >> >>
> >> >> On Thu, Mar 3, 2016 at 5:15 PM, davide risso wrote:
> >> >>
> >> >>> Hi Hervé and Sean,
> >> >>>
> >> >>> thanks for your help. It will indeed be interesting to hear how other
> >> >>> people chose the license, especially for those packages that redistribute
> >> >>> a dataset not from their lab.
> >> >>>
> >> >>> I do have an experimental data package in Bioc, zebrafishRNASeq, but it's
> >> >>> an experiment from a collaborator and at the time I didn't pay much
> >> >>> attention to which license to use.
> >> >>> In this case, I'd like to redistribute data from different labs. I guess
> >> >>> I will contact the original authors at least as a courtesy.
> >> >>> But I'm still keen to hear opinions on which license(s) is appropriate
> >> >>> for experimental data sharing.
> >> >>>
> >> >>> Best,

[Bioc-devel] Found more than one class "Annotated" in cache

2016-04-24 Thread davide risso
Dear list,

we are developing a new package that defines a class that builds on
SummarizedExperiment and also imports the CRAN package phylobase.

The class "Annotated" is defined both in the S4Vectors package (a
dependency of SummarizedExperiment) and in the RNeXML package (a dependency
of phylobase). Note that we want the former. This causes the following
message to be thrown every time we create a new object.

Found more than one class "Annotated" in cache; using the first, from
namespace 'S4Vectors'


A minimal example is the following.

> library(SummarizedExperiment)
> SummarizedExperiment()
class: SummarizedExperiment
dim: 0 0
metadata(0):
assays(0):
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
> library(phylobase)
> SummarizedExperiment()
Found more than one class "Annotated" in cache; using the first, from
namespace 'S4Vectors'
class: SummarizedExperiment
dim: 0 0
metadata(0):
assays(0):
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):

> sessionInfo()
R Under development (unstable) (2016-03-07 r70284)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
[1] phylobase_0.8.2            SummarizedExperiment_1.1.21
[3] Biobase_2.31.3             GenomicRanges_1.23.23
[5] GenomeInfoDb_1.7.6         IRanges_2.5.39
[7] S4Vectors_0.9.41           BiocGenerics_0.17.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.3      plyr_1.8.3       XVector_0.11.7   iterators_1.0.8
 [5] tools_3.3.0      zlibbioc_1.17.0  uuid_0.1-2       jsonlite_0.9.19
 [9] nlme_3.1-125     lattice_0.20-33  foreach_1.4.3    DBI_0.3.1
[13] httr_1.1.0       stringr_1.0.0    dplyr_0.4.3      xml2_0.1.2
[17] ade4_1.7-4       grid_3.3.0       rredlist_0.1.0   reshape_0.8.5
[21] data.table_1.9.6 R6_2.1.2         XML_3.98-1.4     RNeXML_2.0.6
[25] reshape2_1.4.1   tidyr_0.4.1      magrittr_1.5     codetools_0.2-14
[29] assertthat_0.1   bold_0.3.5       taxize_0.7.5     ape_3.4
[33] stringi_1.0-1    rncl_0.6.0       lazyeval_0.1.10  rotl_0.5.0
[37] chron_2.3-47


Since we need to import both packages, is there a way to explicitly use the
correct definition of "Annotated" or to not import the class "Annotated"
from RNeXML?

Or is this something that could be addressed in the SummarizedExperiment
package?

These are our DESCRIPTION and NAMESPACE files (note that we are importing
only what we need from phylobase):
https://github.com/epurdom/clusterExperiment/blob/develop/DESCRIPTION
https://github.com/epurdom/clusterExperiment/blob/develop/NAMESPACE

See also Henrik's comment at:
https://github.com/epurdom/clusterExperiment/issues/66

Thank you in advance for any help!

Best,
davide



Re: [Bioc-devel] Found more than one class "Annotated" in cache

2016-04-25 Thread davide risso
Thank you Michael and Martin for looking into this!

Best,
davide


On Mon, Apr 25, 2016 at 1:49 PM Michael Lawrence 
wrote:

> Yes, that's a better fix for this in principle, since it uses the
> information encoded in the extends object, although I think
> getClassFromCache() also needs to be fixed, so that getClassDef() sees
> only the imported classes when only given "where". That's a more
> general fix that works in both load orders.
>
> Maybe the superClass slot of the extends object should store its
> package? Would be convenient, if somewhat redundant. Conceptually, it
> seems the package really belongs on superClass, not on the extends
> object itself (how does a ClassExtends pertain to a package?).
>
> Index: ClassExtensions.R
> ===================================================================
> --- ClassExtensions.R (revision 70547)
> +++ ClassExtensions.R (working copy)
> @@ -316,6 +316,7 @@
>  stop(gettextf("the 'replace' argument to setIs() should be a
> function of 2 or 3 arguments, got an object of class %s",
>dQuote(class(replace))), domain = NA)
>
> +packageSlot(to) <- package
>  new(extClass, subClass = Class, superClass = to, package = package,
>   coerce = coerce, test = test, replace = replace, simple = simple,
>   by = by, dataPart = dataPart, distance = distance)
>
>
>
> On Mon, Apr 25, 2016 at 1:24 PM, Martin Morgan
>  wrote:
> >
> >
> > On 04/25/2016 03:53 PM, Michael Lawrence wrote:
> >>
> >> Yea, this is a bug in R, in my opinion. The class cache circumvents
> >> the namespace imports. In my working copy I've fixed it by falling
> >> back to the namespace search when there are multiple hits. That at
> >> least fixes this case.
> >
> >
> > This also fixes it
> >
> > Index: src/library/methods/R/SClasses.R
> > ===================================================================
> > --- src/library/methods/R/SClasses.R(revision 70547)
> > +++ src/library/methods/R/SClasses.R(working copy)
> > @@ -524,7 +524,7 @@
> > superClass <- exti@superClass
> > if(!exti@simple && !is(object, superClass))
> > next ## skip conditional relations that don't hold for this
> > object
> > -   superDef <- getClassDef(superClass, where = where)
> > +   superDef <- getClassDef(superClass, package = packageSlot(exti))
> > if(is.null(superDef)) {
> > errors <- c(errors,
> > paste0("superclass \"", superClass,
> >
> >
> > There's another problem seen by loading the packages in reverse order
> >
> >> suppressPackageStartupMessages({ library(RNeXML); library(GenomicRanges)
> >> })
> > Found more than one class "Annotated" in cache; using the first, from
> > namespace 'RNeXML'
> > Also defined by 'S4Vectors'
> > ...
> >
> > which is from
> >
> > [[17]]
> > S4Vectors:::setDefaultSlotValue("XRaw", "shared", new("SharedRaw"),
> > where = asNamespace(pkgname))
> >
> > [[18]]
> > resetClass(classname, classdef, where = where)
> >
> > [[19]]
> > completeClassDefinition(Class, classDef, where)
> >
> > [[20]]
> > .completeClassSlots(ClassDef, where)
> >
> > [[21]]
> > isClass(eClass, where = where)
> >
> > [[22]]
> > getClassDef(Class, where)
> >
> > [[23]]
> > .getClassFromCache(Class, where, package = package, resolve.msg =
> > resolve.msg)
> >
> > but I haven't quite got to the bottom of that. Also, I think these are
> both
> > different from but related to
> >
> >> suppressPackageStartupMessages(library(Statomica))
> > Error in .mergeMethodsTable(generic, mtable, tt, attach) :
> >   trying to get slot "defined" from an object of a basic class ("list")
> with
> > no slots
> > Error: package or namespace load failed for 'Statomica'
> >
> >
> >>
> >> You can disable the message in the short term by setting the
> >> "getClass.msg" option to FALSE.
> >>
> >> Michael
> >>
> >> On Sun, Apr 24, 2016 at 12:50 PM, davide risso 
> >> wrote:
> >>>
> >>> Dear list,
> >>>
> >>> we are developing a new package that defines a class that builds on
> >>> SummarizedExperiment and also imports the CRAN package phylobase.

Re: [Bioc-devel] Found more than one class "Annotated" in cache

2016-04-29 Thread davide risso
Hi all,

when running R CMD check on our package, we get the following warning:

checking whether the namespace can be unloaded cleanly ... WARNING
 unloading
Error in .getClassFromCache(what, resolve.confl = "all") :
  argument "where" is missing, with no default
Calls: unloadNamespace ...  -> .removeSuperclassBackRefs ->
.getClassFromCache
Execution halted

Is this warning related to the message mentioned earlier in this thread? If
so, should I expect this warning to go away once the bug in the class
import is fixed?

Here is the full Travis CI report:
https://travis-ci.org/epurdom/clusterExperiment/builds/126175973

Thanks,
Davide


On Mon, Apr 25, 2016 at 2:00 PM davide risso  wrote:

> Thank you Michael and Martin for looking into this!
>
> Best,
> davide
>
>
> On Mon, Apr 25, 2016 at 1:49 PM Michael Lawrence <
> lawrence.mich...@gene.com> wrote:
>
>> Yes, that's a better fix for this in principle, since it uses the
>> information encoded in the extends object, although I think
>> getClassFromCache() also needs to be fixed, so that getClassDef() sees
>> only the imported classes when only given "where". That's a more
>> general fix that works in both load orders.
>>
>> Maybe the superClass slot of the extends object should store its
>> package? Would be convenient, if somewhat redundant. Conceptually, it
>> seems the package really belongs on superClass, not on the extends
>> object itself (how does a ClassExtends pertain to a package?).
>>
>> Index: ClassExtensions.R
>> ===================================================================
>> --- ClassExtensions.R (revision 70547)
>> +++ ClassExtensions.R (working copy)
>> @@ -316,6 +316,7 @@
>>  stop(gettextf("the 'replace' argument to setIs() should be a
>> function of 2 or 3 arguments, got an object of class %s",
>>dQuote(class(replace))), domain = NA)
>>
>> +packageSlot(to) <- package
>>  new(extClass, subClass = Class, superClass = to, package = package,
>>   coerce = coerce, test = test, replace = replace, simple = simple,
>>   by = by, dataPart = dataPart, distance = distance)
>>
>>
>>
>> On Mon, Apr 25, 2016 at 1:24 PM, Martin Morgan
>>  wrote:
>> >
>> >
>> > On 04/25/2016 03:53 PM, Michael Lawrence wrote:
>> >>
>> >> Yea, this is a bug in R, in my opinion. The class cache circumvents
>> >> the namespace imports. In my working copy I've fixed it by falling
>> >> back to the namespace search when there are multiple hits. That at
>> >> least fixes this case.
>> >
>> >
>> > This also fixes it
>> >
>> > Index: src/library/methods/R/SClasses.R
>> > ===================================================================
>> > --- src/library/methods/R/SClasses.R(revision 70547)
>> > +++ src/library/methods/R/SClasses.R(working copy)
>> > @@ -524,7 +524,7 @@
>> > superClass <- exti@superClass
>> > if(!exti@simple && !is(object, superClass))
>> > next ## skip conditional relations that don't hold for this
>> > object
>> > -   superDef <- getClassDef(superClass, where = where)
>> > +   superDef <- getClassDef(superClass, package = packageSlot(exti))
>> > if(is.null(superDef)) {
>> > errors <- c(errors,
>> > paste0("superclass \"", superClass,
>> >
>> >
>> > There's another problem seen by loading the packages in reverse order
>> >
>> > > suppressPackageStartupMessages({ library(RNeXML); library(GenomicRanges) })
>> > Found more than one class "Annotated" in cache; using the first, from
>> > namespace 'RNeXML'
>> > Also defined by 'S4Vectors'
>> > ...
>> >
>> > which is from
>> >
>> > [[17]]
>> > S4Vectors:::setDefaultSlotValue("XRaw", "shared", new("SharedRaw"),
>> > where = asNamespace(pkgname))
>> >
>> > [[18]]
>> > resetClass(classname, classdef, where = where)
>> >
>> > [[19]]
>> > completeClassDefinition(Class, classDef, where)
>> >
>> > [[20]]
>> > .completeClassSlots(ClassDef, where)
>> >
>> > [[21]]
>> > isClass(eClass, where = where)
>> >
>> > [[22]]
>> > getClassDef(Class, where)

Re: [Bioc-devel] problem with class definitions between S4Vectors and RNeXML in using Summarized Experiment

2018-04-13 Thread Davide Risso
Hi Michael,

Thanks for looking into this.

Can you or someone with push permission to S4Vectors implement the workaround 
that you mentioned?

Happy to create a pull request on Github if that helps.

We’re trying to solve this to fix the clusterExperiment package build on 
Bioc-devel.

Thanks,
Davide


On Apr 12, 2018, at 1:27 PM, Michael Lawrence <lawrence.mich...@gene.com> wrote:

Yea it's basically

library(S4Vectors)
library(RNeXML)
is(1:5, "Annotated")
# Found more than one class "Annotated" in cache; using the first,
# from namespace 'S4Vectors'
# Also defined by ‘RNeXML’
# [1] FALSE

But can be worked around:
is(1:5, getClass("Annotated", where=getNamespace("S4Vectors")))
# [1] FALSE

Of course, using class objects instead of class names in every call to
is() is not very palatable, but that's how it's done in all other
languages, as far as I know.
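
To make the workaround less repetitive, the lookup could be wrapped in a
small helper. This is only a sketch: is_from() is a hypothetical name, not
part of methods or S4Vectors, and it assumes the namespace of the named
package can be loaded. It is shown with the basic "function" class from
methods so that it runs without S4Vectors or RNeXML installed.

```r
library(methods)

# Hypothetical helper (not part of any package): test 'object' against the
# definition of 'class' as seen from the namespace of 'pkg', instead of
# letting is() resolve a bare class name through the class cache.
is_from <- function(object, class, pkg) {
  def <- getClass(class, where = asNamespace(pkg))
  is(object, def)
}

is_from(identity, "function", "methods")
is_from(1:5, "function", "methods")
```

With both packages loaded, is_from(1:5, "Annotated", "S4Vectors") would be
the equivalent of the call above.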

There is an inconsistency between new() and is() when resolving the
class name. new() looks into the calling package's namespace, while
is() looks at the package for the class of the 'object'. The new()
approach seems sensible for that function, since packages should be
abstracting the construction of their objects with constructors. The
is() approach is broken though, because it's easy to imagine cases
where some foreign object is passed to a function, and the
function checks the type with is().

I can change is() to use the calling package as the fallback, so
DataFrame(1:5) no longer produces a message. But calling it from
another package, or global env, will still break, just like new(). How
does that sound?
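
A rough user-level approximation of that fallback, for illustration only
(is_local() is a hypothetical wrapper, not the change to is() itself): it
resolves the class name starting from the caller's top-level environment,
which is the package namespace when called from package code, and falls
back to the plain name only when no definition is found there.

```r
library(methods)

# Hypothetical wrapper: look up 'class' from the caller's top-level
# environment first, then hand the resulting class object to is().
is_local <- function(object, class, env = topenv(parent.frame())) {
  def <- getClassDef(class, where = env)
  if (is.null(def)) is(object, class) else is(object, def)
}

is_local(identity, "function")
is_local(1:5, "function")
```

Called from the global environment this has the same limitation described
above, since there is no importing namespace to disambiguate the name.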

On the other hand, maybe we should be more careful with calls to is()
and use class objects. That's a good workaround in this case, anyway,
since I probably can't get the change into R before release.

Michael


On Thu, Apr 12, 2018 at 9:03 AM, Aaron Lun <a...@wehi.edu.au> wrote:
Well, it's not really SingleCellExperiment's problem, either.

library(S4Vectors)
DataFrame(1:5) # Silent, okay.
library(RNeXML)
DataFrame(1:5) # Prints out the message
## Found more than one class "Annotated" in cache; using the first,
## from namespace 'S4Vectors'
## Also defined by ‘RNeXML’

Session information attached below.

-Aaron

sessionInfo()
R Under development (unstable) (2018-03-26 r74466)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /home/cri.camres.org/lun01/Software/R/trunk/lib/libRblas.so
LAPACK: /home/cri.camres.org/lun01/Software/R/trunk/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] RNeXML_2.0.8        ape_5.1             S4Vectors_0.17.41
[4] BiocGenerics_0.25.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16        compiler_3.6.0      pillar_1.2.1
 [4] plyr_1.8.4          bindr_0.1.1         iterators_1.0.9
 [7] tools_3.6.0         uuid_0.1-2          jsonlite_1.5
[10] tibble_1.4.2        nlme_3.1-137        lattice_0.20-35
[13] pkgconfig_2.0.1     rlang_0.2.0         foreach_1.4.4
[16] crul_0.5.2          curl_3.2            bindrcpp_0.2.2
[19] httr_1.3.1          stringr_1.3.0       dplyr_0.7.4
[22] xml2_1.2.0          grid_3.6.0          reshape_0.8.7
[25] glue_1.2.0          data.table_1.10.4-3 R6_2.2.2
[28] XML_3.98-1.10       purrr_0.2.4         reshape2_1.4.3
[31] tidyr_0.8.0         magrittr_1.5        codetools_0.2-15
[34] assertthat_0.2.0    bold_0.5.0          taxize_0.9.3
[37] stringi_1.1.7       lazyeval_0.2.1      zoo_1.8-1


On Thu, 2018-04-12 at 17:40 +0200, Elizabeth Purdom wrote:
Just to follow up on my previous post. I am able to replicate the
problem as in the GitHub post from 2 years ago
(https://github.com/epurdom/clusterExperiment/issues/66), only now it
is not the SummarizedExperiment class but the SingleCellExperiment
class that has the problem. [And I was incorrect, the problem does
occur in the development version 2018-03-22 r74446].

So this is actually a problem with the SingleCellExperiment package —
sorry for the incorrect subject line.

All of the best,
Elizabeth



library(SingleCellExperiment)
SingleCellExperiment()
class: SingleCellExperiment
dim: 0 0
metadata(0):
assays(0):
rownames: NU

[Bioc-devel] Write access to scRNAseq

2019-06-10 Thread davide risso
Hi all,

Can you please give write access to the scRNAseq package to Aaron Lun (
infinite.monkeys.with.keyboa...@gmail.com)?

Best,
Davide
