Re: [Bioc-devel] Need for file-based handling of meta-data

2016-06-29 Thread Thomas Girke
Great thanks. I will add some ideas later this week or next week.

Thomas


On Wed, Jun 29, 2016 at 12:44 PM Martin Morgan <
martin.mor...@roswellpark.org> wrote:

> On 06/29/2016 03:42 PM, Thomas Girke wrote:
> > Yes, a "readSummarizedExperiment" would be a "modern-day analog of
> > Biobase::readExpressionSet". I also agree with the other suggestions
> > including github to get this started, and Vince's thoughts on binding
> > meta-data more tightly to source data as well as improving
> > interoperability.
>
> I started a repository at
>
>   https://github.com/Bioconductor/TenStepReproducible
>
> I envision this as a package / white paper / eventually a publication.
> Feel free to fork etc., and / or to contribute other ideas.
>
> Martin
>

Re: [Bioc-devel] Need for file-based handling of meta-data

2016-06-29 Thread Martin Morgan

On 06/29/2016 03:42 PM, Thomas Girke wrote:

> Yes, a "readSummarizedExperiment" would be a "modern-day analog of
> Biobase::readExpressionSet". I also agree with the other suggestions
> including github to get this started, and Vince's thoughts on binding
> meta-data more tightly to source data as well as improving
> interoperability.


I started a repository at

  https://github.com/Bioconductor/TenStepReproducible

I envision this as a package / white paper / eventually a publication.
Feel free to fork etc., and / or to contribute other ideas.


Martin




___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Need for file-based handling of meta-data

2016-06-29 Thread Thomas Girke
Yes, a "readSummarizedExperiment" would be a "modern-day analog of
Biobase::readExpressionSet". I also agree with the other suggestions
including github to get this started, and Vince's thoughts on binding
meta-data more tightly to source data as well as improving
interoperability.
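
To make the idea concrete, here is a minimal sketch of what such an importer
(and the matching exporter requested in the original message below) could
look like. The names readSummarizedExperiment and writeSummarizedExperiment
are hypothetical, and so is the file layout: two tab-delimited files, one
with features in rows and samples in columns, one with one row of meta-data
per sample. Only the SummarizedExperiment constructor and accessors and the
base R read/write functions are existing calls.

```r
library(SummarizedExperiment)

# Hypothetical helper, not part of any Bioconductor package.
#   assay_file:   features in rows, samples in columns, first column = feature IDs
#   coldata_file: one row per sample, first column = sample IDs
readSummarizedExperiment <- function(assay_file, coldata_file) {
    counts <- as.matrix(read.delim(assay_file, row.names = 1,
                                   check.names = FALSE))
    coldata <- read.delim(coldata_file, row.names = 1, check.names = FALSE,
                          stringsAsFactors = FALSE)

    # Catch sample-handling errors early: every assay column needs meta-data.
    missing <- setdiff(colnames(counts), rownames(coldata))
    if (length(missing) > 0)
        stop("samples without meta-data: ", paste(missing, collapse = ", "))

    # Reorder the meta-data rows so they match the assay columns.
    coldata <- coldata[colnames(counts), , drop = FALSE]
    SummarizedExperiment(assays = list(counts = counts),
                         colData = DataFrame(coldata))
}

# Hypothetical counterpart: write the first assay and the colData back to
# tab-delimited files that spreadsheets and non-R tools can open.
writeSummarizedExperiment <- function(se, assay_file, coldata_file) {
    write.table(assay(se), assay_file, sep = "\t", quote = FALSE,
                col.names = NA)
    write.table(as.data.frame(colData(se)), coldata_file, sep = "\t",
                quote = FALSE, col.names = NA)
}

# Usage with two small temporary files (made-up values, illustration only).
assay_file <- tempfile(fileext = ".tsv")
coldata_file <- tempfile(fileext = ".tsv")
writeLines(c("gene\tS1\tS2", "g1\t10\t20", "g2\t0\t5"), assay_file)
writeLines(c("sample\ttreatment", "S2\ttreated", "S1\tcontrol"), coldata_file)

se <- readSummarizedExperiment(assay_file, coldata_file)
colData(se)
```

The sample-ID check is the part that matters for reproducibility: the
meta-data rows are matched to the assay columns by name rather than by
position, so a spreadsheet sorted differently from the assay file still
ends up correctly aligned, and a missing sample fails loudly.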

As suggested I am sharing this discussion with the bioc-devel list.

Thomas

On Wed, Jun 29, 2016 at 06:22:49PM +, Vincent Carey wrote:

> Thanks Thomas -- I think this should be circulated to biocore for further
> comments.  I am in agreement that we need to do a better job at both
> demonstrating the values of a) binding metadata to data, b) using standard
> containers through workflows, c) allowing interoperation.  I learned some
> useful things about spreadsheet interoperation at the conference and need
> to learn more.
>
> In a sense we are giving a specific implementation of some of the rules in
>
> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285
>
> and I wonder whether we could come up with another topic for the "ten
> simple rules" series that addresses these concerns, or do something
> similar, perhaps for F1000Research, with a Bioconductor-interoperability
> focus on metadata.


> On Wed, Jun 29, 2016 at 06:28:49PM +, Martin Morgan wrote:

> I guess you mean a modern-day analog of Biobase::readExpressionSet? I
> like the idea of templates, and also drafting a 'Ten Steps Toward
> Reproducibility in R / Bioconductor'. Would be happy to start a github
> repo for same if there are any takers...
> 
> Martin

> This email message may contain legally privileged and/or confidential
> information.  If you are not the intended recipient(s), or the employee
> or agent responsible for the delivery of this message to the intended
> recipient(s), you are hereby notified that any disclosure, copying,
> distribution, or use of this email message is prohibited.  If you have
> received this message in error, please notify the sender immediately by
> e-mail and delete this email message from your computer. Thank you.


> On 06/29/2016 01:57 PM, Thomas Girke wrote:
> > Hi Vince and Martin,
> >
> > It was great seeing you at the Bioc conference, and thanks for all your
> > time organizing the conference. As always it was a great success with a
> > lot of inspiring presentations and discussions.
> >
> > In one of our discussions you asked me for feedback on why I think
> > handling of meta-data is currently not straightforward for non-expert
> > users of Bioc packages such as biologists, data analysts or developers
> > coming from other languages.
> >
> > In my opinion, one main reason for this difficulty is that there is no
> > formal utility provided for importing meta-data from external files
> > (e.g. tabular, json or other formats). SummarizedExperiment has all
> > this great functionality, but it is not intuitive to non-expert users
> > how to import the data into the final object. For a developer it is easy
> > to write a custom import function, but not for non-R programmers.
> > This need could be addressed easily by providing an import function
> > that reads meta-data (optionally along with assay/range data)
> > provided by the user directly into SummarizedExperiment objects (and/or
> > RangedSummarizedExperiment). To the best of my knowledge, a
> > readSummarizedExperiment is currently not available, but I might be wrong?
> >
> > Almost equally important would be an export function so that users can
> > easily report intermediate results and also share them with external
> > software outside of R. Clearly, for the latter need, exporting to an Rd
> > file is not an option.
> >
> > The import step in particular overlaps substantially with how we
> > communicate with experimentalists via spreadsheets, a topic we discussed
> > at the meeting quite a bit. Providing one or two best-practice templates
> > for how to organize experiments in the 'spirit' of SummarizedExperiment
> > could help educate scientists on how to format their meta-data in Excel
> > or Google Sheets so that it is easier to process. This would also
> > improve reproducibility since many sample handling errors happen right
> > at this level. As an example file, one could use the colData sample
> > from the current SummarizedExperiment vignette.
> >
> > That's really all. 
> >
> > Best,
> >
> > Thomas
> >
> 
>
