Re: BioRDF Telcon

Kei Cheung Fri, 24 Jul 2009 11:34:46 -0700

Hi Michael,

Thanks for your detailed description of MAGE-ML. For our use case, we probably don't need to use all the information captured in MAGE-ML. The types of information we are currently focusing on include experiment/sample annotation (including some provenance as you indicated) and gene lists and how they are linked to existing ontologies. A couple of convincing examples may be enough to start. I can relay your comments about the validity of the MAGE-ML to the consortium, although I don't know whether they can address them.


Cheers,

-Kei

Miller, Michael D (Rosetta) wrote:

hi kei and helen,

like helen, i've been following the HCLS working groups with great
interest.  as one of the designers, with helen, of the MAGE-ML and
MAGE-TAB specs i might be able to provide a little technical insight
into the formats.

(from helen)
"This is probably as we don't have data - here's a list of human experiments with the term neuron - if any of these are useful, then I can prioritize their curation and inclusion in an atlas release"

kei, are the NIH Neuroscience Microarray Consortium experiments you've
cited and others like them in GEO or ArrayExpress?  a set of those could
be a good starting point for helen.

My understanding is that the publicly visible microarray projects in the neuroscience microarray consortium should also be in GEO and/or ArrayExpress, although I don't know whether all the annotations are preserved.

first, MAGE-ML is based on a DTD[1], not an XSD.  in early 2002 as the
OMG Gene Expression specification[1] was being finalized, XSD was still
in its infancy so we weren't comfortable at that point generating an XSD.
the MAGE-OM UML[2], in a very early XMI format from Rational Rose and
UniSys, was used to generate the DTD with code we wrote ourselves[3].

the UML model was designed to capture the flow of a microarray
experiment and how the resulting arrays were organized in the experiment
based on how the samples were treated and/or on the samples' phenotypes
for the purpose of a reviewer understanding the methodology and for a
researcher replicating and/or re-analyzing the results.

some of the details of the flow may not be of much interest, i.e. it
might be worth simply connecting the BioSource elements with their gene
expression data and not worrying about how the hybridization was
performed.  but that depends on what you want to do and you know that
better than i.

also, the data itself are specified in external files, typically in a
white-space delimited format where the column headers are specified in
the MAGE-ML file in the QuantitationTypeDimension element and the
identifiers of the rows are specified in one of the three
DesignElementDimension elements, Feature, Reporter, CompositeSequence,
depending on how derived the data is.  the data can also be in a
vendor-specific format such as the Affymetrix CEL (since the CEL file
internally specifies the dimensions, they are often left out of the
MAGE-ML document).
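
As a rough illustration of the external-file layout described above, the
following Python sketch attaches names to a headerless whitespace-delimited
matrix. It assumes the simplest possible case: one row per design element,
one column per quantitation type, and a single bioassay. The function name
label_data_matrix and the sample column names, row identifiers and values
are hypothetical and are not taken from any real MAGE-ML document.

```python
from typing import Dict, List


def label_data_matrix(
    raw_text: str,
    quantitation_types: List[str],
    design_elements: List[str],
) -> Dict[str, Dict[str, float]]:
    """Attach row and column names to a headerless whitespace-delimited matrix.

    quantitation_types: column names, as listed in a QuantitationTypeDimension.
    design_elements: row identifiers, as listed in a DesignElementDimension.
    """
    # str.split() with no argument splits on any run of spaces or tabs.
    rows = [line.split() for line in raw_text.splitlines() if line.strip()]
    if len(rows) != len(design_elements):
        raise ValueError("row count does not match the design element list")

    labelled: Dict[str, Dict[str, float]] = {}
    for element_id, cells in zip(design_elements, rows):
        if len(cells) != len(quantitation_types):
            raise ValueError(f"column count mismatch for {element_id}")
        labelled[element_id] = {
            name: float(cell) for name, cell in zip(quantitation_types, cells)
        }
    return labelled


# Hypothetical values, for illustration only.
columns = ["Signal", "Detection p-value"]
row_ids = ["probe_set_1", "probe_set_2"]
data = "102.5 0.01\n  8.3\t0.42\n"

matrix = label_data_matrix(data, columns, row_ids)
print(matrix["probe_set_2"]["Signal"])
```

The dimension lists are passed in as plain Python lists so that the sketch
stays independent of how the MAGE-ML document itself is parsed.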

the ExperimentalFactor elements are certainly relevant and if you've
looked at some of the examples you will notice that the BioSource
elements, in particular, and other elements are annotated by
OntologyEntry elements.  from the gene expression specification:

"OntologyEntry
A single entry from an ontology or a controlled vocabulary. For
instance, category could be 'species name,' value could be 'homo
sapiens' and ontology would be taxonomy database, NCBI."

for the element an ontology entry element is annotating, we looked at it
as a way of specifying something like "the object identified by the
element is an instance of the class/individual specified by the
OntologyEntry"

so from "kitm-affy-droso-176167" one sees that the BioSource is an
"instance of" Drosophila, whole animal, whole head and an age of 3 days:

        <BioSource identifier="arrayconsortium.tgen.org::biosource.181527"
                   name="Oregon R head 3d">
           <Characteristics_assnlist>
              <OntologyEntry category="Organism" value="Drosophila"
description="Drosophila">
                 <OntologyReference_assn>
                    <DatabaseEntry accession="#Organism"
                                   URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Organism">
                       <Database_assnref>
                          <Database_ref identifier="MO"/>
                       </Database_assnref>
                    </DatabaseEntry>
<!-- snip -->
                 </OntologyReference_assn>
              </OntologyEntry>
              <OntologyEntry category="OrganismPart" value="whole animal"
                             description="">
                 <OntologyReference_assn>
                    <DatabaseEntry accession="#OrganismPart"
                                   URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#OrganismPart">
                       <Database_assnref>
                          <Database_ref identifier="MO"/>
                       </Database_assnref>
                    </DatabaseEntry>
                 </OntologyReference_assn>
<!-- snip -->
              </OntologyEntry>
              <OntologyEntry category="OrganismPartRegion" value="whole head"
                             description="">
<!-- snip -->
              </OntologyEntry>
<!-- snip -->
              <OntologyEntry category="Age" value="Age">
                 <OntologyReference_assn>
                    <DatabaseEntry accession="#Age"
                                   URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Age">
                       <Database_assnref>
                          <Database_ref identifier="MO"/>
                       </Database_assnref>
                    </DatabaseEntry>
                 </OntologyReference_assn>
                 <Associations_assnlist>
                    <OntologyEntry category="has_measurement"
value="has_measurement">
                       <OntologyReference_assn>
                          <DatabaseEntry accession="#has_measurement"
                                         URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_measurement">
                             <Database_assnref>
                                <Database_ref identifier="MO"/>
                             </Database_assnref>
                          </DatabaseEntry>
                       </OntologyReference_assn>
                       <Associations_assnlist>
                          <OntologyEntry category="Measurement"
value="Measurement">
                             <OntologyReference_assn>
                                <DatabaseEntry accession="#Measurement"
                                               URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Measurement">
                                   <Database_assnref>
                                      <Database_ref identifier="MO"/>
                                   </Database_assnref>
                                </DatabaseEntry>
                             </OntologyReference_assn>
                             <Associations_assnlist>
                                <OntologyEntry category="has_value"
value="has_value">
                                   <OntologyReference_assn>
                                      <DatabaseEntry accession="#has_value"
                                                     URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_value">
                                         <Database_assnref>
                                            <Database_ref
identifier="MO"/>
                                         </Database_assnref>
                                      </DatabaseEntry>
                                   </OntologyReference_assn>
                                   <Associations_assnlist>
                                      <OntologyEntry
category="has_value" value="3"/>
                                   </Associations_assnlist>
                                </OntologyEntry>
                                <OntologyEntry category="has_units"
value="has_units">
                                   <OntologyReference_assn>
                                      <DatabaseEntry accession="#has_units"
                                                     URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_units">
                                         <Database_assnref>
                                            <Database_ref
identifier="MO"/>
                                         </Database_assnref>
                                      </DatabaseEntry>
                                   </OntologyReference_assn>
                                   <Associations_assnlist>
                                      <OntologyEntry
category="TimeUnit" value="days" description="24 hours, time unit">
                                         <OntologyReference_assn>
                                            <DatabaseEntry accession="#days"
                                                           URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#days">
                                               <Database_assnref>
                                                  <Database_ref
identifier="MO"/>
                                               </Database_assnref>
                                            </DatabaseEntry>
                                         </OntologyReference_assn>
                                      </OntologyEntry>
                                   </Associations_assnlist>
                                </OntologyEntry>
                             </Associations_assnlist>
                          </OntologyEntry>
                       </Associations_assnlist>
                    </OntologyEntry>
                 </Associations_assnlist>
              </OntologyEntry>
<!-- snip -->
           </Characteristics_assnlist>
<!-- snip -->
        </BioSource>
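
To make the "instance of" reading concrete, here is a minimal Python
sketch that walks the OntologyEntry elements of a BioSource and collapses
the nested Age measurement into a single value. It uses only the standard
library xml.etree.ElementTree module. The embedded sample is a trimmed
copy of the fragment above with the OntologyReference_assn parts left out;
the helper names leaf_values and characteristics are made up for this
sketch and are not part of any MAGE toolkit.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Trimmed copy of the BioSource fragment above (ontology references omitted).
SAMPLE = """
<BioSource identifier="arrayconsortium.tgen.org::biosource.181527"
           name="Oregon R head 3d">
  <Characteristics_assnlist>
    <OntologyEntry category="Organism" value="Drosophila"/>
    <OntologyEntry category="OrganismPart" value="whole animal"/>
    <OntologyEntry category="Age" value="Age">
      <Associations_assnlist>
        <OntologyEntry category="has_measurement" value="has_measurement">
          <Associations_assnlist>
            <OntologyEntry category="Measurement" value="Measurement">
              <Associations_assnlist>
                <OntologyEntry category="has_value" value="has_value">
                  <Associations_assnlist>
                    <OntologyEntry category="has_value" value="3"/>
                  </Associations_assnlist>
                </OntologyEntry>
                <OntologyEntry category="has_units" value="has_units">
                  <Associations_assnlist>
                    <OntologyEntry category="TimeUnit" value="days"/>
                  </Associations_assnlist>
                </OntologyEntry>
              </Associations_assnlist>
            </OntologyEntry>
          </Associations_assnlist>
        </OntologyEntry>
      </Associations_assnlist>
    </OntologyEntry>
  </Characteristics_assnlist>
</BioSource>
"""


def leaf_values(entry: ET.Element) -> List[Tuple[str, str]]:
    """Return (category, value) pairs for the innermost nested entries."""
    children = entry.findall("Associations_assnlist/OntologyEntry")
    if not children:
        return [(entry.get("category", ""), entry.get("value", ""))]
    leaves: List[Tuple[str, str]] = []
    for child in children:
        leaves.extend(leaf_values(child))
    return leaves


def characteristics(biosource: ET.Element) -> List[Tuple[str, str]]:
    """Map each top-level characteristic to its category and flattened value."""
    result = []
    for entry in biosource.findall("Characteristics_assnlist/OntologyEntry"):
        value = " ".join(v for _, v in leaf_values(entry))
        result.append((entry.get("category", ""), value))
    return result


biosource = ET.fromstring(SAMPLE)
for category, value in characteristics(biosource):
    print(f"{biosource.get('identifier')} instance_of {category}={value}")
```

Joining the leaf values with a space is only a convenience for display; a
real conversion would more likely keep the value and the unit as separate
fields.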

by the by, the MAGE-ML examples i've looked at from the NIH Neuroscience
Microarray Consortium are not in a valid MAGE-ML.dtd format.  i'll send a
follow-up e-mail dealing with the problems i see.  they are not far off
but are invalid in a number of places.

cheers,
michael

Michael Miller
Lead Software Developer
Rosetta Biosoftware Business Unit
www.rosettabio.com

[1] http://www.omg.org/spec/GENE/1.1/

(sadly, the original links to the MAGEstk appear to be broken, this
mirror site still has the MAGE related files built up over the years,
here's my best guess as to the most helpful for the references)
[2]
http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/
        v1.0:
http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE-2002-01-07.xmi.gz/MAGE-2002-01-07.xmi
        v1.1:
http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE.xmi.gz[peek]
[3]
http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE%20Java%20API/20010911/

-----Original Message-----
From: public-semweb-lifesci-requ...@w3.org [mailto:public-semweb-lifesci-requ...@w3.org] On Behalf Of Helen Parkinson
Sent: Wednesday, July 22, 2009 2:55 AM
To: Kei Cheung
Cc: HCLS; James Malone
Subject: Re: BioRDF Telcon

Responses in line.
1. We have text mined much of the Affymetrix GEO data, curated it and
imported it into ArrayExpress - there is now much better sample
annotation than the native data in GEO. We also are running QC across
all the data files so we know which should be excluded for future
analyses.

I think it's the right thing to do both to enrich data annotation and
to enhance data quality. This will help data integration a lot.
Currently, we are exploring query federation in the neuroscience
context. It'd be great if we can use the neuroscience use case(s) to
help drive your ontology development for text mining and data
visualization. In addition to the NIH neuroscience microarray
consortium, it may be possible to collaborate with the Neuroscience
Information Framework (NIF) to see if we can utilize some of its
resources (e.g., neuron ontology).

Re-use of the neuron ontology is possible, but it depends on whether
there is available data to annotate either in ArrayExpress or GEO. If
you can get me a list of experiment accessions or pubmed ids I can see
if this is feasible

3. We have summary level data of genes x conditions for ~30,000 hybs
worth of data in our gene expression atlas with p values indicating
relative under/over-expression. We are planning to export these as
triples as soon as we publish the atlas - these may be of interest.
www.ebi.ac.uk/gxa - there's an API at present, but it will be
improved in the next month or so.

It fits well with what we're currently exploring in terms of gene list
representation and linking genes and samples to existing ontologies.
It'd be great if we can download or fetch RDF triples from EBI atlas.

We have a student starting work on this in a month, if you can produce
concrete use cases for how you want to access these data we can do
something.

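
Purely as an illustration of what a gene x condition result might look
like as triples, the Python sketch below writes one record as N-Triples
lines. Everything specific in it is an assumption made for the example:
the http://example.org/atlas/ namespace, the predicate names, the record
fields and the gene identifier are invented, and none of it describes the
atlas API or any planned export format.

```python
from typing import List
from urllib.parse import quote

BASE = "http://example.org/atlas/"  # hypothetical namespace, not an EBI URI


def literal(value: object) -> str:
    """Quote a value as a plain N-Triples literal, escaping \\ and "."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def expression_triples(
    gene: str, condition: str, direction: str, p_value: float
) -> List[str]:
    """Describe one gene x condition result as a few N-Triples lines."""
    # Percent-encode the identifiers so that spaces cannot break the IRI.
    gene_id = quote(gene, safe="")
    condition_id = quote(condition, safe="")
    node = f"<{BASE}result/{gene_id}/{condition_id}>"
    return [
        f"{node} <{BASE}gene> <{BASE}gene/{gene_id}> .",
        f"{node} <{BASE}condition> {literal(condition)} .",
        f"{node} <{BASE}direction> {literal(direction)} .",
        f"{node} <{BASE}pValue> {literal(p_value)} .",
    ]


# Hypothetical record, for illustration only.
lines = expression_triples("GENE1", "hippocampus", "over", 0.001)
print("\n".join(lines))
```

Giving each result its own node keeps the p value attached to one
particular gene and condition pair, which a direct gene-to-condition
triple could not express.
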
4. If neuroscience data is of specific interest we could do a themed
atlas release where we add datasets for a given community or project
and make these available. These can be identified by ArrayExpress or
GEO accession or pubmed and we can re-annotate the genes vs
Uniprot/Ensembl, add GO terms, etc and curate the sample attributes
and experimental variables. These pipelines are already in place as
part of our production workflow.

I think it's a great idea to do a themed atlas (e.g., neuro-atlas). I
just played with gxa a little bit. It's nice! For example, I could
find genes that are over-expressed in the hippocampus brain region
across different experiments. However, when I tried to do the same
thing for neurons, there are only a few neuron types that I can
select. It'd be nice if we can have more neuron types, for instance.

This is probably as we don't have data - here's a list of human
experiments with the term neuron - if any of these are useful, then I
can prioritise their curation and inclusion in an atlas release

http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=neuron&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending

and brain

http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=brain&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending

I'd be very happy to collaborate, and for this group to use our data,
we spend a lot of time adding semantic value to it, so please let me
know if this is of interest

We are also looking into the possibility of establishing collaboration
with the scientific discourse task force based on the microarray use
case. We're planning to have a microarray-related presentation and
discussion on Aug. 31 (Monday, 11 am EDT/5 pm CET). Details will be
announced later. It'd be great if you can join the BioRDF call to
participate in the discussion.

Cheers,

-Kei

best regards

Helen






Kei Cheung wrote:
The minutes for yesterday's BioRDF call are available at:

http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009-07-20_Conference_Call

Thanks to Lena for scribing and Eric for retrieving the transcript
from the IRC log.

Cheers,

-Kei

Kei Cheung wrote:
This is a reminder that the next BioRDF teleconf. will be held at
11 am EDT (5 pm CET) on Monday, July 20 (see details below).

I created the following wiki page for discussing the microarray use
case:

http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/QueryFederation2

Cheers,

-Kei

== Conference Details ==
* Date of Call: Monday July 20, 2009
* Time of Call: 11:00 am Eastern Time
* Dial-In #: +1.617.761.6200 (Cambridge, MA)
* Dial-In #: +33.4.89.06.34.99 (Nice, France)
* Dial-In #: +44.117.370.6152 (Bristol, UK)
* Participant Access Code: 4257 ("HCLS")

* IRC Channel: irc.w3.org port 6665 channel #hcls (see [http://www.w3.org/Project/IRC/ W3C IRC page] for details, or see [http://cgi.w3.org/member-bin/irc/irc.cgi Web IRC])
* Duration: ~1 hour
* Frequency: bi-weekly
* Convener: Kei Cheung

== Agenda ==
* Roll call and introduction (Kei)
* TCM data quick update (Jun, Kei)
* Query federation use case expansion (microarray) (All)

--
Helen Parkinson, PhD
ArrayExpress Production Coordinator,
Microarray Informatics Team, EBI


EBI 01223 494672
Skype: helen.parkinson.ebi
