date:20090724

Re: BioRDF Telcon

2009-07-24 Thread Kei Cheung

Hi Helen,

Please see my response below.

Helen Parkinson wrote:

Responses in line.

1. We have text mined much of the Affymetrix GEO data, curated it
and imported it into ArrayExpress - there is now much better sample
annotation than the native data in GEO. We also are running QC
across all the data files so we know which should be excluded for
future analyses.

I think it's the right thing to do both to enrich data annotation and
to enhance data quality. This will help data integration a lot.

Currently, we are exploring query federation in the neuroscience
context. It'd be great if we can use the neuroscience use case(s) to
help drive your ontology development for text mining and data
visualization. In addition to the NIH neuroscience microarray
consortium, it may be possible to collaborate with the Neuroscience
Information Framework (NIF) to see if we can utilize some of its
resources (e.g., neuron ontology).

Re-use of the neuron ontology is possible, but it depends on whether
there is available data to annotate either in ArrayExpress or GEO. If
you can get me a list of experiment accessions or PubMed IDs, I can
see if this is feasible.

That's related to our current microarray use case where we're exploring
a few examples (experiments) that hopefully contain enough
neuron-related annotation.

3. We have summary level data of genes x conditions for ~30,000 hybs
worth of data in our gene expression atlas with p values indicating
relative under/over-expression. We are planning to export these as
triples as soon as we publish the atlas - these may be of interest.
www.ebi.ac.uk/gxa - there's an API at present, but it will be
improved in the next month or so.

It fits well with what we're currently exploring in terms of gene
list representation and linking genes and samples to existing
ontologies. It'd be great if we can download or fetch RDF triples
from EBI atlas.
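
To make the request concrete, here is a minimal sketch of what one gene x condition result with its p value could look like as N-Triples. The namespace, property names and the example gene are all hypothetical placeholders; the atlas had no published RDF export at the time of this thread, so none of this reflects an actual EBI vocabulary.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/atlas/"
XSD_DOUBLE = "<http://www.w3.org/2001/XMLSchema#double>"


@dataclass
class Finding:
    gene: str        # gene identifier (placeholder values in this sketch)
    condition: str   # experimental condition, e.g. a brain region
    direction: str   # "over" or "under"
    p_value: float   # p value for relative over/under-expression


def iri(kind: str, name: str) -> str:
    """Build an IRI in the hypothetical namespace, percent-encoding the name."""
    return "<" + BASE + kind + "/" + quote(name, safe="") + ">"


def to_ntriples(finding: Finding, index: int) -> list:
    """Serialise one finding as four N-Triples lines about one finding node."""
    node = iri("finding", str(index))
    gene_prop = iri("prop", "gene")
    cond_prop = iri("prop", "condition")
    dir_prop = iri("prop", "direction")
    p_prop = iri("prop", "pValue")
    return [
        f"{node} {gene_prop} {iri('gene', finding.gene)} .",
        f"{node} {cond_prop} {iri('condition', finding.condition)} .",
        f'{node} {dir_prop} "{finding.direction}" .',
        f'{node} {p_prop} "{finding.p_value}"^^{XSD_DOUBLE} .',
    ]


example = to_ntriples(Finding("GENE1", "hippocampus", "over", 0.001), 0)
```

Keeping each result as its own node lets the p value and the direction travel with the gene/condition pair, which a single gene-to-condition triple could not express.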

We have a student starting work on this in a month. If you can produce
concrete use cases for how you want to access these data, we can do
something.

As part of the current biordf effort, we're in the process of coming up
with questions that we can ask across neuroscience microarray experiments.

4. If neuroscience data is of specific interest we could do a themed
atlas release where we add datasets for a given community or project
and make these available. These can be identified by ArrayExpress or
GEO accession or pubmed and we can re-annotate the genes vs
Uniprot/Ensembl, add GO terms, etc and curate the sample attributes
and experimental variables. These pipelines are already in place as
part of our production workflow.

I think it's a great idea to do a themed atlas (e.g., neuro-atlas). I
just played with gxa a little bit. It's nice! For example, I could
find genes that are over-expressed in the hippocampus brain region
across different experiments. However, when I tried to do the same
thing for neurons, there are only a few neuron types that I can
select. It'd be nice if we can have more neuron types, for instance.

This is probably because we don't have the data. Here's a list of human
experiments with the term neuron - if any of these are useful, then I
can prioritise their curation and inclusion in an atlas release:

http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=neuron&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending

The experiment: E-GEOD-4757 compares gene expression profiles of layer 2
stellate island neurons of the entorhinal cortex between normal and AD
subjects.

The experiment: E-GEOD-9770 compares gene expression profiles of Layer
II stellate neurons (entorhinal cortex) and layer III cortical neurons
(hippocampus CA1, middle temporal gyrus, posterior cingulate, superior
frontal gyrus, primary visual cortex) for subjects with mild cognitive
impairment.

These two experiments are also in the NIH microarray consortium. There
is an overlap of neuron type, brain region, and phenotype (memory)
between the two experiments. This may be a start ...

and brain

http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=brain&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending

I'd be very happy to collaborate, and for this group to use our
data. We spend a lot of time adding semantic value to it, so please
let me know if this is of interest.

We are also looking into the possibility of establishing
collaboration with the scientific discourse task force based on the
microarray use case. We're planning to have a microarray-related
presentation and discussion on Aug. 31 (Monday, 11 am EDT/5 pm CET).
Details will be announced later. It'd be great if you can join the
BioRDF call to participate in the discussion.

Cheers,

-Kei

best regards

Helen

Kei Cheung wrote:

The minutes for yesterday's BioRDF call are available at:

Re: Can RDFa be used on XML: pharma information

2009-07-24 Thread Kei Cheung

This may also be an interesting way of intersecting microarray (mageml) 
and semantic web (rdfa)  ...


-Kei

Ivan Herman wrote:


I am sorry if I come into this thread very late. Additionally to what
Ralph just said, the RDFa distiller running on the W3C site:

http://www.w3.org/2007/08/pyRdfa/

should actually work with an arbitrary XML file, although only SVG is
'announced' there (which is probably my mistake). If there is a problem
then, well... it is my bug:-(

Ivan

Ralph R. Swick wrote:
 


At 10:48 PM 6/23/2009 +1000, Rick Jelliffe wrote:
   


I see that the 2008 draft
http://www.w3.org/2006/07/SWD/RDFa/rdfa-overview
says
"RDFa itself is intended to be a technique that allows for adding
metadata to any (XML) markup document, including SMIL, RSS, SVG,
MathML, etc. Note, however, that in the current state, RDFa is being
defined only for the (X)HTML family of languages."
 


The RDFa specification was designed with the intent that other
languages than XHTML could take advantage of RDFa markup.
(The terminology host language was used in some drafts
to signal this direction.)  The charter under which the group
was operating was specific to XHTML, thus the wording in
the W3C Recommendation.

   


So I think I will go ahead and add some RDFa markup to the
XML, 
 


By all means, reuse the RDFa vocabulary if it seems appropriate
for your application.
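
As a rough illustration of RDFa-style attributes on non-XHTML XML, the sketch below walks an arbitrary XML document and collects (subject, property, value) tuples from `about`, `property` and `content` attributes. It is a deliberately tiny subset, not a conformant RDFa processor: it leaves CURIEs such as `dc:title` unexpanded and ignores `rel`, `typeof`, `datatype` and the rest of the processing rules. The sample document and its element names are invented for the example; for real documents the distiller mentioned above is the tool to use.

```python
import xml.etree.ElementTree as ET

# Invented sample: a non-XHTML XML document carrying RDFa-style attributes.
SAMPLE = """<drug xmlns:dc="http://purl.org/dc/elements/1.1/"
      about="http://example.org/drug/1">
  <name property="dc:title">Example compound</name>
  <maker property="dc:creator">Example Pharma</maker>
</drug>"""


def simple_rdfa_triples(xml_text):
    """Collect (subject, property, value) tuples from about/property attributes.

    The subject is inherited from the nearest enclosing element that has
    an 'about' attribute; the value is the 'content' attribute if present,
    otherwise the element's own text.
    """
    triples = []

    def walk(element, subject):
        subject = element.get("about", subject)
        prop = element.get("property")
        if prop is not None and subject is not None:
            text = (element.text or "").strip()
            triples.append((subject, prop, element.get("content", text)))
        for child in element:
            walk(child, subject)

    walk(ET.fromstring(xml_text), None)
    return triples


triples = simple_rdfa_triples(SAMPLE)
```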

RE: Can RDFa be used on XML: pharma information

2009-07-24 Thread Miller, Michael D (Rosetta)

hi kei,

there is already something better than RDFa tags in MAGE-ML, the
OntologyEntry tags.  Their purpose is exactly to provide the information
to link to the semantic web.  The examples you provided from NIH are
well annotated with those tags.

cheers,
michael

Michael Miller
Lead Software Developer
Rosetta Biosoftware Business Unit
www.rosettabio.com


 -Original Message-
 From: public-semweb-lifesci-requ...@w3.org 
 [mailto:public-semweb-lifesci-requ...@w3.org] On Behalf Of Kei Cheung
 Sent: Friday, July 24, 2009 10:56 AM
 To: Ivan Herman
 Cc: Ralph R. Swick; Rick Jelliffe; public-semweb-lifesci@w3.org
 Subject: Re: Can RDFa be used on XML: pharma information
 
 This may also be an interesting way of intersecting 
 microarray (mageml) 
 and semantic web (rdfa)  ...
 
 -Kei
 

Re: BioRDF Telcon

2009-07-24 Thread Kei Cheung


Hi Michael,

Thanks for your detailed description of mageml. For our use case, we 
probably don't need to use all the information captured in mageml. The 
types of information we are currently focusing on include 
experiment/sample annotation (including some provenance as you 
indicated) and gene lists and how they are linked to existing 
ontologies. A couple of convincing examples may be enough to start. I 
can relay your comments about the validity of mageml to the consortium, 
although I don't know whether they can address them.


Cheers,

-Kei

Miller, Michael D (Rosetta) wrote:


hi kei and helen,

like helen, i've been following the HCLS working groups with great
interest.  as one of the designers, with helen, of the MAGE-ML and
MAGE-TAB specs i might be able to provide a little technical insight
into the formats.

(from helen)
This is probably as we don't have data - here's a list of human 
experiments with the term neuron - if any of these are useful, then I 
can prioritize their curation and inclusion in an atlas release


kei, are the NIH Neuroscience Microarray Consortium experiments you've
cited and others like them in GEO or ArrayExpress?  a set of those could
be a good starting point for helen.
 



My understanding is that the publicly visible microarray projects in the 
neuroscience microarray consortium should also be in geo and/or 
arrayexpress, although I don't know whether all the annotations are 
preserved.




first, MAGE-ML is based on a DTD[1], not an XSD.  in early 2002 as the
OMG Gene Expression specification[1] was being finalized, XSD was still
in its infancy so we weren't comfortable at that point generating an XSD.
the MAGE-OM UML[2], in a very early XMI format from Rational Rose and
UniSys, was used to generate the DTD with code we wrote ourselves[3]. 


the UML model was designed to capture the flow of a microarray
experiment and how the resulting arrays were organized in the experiment
based on how the samples were treated and/or on the samples' phenotypes
for the purpose of a reviewer understanding the methodology and for a
researcher replicating and/or re-analyzing the results.  


some of the details of the flow may not be of much interest, i.e. it
might be worth simply connecting the BioSource elements with their gene
expression data and not worrying about how the hybridization was
performed.  but that depends on what you want to do and you know that
better than i.

also, the data itself are specified in external files, typically in a
white-space delimited format where the column headers are specified in
the MAGE-ML file in the QuantitationTypeDimension element and the
identifiers of the row specified in one of the three
DesignElementDimension elements, Feature, Reporter, CompositeSequence,
depending on how derived the data is.  Also the data can be in a vendor
specific format such as the Affymetrix CEL (since the CEL file
internally specifies the dimensions often they are left out of the
MAGE-ML document).
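
A minimal sketch of the arrangement just described, under simplifying assumptions: a single bioassay, one row per design element, one column per quantitation type, and both name lists already extracted from the MAGE-ML document. The function and variable names are hypothetical, and real external data files can be ordered differently, so treat this as an illustration rather than a reader for actual submissions.

```python
def read_data_matrix(text, quantitation_types, design_elements):
    """Label a header-less, whitespace-delimited matrix with external names.

    quantitation_types: column names (from QuantitationTypeDimension)
    design_elements:    row identifiers (from a DesignElementDimension)
    Returns {row identifier: {column name: float value}}.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != len(design_elements):
        raise ValueError("row count does not match the design element list")

    table = {}
    for element_id, fields in zip(design_elements, rows):
        if len(fields) != len(quantitation_types):
            raise ValueError("column count mismatch for " + element_id)
        values = (float(field) for field in fields)
        table[element_id] = dict(zip(quantitation_types, values))
    return table


matrix = read_data_matrix(
    "1.5 0.01\n2.0 0.20\n",
    quantitation_types=["Signal", "PValue"],
    design_elements=["probe_a", "probe_b"],
)
```

The point of the sketch is that the data file alone is not self-describing: without the two dimension lists from the MAGE-ML document the numbers have no row or column meaning.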

the ExperimentalFactor elements are certainly relevant and if you've
looked at some of the examples you will notice that the BioSource
elements, in particular, and other elements are annotated by
OntologyEntry elements.  from the gene expression specification:

OntologyEntry
A single entry from an ontology or a controlled vocabulary. For
instance, category could be 'species name,' value could be 'homo
sapiens' and ontology would be taxonomy database, NCBI.

for the element an ontology entry element is annotating, we looked at it
as a way of specifying something like the object identified by the
element is an instance of the class/individual specified by the
OntologyEntry

so from kitm-affy-droso-176167 one sees that the BioSource is an
instance of Drosophila, whole animal, whole head and an age of 3 days:

<BioSource identifier="arrayconsortium.tgen.org::biosource.181527"
           name="Oregon R head 3d">
  <Characteristics_assnlist>
    <OntologyEntry category="Organism" value="Drosophila"
                   description="Drosophila">
      <OntologyReference_assn>
        <DatabaseEntry accession="#Organism"
            URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Organism">
          <Database_assnref>
            <Database_ref identifier="MO"/>
          </Database_assnref>
        </DatabaseEntry>
        <!-- snip -->
      </OntologyReference_assn>
    </OntologyEntry>
    <OntologyEntry category="OrganismPart" value="whole animal"
                   description="">
      <OntologyReference_assn>
        <DatabaseEntry accession="#OrganismPart"
            URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#OrganismPart">
          <Database_assnref>
            <Database_ref identifier="MO"/>
          </Database_assnref>
        </DatabaseEntry>
      </OntologyReference_assn>
      <!-- snip -->
    </OntologyEntry>
    <OntologyEntry
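
One way to read the "is an instance of" interpretation in code: the sketch below pulls the category/value pairs of each OntologyEntry under a BioSource and pairs them with the BioSource identifier. The fragment is a shortened, well-formed stand-in for the quoted example, and using the MGED Ontology URI plus the category as the predicate is an assumption made for illustration, not something the MAGE specification prescribes.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the BioSource example in the thread.
FRAGMENT = """<BioSource identifier="arrayconsortium.tgen.org::biosource.181527"
           name="Oregon R head 3d">
  <Characteristics_assnlist>
    <OntologyEntry category="Organism" value="Drosophila"/>
    <OntologyEntry category="OrganismPart" value="whole animal"/>
  </Characteristics_assnlist>
</BioSource>"""

# Ontology URI prefix as it appears in the DatabaseEntry URI attributes.
MO = "http://mged.sourceforge.net/ontologies/MGEDontology.php#"


def characteristics(xml_text):
    """Return (biosource id, MO category URI, value) for each OntologyEntry.

    Note: iter() also visits OntologyEntry elements nested inside other
    OntologyEntry elements, which a fuller converter would treat separately.
    """
    source = ET.fromstring(xml_text)
    subject = source.get("identifier")
    return [
        (subject, MO + entry.get("category"), entry.get("value"))
        for entry in source.iter("OntologyEntry")
    ]


annotations = characteristics(FRAGMENT)
```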

Re: Can RDFa be used on XML: pharma information

2009-07-24 Thread Helen Parkinson

This is probably technically possible - but you'd need to process a lot 
of complex mage-ml to get out some quite simple information - there's a 
node-edge sample processing graph, plus all the external data files in 
there - mage-ml is mostly tags and the files are large. We've moved 
internally to MAGE-TAB format, and we have a MAGE-TAB parser that's being 
used by a couple of groups. We will be developing a standalone 
parser/backend database which will allow users to build a standalone 
atlas. There may be more mileage in developing that parser further to 
support RDF than to pursue MAGE-ML.
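
To illustrate why a tab-delimited route may be simpler, here is a minimal sketch that turns rows of an SDRF-style table into (subject, column, value) tuples. It is not the ArrayExpress MAGE-TAB parser; the column headings and sample row are illustrative, written in the style of MAGE-TAB sample annotation. A real SDRF file can repeat column names (for example several protocol columns), which `csv.DictReader` would silently collapse, so a production parser would need to handle columns by position.

```python
import csv
import io

# Illustrative SDRF-style table: one header row, one sample row.
SDRF = (
    "Source Name\tCharacteristics[Organism]\t"
    "Characteristics[OrganismPart]\tFactor Value[disease state]\n"
    "sample 1\tHomo sapiens\tentorhinal cortex\tnormal\n"
)


def sdrf_to_triples(text):
    """Emit one (source name, column heading, value) tuple per filled cell."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    triples = []
    for row in reader:
        subject = row["Source Name"]
        for column, value in row.items():
            if column == "Source Name" or not value:
                continue  # skip the subject column and empty cells
            triples.append((subject, column, value))
    return triples


sample_triples = sdrf_to_triples(SDRF)
```

Each cell already pairs a sample with a named attribute, so the table maps onto triples almost directly, without walking a processing graph.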


thanks

Helen

Kei Cheung wrote:
This may also be an interesting way of intersecting microarray 
(mageml) and semantic web (rdfa)  ...


-Kei


Re: BioRDF Telcon

2009-07-24 Thread Helen Parkinson


Hi

I meant to comment on this: I would not attempt a MAGE-ML to RDF 
transform. I can probably do something more quickly with an RDF export 
of transformed data analysed for over/under-expression, plus factor 
values and genes, and we'll have a student to work on this, I hope.


Helen

Miller, Michael D (Rosetta) wrote:

hi kei and helen,

like helen, i've been following the HCLS working groups with great
interest.  as one of the designers, with helen, of the MAGE-ML and
MAGE-TAB specs i might be able to provide a little technical insight
into the formats.

(from helen)
This is probably as we don't have data - here's a list of human 
experiments with the term neuron - if any of these are useful, then I 
can prioritize their curation and inclusion in an atlas release


kei, are the NIH Neuroscience Microarray Consortium experiments you've
cited and others like them in GEO or ArrayExpress?  a set of those could
be a good starting point for helen.

first, MAGE-ML is based on a DTD[1], not an XSD.  in early 2002 as the
OMG Gene Expression specification[1] was being finalized, XSD was still
in its infancy so we weren't comfortable at that point generating an XSD.
the MAGE-OM UML[2], in a very early XMI format from Rational Rose and
UniSys, was used to generate the DTD with code we wrote ourselves[3]. 


the UML model was designed to capture the flow of a microarray
experiment and how the resulting arrays were organized in the experiment
based on how the samples were treated and/or on the samples' phenotypes
for the purpose of a reviewer understanding the methodology and for a
researcher replicating and/or re-analyzing the results.  


some of the details of the flow may not be of much interest, i.e. it
might be worth simply connecting the BioSource elements with their gene
expression data and not worrying about how the hybridization was
performed.  but that depends on what you want to do and you know that
better than i.

also, the data itself are specified in external files, typically in a
white-space delimited format where the column headers are specified in
the MAGE-ML file in the QuantitationTypeDimension element and the
identifiers of the row specified in one of the three
DesignElementDimension elements, Feature, Reporter, CompositeSequence,
depending on how derived the data is.  Also the data can be in a vendor
specific format such as the Affymetrix CEL (since the CEL file
internally specifies the dimensions often they are left out of the
MAGE-ML document).

the ExperimentalFactor elements are certainly relevant and if you've
looked at some of the examples you will notice that the BioSource
elements, in particular, and other elements are annotated by
OntologyEntry elements.  from the gene expression specification:

OntologyEntry
A single entry from an ontology or a controlled vocabulary. For
instance, category could be 'species name,' value could be 'homo
sapiens' and ontology would be taxonomy database, NCBI.

for the element an ontology entry element is annotating, we looked at it
as a way of specifying something like the object identified by the
element is an instance of the class/individual specified by the
OntologyEntry

so from kitm-affy-droso-176167 one sees that the BioSource is an
instance of Drosophila, whole animal, whole head and an age of 3 days:

<BioSource identifier="arrayconsortium.tgen.org::biosource.181527"
           name="Oregon R head 3d">
  <Characteristics_assnlist>
    <OntologyEntry category="Organism" value="Drosophila"
                   description="Drosophila">
      <OntologyReference_assn>
        <DatabaseEntry accession="#Organism"
            URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Organism">
          <Database_assnref>
            <Database_ref identifier="MO"/>
          </Database_assnref>
        </DatabaseEntry>
        <!-- snip -->
      </OntologyReference_assn>
    </OntologyEntry>
    <OntologyEntry category="OrganismPart" value="whole animal"
                   description="">
      <OntologyReference_assn>
        <DatabaseEntry accession="#OrganismPart"
            URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#OrganismPart">
          <Database_assnref>
            <Database_ref identifier="MO"/>
          </Database_assnref>
        </DatabaseEntry>
      </OntologyReference_assn>
      <!-- snip -->
    </OntologyEntry>
    <OntologyEntry category="OrganismPartRegion" value="whole head"
                   description="">
      <!-- snip -->
    </OntologyEntry>
    <!-- snip -->
    <OntologyEntry category="Age" value="Age">
      <OntologyReference_assn>
        <DatabaseEntry accession="#Age"
            URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Age">
          <Database_assnref>
            <Database_ref identifier="MO"/>
          </Database_assnref>
        </DatabaseEntry>
