Nice summary and comments, Bill. This is the idea of open innovation and
open community.
The example I gave includes a hypothesis. In addition to the ontologies
you mentioned, we might also need to think about the SWAN ontology,
which captures hypotheses.
Cheers,
-Kei
Bill Bug wrote:
Hi Susie,
We certainly do need an "Experiment Ontology" - or Ontology of
Biomedical Investigation (OBI).
I believe Matthias, Michael, and Kei have all made exactly the points
I think are most important to consider:
1) Matthias's comments
Are you following "best practices" in creating the ontology? I
believe Matthias gives many instructive examples on how to adjust what
is here to bring it much more in sync with the emerging "best
practices" that are coming out of the community development
surrounding a variety of OBO Foundry ontologies. Matthias also makes
the point that it's important to seek to re-use (or directly contribute
to) the emerging community ontologies to cover the required domains.
In the case of this particular Experiment Ontology, the ontologies to
consider are Ontology of Biomedical Investigation (OBI), the OBO
Relations Ontology, the Gene Ontology (specifically the Molecular
Function and Cellular Component branches, the latter of which is
designed to capture components down to the level of macromolecular
complexes), the Sequence Ontology, Protein Ontology (nascent - but
proceeding rapidly), the Cell Ontology - at a minimum. As many on
this list know - and I'm certain the talented folks at Lilly who
invested time in assembling this ontology also learned - many of these
are not fully ready for prime-time, and/or may not FULLY cover the
breadth and depth of the domains a specific application requires.
However, if you don't seek to work with these community efforts,
you cannot expect to achieve the ultimate goal, which is to make
your data maximally "semantically sticky", so as to ensure the least
amount of custom logic and human effort will be required to get the
most value from your data. Otherwise, you stand the chance of
creating what may be a useful ontology that meets your specific
requirements (as has been true of "investigation"-oriented ontologies
that have come before such as the MAGE Ontology, ExperiBase, EXPO,
myGRID KAVE, etc.), but doesn't help the community at-large to
appropriately re-use your data. In each case, these ontologies or KR
frameworks have been extremely useful in the local application context
for which they were constructed, but they cannot be effectively
employed as the basis for semantically-driven integration across data
sets that may not be able to accept the constraints (or lack thereof)
of this application-oriented ontology.
Would you know off-hand, Susie, whether the folks who worked on this
ontology at Lilly have reviewed the relevant community efforts
cited above and/or have sought to interact with those groups to get
some input on how best to meet the overall requirements that underlie
this particular Experiment Ontology with the minimal required effort
and in a manner that could help to ensure Lilly's sunk investment
could be of benefit to us all?
2) Michael's comments
It's very helpful to know what the target is when it comes to
exporting/exchanging the actual data. As Michael points out, a great
deal of work has gone into the production of FuGE (and MaGE before it)
to come up with the appropriate division of labor between the
semantically-opaque, syntactical requirements as represented in a data
model such as MaGE or FuGE and the explicit semantics as captured in
the ontology. For those using FuGE, as Michael states, in the realm
of syntax, the intention for FuGE is to provide a shared structure for
universal elements such as biomaterials, experiment
populations/pools/groups, protocol details, reagents details, etc.
Built on that shared, generic foundation, any specific discipline -
e.g., microarray expression, GC-MS, FISH, MRI, etc. - can sub-class
FuGE components and add whatever additional detail is required in their
discipline. In parallel with this effort on data structure, the OBI
ontology cooperative seeks to provide that same foundation for the
shared semantic domains, and a clear set of recommended practices for
how to re-use entities from other OBO Foundry ontologies such as
ChEBI, Sequence Ontology, Protein Ontology, OBO Cell, Organism
Taxonomy (OWL versions of NCBI Tax), etc. to specify the critical
biomedical entities and their complex relations. As I say above,
these are works in progress. For those of us who must have something
working now, the recommended practice is to actively participate in
these projects with an eye toward following their practice - and
replacing any "proxy" you create in the interim with the community
ontology, when it is ready for use. This is what we have done in the
BIRN ontology BIRNLex. We actually have an OWL module called
"BIRNLex-OBI-Proxy.owl" which we fully intend to replace with OBI
entities, when they are ready for use. We also have
"BIRNLex-Investigation.owl" that builds on this "proxy" to cover
entities BIRN researchers must capture. We expect to eventually see
the contents of "BIRNLex-Investigation" in OBI in some form. We
intend to "contribute" those elements from this OWL file directly to
OBI, when OBI is ready for them and we have the time to work through
this migration process.
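To make the "proxy" idea concrete, here is a minimal sketch of the
swap, assuming the data is held as plain (subject, predicate, object)
triples. Every identifier in it (the "birnlex-proxy:", "obi:" and "ex:"
names, and the mapping itself) is a made-up placeholder, not an actual
BIRNLex or OBI term.

```python
# Hypothetical mapping from interim "proxy" terms to community terms.
# None of these identifiers are real BIRNLex or OBI IRIs.
PROXY_TO_COMMUNITY = {
    "birnlex-proxy:Protocol": "obi:Protocol",
    "birnlex-proxy:Assay": "obi:Assay",
}


def replace_proxy_terms(triples, mapping):
    """Return new triples with each mapped proxy term swapped for its community term."""
    def swap(term):
        return mapping.get(term, term)

    return [(swap(s), swap(p), swap(o)) for s, p, o in triples]


def unmapped_proxy_terms(triples, proxy_prefix, mapping):
    """List proxy terms still in use that have no community replacement yet."""
    terms = {t for triple in triples for t in triple if t.startswith(proxy_prefix)}
    return sorted(terms - set(mapping))


# Made-up example data.
triples = [
    ("ex:scan42", "rdf:type", "birnlex-proxy:Assay"),
    ("ex:scan42", "ex:follows", "birnlex-proxy:Protocol"),
    ("ex:scan42", "ex:uses", "birnlex-proxy:Instrument"),
]

migrated = replace_proxy_terms(triples, PROXY_TO_COMMUNITY)
pending = unmapped_proxy_terms(migrated, "birnlex-proxy:", PROXY_TO_COMMUNITY)
```

Keeping the mapping in one place means the proxy module can be retired
term by term, and the "pending" list shows what the community ontology
does not cover yet.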
3) Kei's comments
Examples - examples - examples. This is critical. Working through
the example Kei cites from the NIH Neuroscience Microarray Consortium
is a wonderful way to determine whether:
- there are existing community ontologies that can meet the KR and
processing requirements
- where the gaps are in those community ontologies
- whether the ontology you are creating effectively fills those gaps
(if it does, that makes it very clear how the community effort can
make effective use of your ontology)
In regards to Gene Lists, Kei is certainly correct. If these are
captured through algorithmic means, it's critical to capture the
details on that algorithm - typically both the version of the
algorithm as well as the version of the data repository you ran it
against.
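As a rough sketch of what capturing those details could look like, the
record below ties a gene list to the algorithm version and the
repository version it was derived from. The class names, field names,
and example values are all assumptions for illustration and are not
taken from any of the ontologies discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneListProvenance:
    """How a gene list was produced (hypothetical field names)."""
    algorithm: str
    algorithm_version: str
    repository: str
    repository_version: str


@dataclass(frozen=True)
class GeneList:
    genes: frozenset
    provenance: GeneListProvenance


def same_provenance(a, b):
    """True if both lists came from the same algorithm and repository versions."""
    return a.provenance == b.provenance


# Placeholder values only; no real tool or repository is implied.
prov_v1 = GeneListProvenance("example-caller", "1.0", "example-repository", "2007-11")
prov_v2 = GeneListProvenance("example-caller", "1.1", "example-repository", "2007-11")

list_a = GeneList(frozenset({"GENE_A", "GENE_B"}), prov_v1)
list_b = GeneList(frozenset({"GENE_A", "GENE_C"}), prov_v1)
list_c = GeneList(frozenset({"GENE_A"}), prov_v2)
```

With the versions recorded, a difference between two lists can at least
be checked against a difference in how they were generated.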
Also - where gene entities are concerned - there is ongoing work
between the GO groups, the Sequence Ontology, and the Protein Ontology
that is particularly targeted toward capturing the specific relations
between types of genomic sequence elements and types of biologically
active protein-based molecules (e.g., macromolecular complexes
composed of a collection of proteins in a variety of
post-translationally modified states - e.g., GPC receptors, ion
channels, transporters, pathway enzymes, etc. - i.e., Rx drug
targets). These are the details we'll all require in order to do
round-trip pharmacogenetics - i.e., effects of genetic constructs on
target susceptibility to drugs - AND - the ways in which drugs
ultimately alter macromolecular complexes by leading to changes in
gene expression.
Just my $0.02 filtering on these helpful comments from Matthias,
Michael, and Kei.
Cheers,
Bill
On Dec 3, 2007, at 1:00 PM, Kei Cheung wrote:
This is great!
I have a microarray experiment description (that has to do with
Alzheimer Disease) extracted from NINDS microarray consortium:
http://arrayconsortium.tgen.org/np2/viewProject.do?action=viewProject&projectId=433773
I just wonder how this example would fit this experiment ontology (as
well as others such as OBI). As shown in this example, we record
information such as organ type, organ region, cell type (layer II
pyramidal neuron), etc. The NINDS microarray consortium uses different
array platforms (e.g., Agilent, Affymetrix, and cDNA) for different
organisms so one may need to divide chips into groups corresponding
to different platform types. Each group can then be further divided
into subgroups corresponding to different organisms.
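A small sketch of that two-level grouping, assuming each chip is
described by a simple record with "id", "platform" and "organism" keys.
The keys and the chip identifiers are invented for illustration; only
the platform names come from the description above.

```python
from collections import defaultdict


def group_chips(chips):
    """Group chip ids first by platform type, then by organism."""
    groups = defaultdict(lambda: defaultdict(list))
    for chip in chips:
        groups[chip["platform"]][chip["organism"]].append(chip["id"])
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {platform: dict(by_organism) for platform, by_organism in groups.items()}


# Made-up chip records.
chips = [
    {"id": "chip-1", "platform": "Affymetrix", "organism": "human"},
    {"id": "chip-2", "platform": "Affymetrix", "organism": "mouse"},
    {"id": "chip-3", "platform": "Agilent", "organism": "human"},
    {"id": "chip-4", "platform": "Affymetrix", "organism": "human"},
]

grouped = group_chips(chips)
```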
We also would like to capture gene lists (not the raw gene lists but
the ones (much shorter) that indicate what genes are over/under
expressed under certain experimental conditions). Such gene lists
would usually be extracted from the literature. Also the analysis
package (including version) that was used to generate a gene list
should be identified. One possible use of these gene lists is to
compare them to identify genes that are differentially expressed under the
same/similar experimental condition across different microarray
experiments. This would help distinguish true signals from noise.
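One simple way to sketch this comparison is to count how many
experiments report the same gene with the same direction of change. The
experiment names, gene names, and the threshold parameter below are
placeholders, and a real comparison would likely need something more
statistical than a plain count.

```python
from collections import Counter


def recurring_genes(gene_lists, min_experiments=2):
    """Return (gene, direction) pairs reported by at least min_experiments experiments."""
    counts = Counter()
    for entries in gene_lists.values():
        # set() guards against a pair being listed twice in one experiment.
        counts.update(set(entries))
    return sorted(pair for pair, n in counts.items() if n >= min_experiments)


# Placeholder gene lists for experiments run under a similar condition.
gene_lists = {
    "experiment_1": {("GENE_A", "up"), ("GENE_B", "down"), ("GENE_C", "up")},
    "experiment_2": {("GENE_A", "up"), ("GENE_B", "up")},
    "experiment_3": {("GENE_A", "up"), ("GENE_C", "up")},
}

in_two_or_more = recurring_genes(gene_lists)
in_all_three = recurring_genes(gene_lists, min_experiments=3)
```

Here GENE_B is reported in two experiments but with opposite directions,
so it is not counted as recurring.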
Hope it helps.
Cheers,
-Kei
Matthias Samwald wrote:
Hi Susie,
Susie wrote:
It would be great if you could take a look at it and provide
comments. The
ontology is available at:
http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/Experiment_Ontology
* Some of the entities/properties are missing a rdfs:label or have
an empty label (a string with length 0).
* Some of the entities could be taken from existing ontologies like
OBI, RO or some of the OBO Foundry ontologies. This would save work
and make integration with other data sources and ontologies much
easier. By the way, there seem to be several groups working on
ontologies for microarray experiments, or are at least planning to
do that. It would be great if these groups could work together.
* The class 'Chip type' should be removed and be replaced by
subclasses of 'chip', e.g., 'chip (human)', 'chip (mouse)' etc.
* Some of the object properties appear like they are intended to be
datatype properties (e.g., 'has proteome id').
* Many of the datatype properties could be replaced with object
properties, possibly referring to third party ontologies -- of
course this would require a richer ontology and more work spent on
creating mappings. 'has molecular function' could refer to entities
from the gene ontology, 'has associated organ' could refer to an
ontology about anatomy and so on.
* Object properties and their ranges are quite redundant. Property
'has reagent' has range 'Reagent', property 'has treatment' has
range 'Treatment' and so on. Maybe the ontology could be designed in
such a way that there are only some generic properties such as 'has
part'. This would make the ontology much easier to maintain, query
and understand in the long term.
* It is unclear how 'Gene list' is intended to be used.
* 'Hardware' and 'Software' should not be subclasses of 'Protocol'.
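The first point in the list above (missing or empty labels) is easy to
check mechanically. A minimal sketch, assuming the ontology has been
loaded as plain (subject, predicate, object) string triples; the "ex:"
entities are invented examples and are not taken from the ontology
under review.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def entities_without_label(triples):
    """Return subjects that have no rdfs:label, or only empty/whitespace labels."""
    subjects = {s for s, _, _ in triples}
    labelled = {s for s, p, o in triples if p == RDFS_LABEL and o.strip()}
    return sorted(subjects - labelled)


# Invented example triples.
triples = [
    ("ex:Chip", "rdf:type", "owl:Class"),
    ("ex:Chip", RDFS_LABEL, "chip"),
    ("ex:GeneList", "rdf:type", "owl:Class"),
    ("ex:GeneList", RDFS_LABEL, ""),  # empty label, length 0
    ("ex:Reagent", "rdf:type", "owl:Class"),  # no label at all
]

unlabelled = entities_without_label(triples)
```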
Many of the datatype properties in this ontology look very
interesting and might provide requirements for other ontologies. It
would be great if some of them could be described/commented in more
detail so that we know more about the requirements that motivated
the creation of these properties.
I hope that was somewhat helpful.
cheers,
Matthias Samwald
William Bug, M.S., M.Phil.
email: [EMAIL PROTECTED]
Ontological Engineer (Programmer Analyst III) work: (610) 457-0443
Biomedical Informatics Research Network (BIRN)
and
National Center for Microscopy & Imaging Research (NCMIR)
Dept. of Neuroscience, School of Medicine
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093
Please note my email has recently changed