Many thanks for the clarification, Chris.
(see you next week in Bar Harbor)
To provide more info to others here not familiar with this effort
underway by the GO Consortium, could you provide just a line or two
describing the overall effort to extend GO, which has been underway
now for approximately 2 years?
(now I'll answer my own request - but please, Chris, correct and/or
embellish what appears below as you see fit)
As I understand it, you all are trying to leverage both the wealth of
existing knowledge resources AND hard won "lessons" learned from the
last 8+ years of GO development to start addressing higher level
biomedical knowledge domains such as immunology and neurobiology.
The goal - again, as I understand it - is to promote re-use of the
existing, low-level ontologies such as the original GO molecular &
cellular domains, KEGG, etc. to build a formal representation of
knowledge in these higher level domains. There is also an associated
effort to identify additional low-level domains where formal
frameworks are still lacking - in particular, several efforts are
underway to define a "cell type" ontology for neuro and other
domains. The emphasis is on avoiding pre-coordination of multiple
entities, so as to keep the resulting ontologies as cleanly
computable down to the molecular level as is practical.
Another goal is to provide the domain ontologies required by the PATO
& FuGO formalisms in order to support annotation of a wide variety of
biomedical research data down to the level of fine-grained phenotype
described explicitly along with the context in which the experimental
observation was made - e.g., all required detail for the assays,
reagents, and environmental conditions on which the analysis and
interpretation of these observations are dependent.
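
As a rough illustration of the post-coordinated, entity-plus-quality style of annotation described above, here is a minimal Python sketch. The class names, term IDs, and context keys are all hypothetical and are not taken from PATO or FuGO; the point is only that the phenotype is composed from separate terms at annotation time and carries its experimental context with it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A reference to a term in some ontology (IDs below are made up)."""
    ontology: str
    term_id: str
    label: str


@dataclass
class PhenotypeAnnotation:
    """Entity + quality, combined at annotation time (post-coordination)."""
    entity: Term
    quality: Term
    # Assay, reagent, and environmental details the observation depends on.
    context: dict = field(default_factory=dict)

    def describe(self) -> str:
        return f"{self.entity.label}: {self.quality.label}"


# Hypothetical terms; real annotations would use actual ontology IDs.
substantia_nigra = Term("ANATOMY", "EX:0001", "substantia nigra")
decreased_volume = Term("QUALITY", "EX:0002", "decreased volume")

annotation = PhenotypeAnnotation(
    entity=substantia_nigra,
    quality=decreased_volume,
    context={"assay": "MRI", "species": "mouse"},
)
print(annotation.describe())
```

Because the entity and the quality stay separate, either one can be swapped or queried on its own, which is what a single pre-coordinated term such as "small substantia nigra" would prevent.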
In the BIRN project, we intend to make extensive use of this approach
in order to provide a means to re-pool complex data types - e.g.,
microarray results, in situ neuroimaging, functional brain imaging,
etc. - across a large body of experiments performed by myriad labs
with disparate initial hypotheses. The idea is that if you want to
re-use existing data for novel meta-analysis in a manner analogous to
the way string pattern algorithms such as BLAST and the HMM gene
finders did in genomics, one must often go all the way back to the
primary data, which in the case of microarray data may be the images
with all the associated information regarding reagent conditions,
probe sets, image acquisition and post-acquisition processing. This
will be an impossible task without a consistent, comprehensive, yet
flexible formalism and the required ontological frameworks.
Cheers,
Bill
P.S.: Just as a clarification on FuGO - at least as I understand it -
though the name and the pedigree imply a focus on gene expression
data sets and techniques, the ULTIMATE scope will provide a formal
structure - and ontological framework - in which to describe ANY
experimental observation (in the biomedical domain). The BIRN
Ontology Task Force has started to participate in the FuGO effort
with the hope of using the FuGO formalism to address experimental
details relating to all domains of neuroimaging from EM through
various light microscopy techniques on through the various forms of
radiological imaging. We have committed to describing all data in
the context of phenotypes (PATO), using the FuGO formalism & content
to handle the primary data. There are extant knowledge resources
addressing these issues - from data formats (e.g., OME) on through
ontologies (OBO Imaging ontology) - but they do not nearly fit the
requirement of being consistent, comprehensive, yet flexible - nor
do they cover the domains in sufficient detail. The combination of
PATO & FuGO appears to be a more formally sound means to achieve this
end, even though both projects are still nascent (Though FuGO's
pedigree via MGED goes back almost 10 years, the more formal
approaches recently introduced along with the new name FuGO are
really quite recent).
If any of the other FuGO participants on this list don't see this as
being appropriate, I'd really appreciate getting feedback on this issue.
On Jun 6, 2006, at 2:43 PM, chris mungall wrote:
Hi Bill
Just a minor clarification - the neurodevelopment ontology will not
be distinct from GO, it will be part of the GO biological process
ontology (and thus part of the OBO Foundry) and available as OWL
Cheers
Chris
On Jun 6, 2006, at 7:57 AM, William Bug wrote:
Oops -
I forgot to mention the following:
There is an upcoming meeting at the Jackson Labs (next Wed - Fri)
hosted by Judy Blake of MGI on behalf of the Gene Ontology
Consortium. The work will focus on vetting/extending a
neurodevelopment ontology they have begun to work on to be placed
in the OBO Foundry.
Hopefully, a file will be available in RDF/OWL format at the OBO
site within the next month or so.
Cheers,
Bill
On Jun 6, 2006, at 10:41 AM, William Bug wrote:
Hi All,
Sorry - I'd thought I'd already subscribed to this list, but
apparently not - until now.
The need for a mereotopologically-sound, neuroanatomical ontology
is quite pressing across the community of neuroscientists
involved in neuroinformatics projects, most of which include a
neuroimaging component. Generally there is only one thing
neuroscientists are interested in when analyzing images at
whatever resolution from the macromolecular (EM) on up to the
macroscopic - i.e., identifying biologically relevant shapes. In
order for these shapes to have any meaning in a context where one
attempts to pool data and perform relevant data reduction
operations, the shapes must exist within a shared coordinate
space of some sort. For instance, if two separate labs are
examining the change in the size of the Substantia Nigra during
the course of Parkinsonian neurodegeneration, in order for them
to compare their observations, they require several data
integration/semantic frameworks:
- a shared neuroanatomical terminology
- a shared coordinate space (to place the shapes from their
images in a comparable coordinate framework)
- a shared, well-founded anatomical ontology which encapsulates
mereotopological knowledge about shapes in - at least - 3D space.
Other knowledge resources can be helpful in supplementing this
array of tools, but, generally, these are the absolute minimum.
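
To make the "shared coordinate space" requirement concrete, here is a minimal Python sketch in which two labs' image coordinates are mapped into one common frame before being compared. It assumes the simplest possible registration (a per-axis scale plus a translation, with no rotation or non-linear warping), and the calibration numbers are invented for illustration.

```python
import math


def to_shared_space(point, scale, offset):
    """Map a lab-specific (x, y, z) point into the shared space.

    Assumes per-axis scaling and translation only.
    """
    return tuple(s * p + o for p, s, o in zip(point, scale, offset))


# Hypothetical calibrations: lab A already works in the shared frame,
# lab B uses a different voxel size and origin.
lab_a = {"scale": (1.0, 1.0, 1.0), "offset": (0.0, 0.0, 0.0)}
lab_b = {"scale": (0.5, 0.5, 0.5), "offset": (10.0, -5.0, 2.0)}

# Centroid of the same structure as measured in each lab's own coordinates.
centroid_a = to_shared_space((12.0, 4.0, 6.0), **lab_a)
centroid_b = to_shared_space((4.0, 18.0, 8.0), **lab_b)

# Only after the mapping is a distance between the two meaningful.
print(math.dist(centroid_a, centroid_b))
```

Real atlas registration is far more involved than this, but the order of operations is the same: transform first, compare second.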
[NOTE: the Wikipedia has a moderately clear definition of
mereotopology (http://en.wikipedia.org/wiki/Mereotopology).
Basically, it combines a formal, ontological theory of parts and
wholes (mereology) with the mathematics of topology (boundaries and
connectedness), with the goal of providing a computational formalism
to support applying
logical operations to objects in space. As has been pointed out
by others, a great deal of the work in this field of applied
biomedical mereotopology derives from related work in the GIS
field. Use of mereotopology by geographers has been going on for
quite some time and is much more advanced. Work from GIS can be
adapted for use in the biomedical domain, but it must be done
with great care, as many of the assumptions behind the way
researchers represent space, and the kind of information being
represented, can differ significantly across these disciplines.]
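
For readers who want a feel for what "applying logical operations to objects in space" means in practice, here is a toy Python sketch of the mereological half: a transitive part-of relation over a few brain regions. The hierarchy is deliberately tiny and simplified, and it is not taken from NeuroNames, the FMA, or any other resource mentioned in this thread.

```python
# Direct "part of" assertions (child -> parent); a simplified toy hierarchy.
PART_OF = {
    "substantia nigra": "midbrain",
    "midbrain": "brainstem",
    "brainstem": "brain",
    "cerebellum": "brain",
}


def ancestors(region):
    """Every region that `region` is part of, nearest first."""
    found = []
    while region in PART_OF:
        region = PART_OF[region]
        found.append(region)
    return found


def is_part_of(part, whole):
    """True if `part` is a direct or transitive part of `whole`."""
    return whole in ancestors(part)


print(ancestors("substantia nigra"))
print(is_part_of("substantia nigra", "brain"))
print(is_part_of("cerebellum", "brainstem"))
```

A reasoner built on this can infer, for example, that an observation annotated to the substantia nigra is also an observation about the brainstem, even though nobody asserted that directly. A full mereotopology would add topological relations such as adjacency and overlap on top of this.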
The same is true as you scale this problem up to field-wide
projects such as BIRN or The NeuroCommons.
As several have mentioned in this thread, there are already
existing resources that can begin to fill this need.
1) NeuroNames
Kei, Olivier, Peter Mork, and others have already given
sufficient references on NeuroNames in this thread, so that
others can dig in deeper to the specifics if they like.
Having worked with Doug Bowden, Mark Dubach, and their colleagues
over the last year or so in an advisory capacity on the specific
issue of use of NeuroNames for semantically-based,
neuroanatomical data set integration, I can add a few important
qualifying points:
a) Doug et al. have been working on the extremely difficult task
of unifying neuroanatomical terminologies across mammalian
species for 20 years now. Embedded in Neuronames & Braininfo,
there is a wealth of hard won empirical knowledge related to how
one achieves this end. I think it would be ill-advised to try to
duplicate their effort, as the myriad scientific problems related
to this effort would surely present themselves again and only
need to be worked out once.
b) Doug et al. are extremely collegial and quite receptive to
feedback and collaboration - within the bounds of their limited
resources.
c) NeuroNames is a terminological resource - not a well-founded,
spatial ontology of brain anatomy capable of supporting
mereotopological reasoning. As with most research-based
terminologies, there are many semantically-based relations
embedded in the NeuroNames graphs, but as the primary goal of NN
is to disambiguate and integrate across the neuroanatomical
lexicon, the embedded semantic information can often lead to a
logical dead end. For instance, many neuroanatomical terms
critical to specifying location in the rodent brain have been
placed in the NN category "ancillary terms," as they don't fit
into the core hierarchy in an unambiguous way. This can make use
of NN for annotating mouse brain gene & protein expression
patterns (e.g., GENSAT, the Allen Brain Atlas, various BIRN
projects) extremely problematic.
d) The NN primary structures
(http://braininfo.rprc.washington.edu/indexabout.html) provide the
closest thing to an ontology in NN. As Peter Mork pointed out,
there has been an effort in the past to unite this core NN
hierarchy with the FMA, which does provide a mereotopologically
sound framework for anatomy. Barry Smith (formal ontologist who
has worked for over a decade on problems in biomedical ontology -
most especially, though hardly exclusively, in the area of
mereotopological reasoning) and his colleagues have worked
closely with Cornelius Rosse and his colleagues at the FMA
project to create in association with the work started in the FMA
a foundational ontology for biomedicine (the Ontology of
Biological Reality) that is becoming increasingly important to
all of the ontologies being monitored by NCBO and incorporated
into the OBO site and the emerging OBO Foundry
(http://obofoundry.org/).
e) Doug and his colleagues have worked closely with Jack Park (a
consulting scientist to SRI's AI Center - http://www.ai.sri.com/)
to represent NN as a TopicMap (XTM). As many on this list may
know, there has been a moderate amount of effort to integrate
and/or reconcile XTM with RDF here at the W3C (search on "TopicMaps"
at the main RDF page - http://www.w3.org/RDF/). I'm not certain
how this effort will ultimately make NN more "semantic web"
compliant, but the bottom line is a great deal of effort has
already been expended to express NN in a semantically
well-grounded formalism.
f) As Don points out, neuroanatomical representations
are likely to evolve significantly over the coming decades, as
the number of large scale gene & protein expression
characterization studies focussed on the brain continues to
grow. Having said that, the "conventional" view of
neuroanatomy will likely remain relevant for a long while to
come, not only because it has been used to characterize findings
in the literature for the last 125+ years, but also because it
did derive from a wealth of empirical observation which is likely
to remain valid in many domains of neuroanatomical study. I
would also modify Don's well informed comment regarding the
derivation of "conventional" views of neuroanatomy. To a large
extent they are related to functional studies of the brain - as
well as lesion based studies of functional deficits dating back
to the 19th century (think "Broca's Area"), but they are also
very much based on a study of the morphology of the brain - both
the external surface morphology (sulci, gyri, and lobes), as well
as histological examination of internal structures. Many of
these studies of structure in space are likely to stay with us
for some time to come (and are well-founded in reality), though
as Tim Clark & Don have pointed out in this thread, nomenclature
is still a very significant problem even in this very "old" field.
g) licensing of NN - Doug et al. formerly had a completely open
policy on distributing NN. The only reason a license was
instituted was at some point about 5 years back another group
sucked down the entirety of NN, reworked a lot of what was there
- probably with very practical goals directed toward making NN
more "correct" and effective in their problem domain - then
"republished" their product as "NeuroNames". This led to a
great deal of confusion. The fact they chose to do this on the sly
also meant the work they did was not necessarily compatible with
the work done by Doug et al. In order to avoid this happening
again, it was decided a license would be established to
discourage this sort of behavior. As anyone who has developed a
terminology and/or ontology knows, it is absolutely essential there
remain a single curating authority, if the value of the resource
is to remain intact. The "vetting" performed by the central
authority - as is extensively done by the curators of the Gene
Ontology, for instance - is absolutely essential to
guaranteeing the integrity of the knowledge resource. This is
not a "closed" or proprietary process, just a highly controlled
one. Unfortunately, Doug Bowden's resources are MUCH MUCH
smaller than those available to the curators/developers of GO, so
the NN curation effort necessarily moves at a slower pace.
2) Working with the Neuroscience community
As Kei, Don, and others have stated, it would be unwise to
proceed in creating an "open source" neuroanatomical ontology
without interacting with the researchers who've already put a lot
of effort into this problem over the past decade or so. With
this in mind, I have several suggestions:
a) The 5 ways of knowing neuroanatomy:
This is a pitch I've been making which I think helps to sum up
the current ways various sub-fields have attempted to
identify/label/collate brain morphology:
i) Terminologies - e.g., NN, BrainLex
ii) Ontologies - e.g., Neuro-FMA (the project Peter Mork
referred to)
iii) Literature Informatics (CocoMac, BrainMap, NeuroScholar,
BAMS, ArrowSmith, etc.).
These are very mature projects. Some include their own
mereotopological reasoning systems (e.g., CocoMac and BrainMap)
in order to be able to pool and compare the relatedness of
structures and connectivity across different studies in the
literature. The goal in this category is to perform large-scale
semantic mining of the literature to confirm/refute current
knowledge and uncover new correlations - very much along the
lines of what The NeuroCommons Project expects to achieve via use
of semantic web technologies. Some researchers in this category
are actually participating in The NeuroCommons Project (i.e.,
Gully Burns, who developed NeuroScholar).
iv) voxel/pixel analysis:
This approach applies computer vision algorithms to
automatically - or semi-automatically - identify 2D & 3D shapes
in digital anatomical images. This field is also extremely
mature, though there are many significant caveats to exactly how
much of this work can be effectively automated.
v) parameterized models:
Often these are derived from - or used to drive - the
voxel/pixel based analysis described in 'iv' - though the spatial
modeling is definitely a distinct approach from the pure
voxel/pixel approach.
None of the studies you'd fit into these categories exclusively focus
on their technique/tool alone without some aspect of the other
"ways of knowing neuroanatomy" playing a role in what they do.
However, it is clear much fundamental work in this area primarily
focuses on one technique over the others.
Having said that, when the neuroscience community makes use of
this work to examine a specific biological problem, they will
often draw significant tools and resources from more than one of
these domains.
b) NCBO/NCOR sponsored meeting focused on mereotopology in
neuroanatomy:
Barry Smith is working to bring together researchers working in
the 5 domains described above. There is a very pressing need in
large-scale, field-wide neuroinformatics projects such as what is
being done in the BIRN project to have these 5 domains converge
and work more cooperatively. Right now, a lot of manual effort
has to be put out to bring them together. This is something BIRN
has been pursuing. In the last 6 months, we have received a
great deal of support and guidance on this effort from NCBO.
Daniel Rubin interacts directly with the BIRN Ontology Task
Force, and the work Barry Smith has been doing with FMA, OBO,
FuGO, and PATO has very much begun to create a much more
well-founded and computable path toward performing large-scale
annotation of neuroimaging data.
This meeting is on the NCBO/NCOR slate for 2007, but in the
interim I hope to see more effort invested in the coming year
across the 5 communities listed above toward the goal of
integrating across these "ways of knowing" now that the need has
been recognized.
3) Microarrays:
Just as Don, Kei, Alan R., and others have pointed out,
high-throughput assays - microarrays, BAC-based IHC, in situ studies
using the Gene Paint technology employed by the Allen Institute
for Brain Science to construct the Allen Brain Atlas of gene
expression in the brain - are going to transform our
understanding of neuroanatomy over the coming decades. This is
just a given. There is a pressing need to derive a means to
integrate spatially-mapped studies of gene & protein expression
into a neuroimaging setting. The spatial resolution may be very
coarse - e.g., "whole brain" - but they still provide sufficient
spatial information to be usable in the context of a
neuroanatomical coordinate system.
We are working in the BIRN project to create a means for
researchers to integrate these distinct approaches to studying
the brain. As Alan R. pointed out, FuGO is working to put
description of microarray experiments on a solid, formal footing,
and I would expect one aspect of that will be to represent
microarray data in RDF/OWL. This is not a trivial problem, given
that much of the available data is merely MIAME-compliant - MIAME
not even being a data format, but just a collection of minimal
data requirements. One need only look at the great complexity of
the data submission process at the NCBI GEO site to get an
appreciation for how difficult this problem can be. A great deal
of effort is being invested in the microarray field to come up
with a better means to handle this issue, and the FuGO effort will
be a critical clearinghouse for this work. The important thing
to remember when it comes to field-wide data pooling and
re-analysis is that it may sometimes be necessary to get right back
to the microarray primary image files so as to reapply different
criteria when performing the statistical tests and reductions
on pooled data. Given this requirement - one we also see in the
neuroimaging domain - I believe it is very important to proceed
in a well-reasoned manner when seeking to integrate across
microarray datasets using semantic web technologies. Alan R. and
I - and possibly others on this list - are on the FuGO
Coordinators Committee, so hopefully we can help to keep those
lines of communication open.
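
As a very rough sketch of what "representing microarray data in RDF" could look like at the level of individual statements, the following Python emits a few N-Triples lines that tie an experiment back to its primary image and acquisition details. The namespace, property names, and values are entirely hypothetical; they are not drawn from FuGO, MIAME, or GEO, and a real effort would use terms from an agreed ontology.

```python
EX = "http://example.org/microarray/"  # hypothetical namespace


def uri(name):
    """Wrap a local name as an N-Triples IRI in the example namespace."""
    return f"<{EX}{name}>"


def literal(value):
    """Wrap a value as a plain N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Statements linking one experiment to its primary data and settings.
triples = [
    (uri("experiment42"), uri("hasPrimaryImage"), uri("scan42.tif")),
    (uri("experiment42"), uri("usesProbeSet"), literal("hypothetical probe set A")),
    (uri("experiment42"), uri("scannerSetting"), literal("gain=0.8")),
]


def to_ntriples(rows):
    """Serialize (subject, predicate, object) rows, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```

The value of keeping the link to the primary image explicit is that a later re-analysis can follow it back and reapply different criteria, rather than being limited to the already-reduced expression values.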
Sorry to go on so, but this is a topic on which I've labored
quite intensively over the past year. There is a lot being done
on this issue, and I think all efforts will get much further more
quickly - and in a way that will carry more street cred with
practicing neuroscientists - if we all try to work together.
Cheers,
Bill
Bill Bug
Senior Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)
Please Note: I now have a new email - [EMAIL PROTECTED]
This email and any accompany attachments are confidential. This
information is intended solely for the use of the individual to
whom it is addressed. Any review, disclosure, copying,
distribution, or use of this email communication by others is
strictly prohibited. If you are not the intended recipient please
notify us immediately by returning this message to the sender and
delete all copies. Thank you for your cooperation.