Brief comment regarding the topic mapping work Bill mentions here.
Doug Bowden recently participated in a workshop on "Ontology Federation"
here at SRI. We had several speakers from both bioinformatics (Douglas
Bowden and Peter Karp) and the topic mapping world (Steve Newcomb,
Patrick Durusau, and myself). Vinay Chaudhri opened the workshop and
participated. Richard Fikes spoke from the perspective of the KR
community. KR lies, of course, at the roots of the work products we all
create.
We are moving away from the XTM topic mapping specification, and into
the TMRM [1] topic maps reference model, the product of which we now
call "subject maps" to distinguish topic maps from a slightly different
paradigm. Subject maps are no longer constrained by a preselected
ontology (XTM) and can be implemented using key/value properties of the
author's choice. This permits authors to create subject maps that can
mimic any frame-like language chosen, including, I suppose, OWL.
There exists a necessary and important tension between the use cases of
traditional KR and those for which the topic mapping paradigm has been
created and shown useful. When Bill mentions "more semantic web
compliant", I would ask questions derived from two important use cases.
The two use cases do not circumscribe the entire field of KR, but they
serve as place holders to delimit a useful discussion between
ontologists and subject mappers. I will argue that both ontologies and
subject maps are valuable, and they can serve users together. The two
use cases about which I speak are:
1- accurately answering questions according to some authority
2- understanding some universe of discourse, even where conflicting
world views exist
Use of authoritative ontologies is clearly the domain of question
answering. Understanding some universe of discourse is also rightly the
domain of ontologies, but here, subject maps offer the opportunity to
"federate" disparate world views into a unified framework organized
around subjects. Where ontological entities carry information resources
that speak to the same subject, then those entities are merged into a
single subject map entity -- a subject proxy -- regardless of conflicts
between messages conveyed. There are benefits to be derived from such
merging operations. Patrick Durusau and I spoke to this topic in a
teleconference to the Ontolog community [2], and slides and an mp3 of
the talk are available. There will be other papers released soon on
these opportunities.
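The merging described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each ontological entity is a bag of key/value properties, that two entities speak to the same subject when they share an identical "subject" string, and that each carries a "source" naming the world view it came from. None of these names come from the TMRM itself, where the rules for deciding subject sameness are left to the map's author.

```python
from collections import defaultdict

def merge_proxies(entities):
    """Merge entities that speak to the same subject into one proxy each.

    Hypothetical entity layout (an assumption of this sketch):
      'subject'    - identifier of what the entity is about
      'source'     - the world view (ontology) the entity came from
      'properties' - key/value pairs asserted by that source
    """
    proxies = {}
    for entity in entities:
        proxy = proxies.setdefault(entity["subject"], defaultdict(list))
        for key, value in entity["properties"].items():
            # Keep every assertion together with its source.
            # Conflicting values are recorded side by side, not resolved.
            proxy[key].append((entity["source"], value))
    return {subject: dict(props) for subject, props in proxies.items()}

entities = [
    {"subject": "substantia-nigra", "source": "ontology-A",
     "properties": {"part_of": "midbrain"}},
    {"subject": "substantia-nigra", "source": "ontology-B",
     "properties": {"part_of": "basal ganglia"}},
    {"subject": "striatum", "source": "ontology-A",
     "properties": {"part_of": "basal ganglia"}},
]

merged = merge_proxies(entities)
```

Here the two entities about the substantia nigra collapse into one proxy whose "part_of" property carries both claims, each tagged with its source. The point of the sketch is that merging preserves the disagreement rather than picking a winner.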
It was in the spirit of this federation opportunity that Doug Bowden and
I first spoke. To be "semantic web compliant", it is always possible for
our subject map portal to carry plenty of RDF metadata. It remains to be
answered whether the goal of such metadata is to accurately answer
specific questions, or to just advertise the presence of world views.
Bioinformatics, in all of its many manifestations, I strongly believe,
will benefit from collaborations between ontologists and subject mappers.
Jack
[1] http://www.isotopicmaps.org/tmrm/
[2] http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2006_04_27
William Bug wrote:
Hi All,
Sorry - I'd thought I'd already subscribed to this list, but
apparently not - until now.
The need for a mereotopologically-sound, neuroanatomical ontology is
quite pressing across the community of neuroscientists involved in
neuroinformatics projects, most of which include a neuroimaging
component. Generally there is only one thing neuroscientists are
interested in when analyzing images at whatever resolution from the
macromolecular (EM) on up to the macroscopic - i.e., identifying
biologically relevant shapes. In order for these shapes to have any
meaning in a context where one attempts to pool data and perform
relevant data reduction operations, the shapes must exist within a
shared coordinate space of some sort. For instance, if two separate
labs are examining the change in the size of the Substantia Nigra
during the course of Parkinsonian neurodegeneration, in order for them
to compare their observations, they require several data
integration/semantic frameworks:
- a shared neuroanatomical terminology
- a shared coordinate space (to place the shapes from their images
in a comparable coordinate framework)
- a shared, well-founded anatomical ontology which encapsulates
mereotopological knowledge about shapes in - at least - 3D space.
Other knowledge resources can be helpful in supplementing this array
of tools, but, generally, these are the absolute minimum.
[NOTE: the Wikipedia has a moderately clear definition of
mereotopology (http://en.wikipedia.org/wiki/Mereotopology).
Basically, it combines a formal, ontological theory of parts and
wholes (mereology) with the mathematics of topology with the goal
of providing a computational formalism to support applying logical
operations to objects in space. As has been pointed out by others, a
great deal of the work in this field of applied biomedical
mereotopology derives from related work in the GIS field. Use of
mereotopology by geographers has been going on for quite some time and
is much more advanced. Work from GIS can be adapted for use in the
biomedical domain, but it must be done with great care, as many of the
assumptions behind the way researchers represent space and the manner of
information being represented can differ significantly across these
disciplines.]
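As a rough illustration of "applying logical operations to objects in space", the sketch below covers only the mereological half: it treats parthood as transitive and answers whether one structure lies within another. The three-entry hierarchy is a toy example assumed for illustration, not drawn from NeuroNames or the FMA, and topological notions such as boundaries and adjacency are not modeled at all.

```python
# Toy "part of" assertions (child -> parent). Illustrative only.
PART_OF = {
    "substantia nigra": "midbrain",
    "midbrain": "brainstem",
    "brainstem": "brain",
}

def is_part_of(part, whole, part_of=PART_OF):
    """Return True if `part` is a proper part of `whole`.

    Relies on the transitivity of parthood: walk up the hierarchy from
    `part` until `whole` is found or the top is reached.
    """
    seen = set()
    current = part_of.get(part)
    while current is not None and current not in seen:
        if current == whole:
            return True
        seen.add(current)  # guard against accidental cycles in the data
        current = part_of.get(current)
    return False

in_brain = is_part_of("substantia nigra", "brain")
reverse = is_part_of("brain", "midbrain")
```

A query such as is_part_of("substantia nigra", "brain") succeeds even though that fact is never asserted directly, which is the kind of inference a bare terminology cannot support on its own.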
The same is true as you scale this problem up to field-wide projects
such as BIRN or The NeuroCommons.
As several have mentioned in this thread, there are already existing
resources that can begin to fill this need.
1) NeuroNames
Kei, Olivier, Peter Mork, and others have already given sufficient
references on NeuroNames in this thread, so that others can dig in
deeper to the specifics if they like.
Having worked with Doug Bowden, Mark Dubach, and their colleagues over
the last year or so in an advisory capacity on the specific issue of
use of NeuroNames for semantically-based, neuroanatomical data set
integration, I can add a few important qualifying points:
a) Doug et al. have been working on the extremely difficult task
of unifying neuroanatomical terminologies across mammalian species for
20 years now. Embedded in Neuronames & Braininfo, there is a wealth
of hard won empirical knowledge related to how one achieves this end.
I think it would be ill-advised to try to duplicate their effort, as
the myriad scientific problems related to this effort would surely
present themselves again and need only be worked out once.
b) Doug et al. are extremely collegial and quite receptive to
feedback and collaboration - within the bounds of their limited
resources.
c) NeuroNames is a terminological resource - not a well-founded,
spatial ontology of brain anatomy capable of supporting
mereotopological reasoning. As with most research-based
terminologies, there are many semantically-based relations embedded in
the NeuroNames graphs, but as the primary goal of NN is to
disambiguate and integrate across the neuroanatomical lexicon, the
embedded semantic information can often lead to a logical dead end.
For instance, many neuroanatomical terms critical to specifying
location in the rodent brain have been placed in the NN category
"ancillary terms," as they don't fit into the core hierarchy in an
unambiguous way. This can make use of NN for annotating mouse brain
gene & protein expression patterns (e.g., GENSAT, the Allen Brain
Atlas, various BIRN projects) extremely problematic.
d) The NN primary structures
(http://braininfo.rprc.washington.edu/indexabout.html) provide the
closest thing to an ontology in NN. As Peter Mork pointed out, there
has been an effort in the past to unite this core NN hierarchy with
the FMA, which does provide a mereotopologically sound framework for
anatomy. Barry Smith (formal ontologist who has worked for over a
decade on problems in biomedical ontology - most especially, though
hardly exclusively, in the area of mereotopological reasoning) and his
colleagues have worked closely with Cornelius Rosse and his
colleagues at the FMA project to create, in association with the work
started in the FMA, a foundational ontology for biomedicine (the
Ontology of Biomedical Reality) that is becoming increasingly
important to all of the ontologies being monitored by NCBO and
incorporated into the OBO site and the emerging OBO Foundry
(http://obofoundry.org/).
e) Doug and his colleagues have worked closely with Jack Park (a
consulting scientist to SRI's AI Center - http://www.ai.sri.com/) to
represent NN as a TopicMap (XTM). As many on this list may know,
there has been a moderate amount of effort to integrate and/or
reconcile XTM with RDF here at the W3C (search on "TopicMaps" at the
main RDF page - http://www.w3.org/RDF/). I'm not certain how this
effort will ultimately make NN more "semantic web" compliant, but the
bottom line is a great deal of effort has already been expended to
express NN in a semantically well-grounded formalism.
f) As Don points out, neuroanatomical representations
are likely to evolve significantly over the coming decades, as the
number of large scale gene & protein expression characterization
studies focused on the brain continues to accumulate. Having said
that, the "conventional" view of neuroanatomy will likely remain
relevant for a long while to come, not only because it has been used
to characterize findings in the literature for the last 125+ years,
but also because it did derive from a wealth of empirical observation
which is likely to remain valid in many domains of neuroanatomical
study. I would also modify Don's well informed comment regarding the
derivation of "conventional" views of neuroanatomy. To a large extent
they are related to functional studies of the brain - as well as
lesion based studies of functional deficits dating back to the 19th
century (think "Broca's Area"), but they are also very much based on a
study of the morphology of the brain - both the external surface
morphology (sulci, gyri, and lobes), as well as histological
examination of internal structures. Many of these studies of
structure in space are likely to stay with us for some time to come
(and are well-founded in reality), though as Tim Clark & Don have
pointed out in this thread, nomenclature is still a very significant
problem even in this very "old" field.
g) licensing of NN - Doug et al. formerly had a completely open
policy on distributing NN. The only reason a license was instituted
was that at some point about 5 years back another group sucked down the
entirety of NN, reworked a lot of what was there - probably with very
practical goals directed toward making NN more "correct" and effective
in their problem domain - then "republished" their product as
"NeuroNames". This led to a great deal of confusion. The fact they
chose to do this on the sly also meant the work they did was not
necessarily compatible with the work done by Doug et al. In order to
avoid this happening again, it was decided a license would be
established to discourage this sort of behavior. As anyone who has
developed a terminology and/or ontology knows, it is absolutely essential
there remain a single curating authority, if the value of the resource
is to remain intact. The "vetting" performed by the central
authority - as is extensively done by the curators of the Gene
Ontology, for instance - is absolutely essential to guaranteeing
the integrity of the knowledge resource. This is not a "closed" or
proprietary process, just a highly controlled one. Unfortunately,
Doug Bowden's resources are MUCH MUCH smaller than those available to
the curators/developers of GO, so the NN curation effort necessarily
moves at a slower pace.
2) Working with the Neuroscience community
As Kei, Don, and others have stated, it would be unwise to proceed in
creating an "open source" neuroanatomical ontology without interacting
with the researchers who've already put a lot of effort into this
problem over the past decade or so. With this in mind, I have several
suggestions:
a) The 5 ways of knowing neuroanatomy:
This is a pitch I've been making which I think helps to sum up
the current ways various sub-fields have attempted to
identify/label/collate brain morphology
i) Terminologies - e.g., NN, BrainLex
ii) Ontologies - e.g., Neuro-FMA (the project Peter Mork
referred to)
iii) Literature Informatics (CocoMac, BrainMap, NeuroScholar,
BAMS, ArrowSmith, etc.).
These are very mature projects. Some include their own
mereotopological reasoning systems (e.g., CocoMac and BrainMap) in
order to be able to pool and compare the relatedness of structures and
connectivity across different studies in the literature. The goal in
this category is to perform large-scale semantic mining of the
literature to confirm/refute current knowledge and uncover new
correlations - very much along the lines of what The NeuroCommons
Project expects to achieve via use of semantic web technologies. Some
researchers in this category are actually participating in The
NeuroCommons Project (i.e., Gully Burns, who developed NeuroScholar).
iv) voxel/pixel analysis:
This approach applies computer vision algorithms to
automatically - or semi-automatically - identify 2D & 3D shapes in
digital anatomical images. This field is also extremely mature,
though there are many significant caveats to exactly how much of this
work can be effectively automated.
v) parameterized models:
Often these are derived from - or used to drive - the
voxel/pixel based analysis described in 'iv' - though the spatial
modeling is definitely a distinct approach from the pure voxel/pixel
approach.
None of the studies you'd fit into these categories exclusively focus on
their technique/tool alone without some aspect of the other "ways of
knowing neuroanatomy" playing a role in what they do. However, it is
clear much fundamental work in this area primarily focuses on one
technique over the others.
Having said that, when the neuroscience community makes use of this
work to examine a specific biological problem, they will often draw
significant tools and resources from more than one of these domains.
b) NCBO/NCOR sponsored meeting focused on mereotopology in
neuroanatomy:
Barry Smith is working to bring together researchers working
in the 5 domains described above. There is a very pressing need in
large-scale, field-wide neuroinformatics projects such as what is
being done in the BIRN project to have these 5 domains converge and
work more cooperatively. Right now, a lot of manual effort has to be
put out to bring them together. This is something BIRN has been
pursuing. In the last 6 months, we have received a great deal of
support and guidance on this effort from NCBO. Daniel Rubin interacts
directly with the BIRN Ontology Task Force, and the work Barry Smith
has been doing with FMA, OBO, FuGO, and PATO has very much begun to
create a much more well-founded and computable path toward performing
large-scale annotation of neuroimaging data.
This meeting is on the NCBO/NCOR slate for 2007, but in the
interim I hope to see more effort invested in the coming year across
the 5 communities listed above toward the goal of integrating across
these "ways of knowing" now that the need has been recognized.
3) Microarrays:
Just as Don, Kei, Alan R., and others have pointed out,
high-throughput assays - microarrays, BAC-based IHC, in situ studies
using the Gene Paint technology employed by the Allen Institute of
Brain Science to construct the Allen Brain Atlas of gene expression in
the brain - are going to transform our understanding of neuroanatomy
over the coming decades. This is just a given. There is a pressing
need to derive a means to integrate spatially-mapped studies of gene &
protein expression into a neuroimaging setting. The spatial resolution
may be very coarse - e.g., "whole brain" - but they still provide
sufficient spatial information to be usable in the context of a
neuroanatomical coordinate system.
We are working in the BIRN project to create a means for
researchers to integrate these distinct approaches to studying the
brain. As Alan R. pointed out, FuGO is working to put description of
microarray experiments on a solid, formal footing, and I would expect
one aspect of that will be to represent microarray data in RDF/OWL.
This is not a trivial problem, given that much of the available data is
merely MIAME-compliant - MIAME not even being a data format, but just
a collection of minimal data requirements. One need only look at the
great complexity of the data submission process at the NCBI GEO site
to get an appreciation for how difficult this problem can be. A great
deal of effort is being invested in the microarray field to come up
with a better means to handle this issue, and the FuGO effort will be a
critical clearinghouse for this work. The important thing to remember
when it comes to field-wide data pooling and re-analysis is that it may
sometimes be necessary to get right back to the microarray primary
image files so as to reapply different criteria when performing the
statistical tests and reductions on pooled data. Given this
requirement - one we also see in the neuroimaging domain - I believe
it is very important to proceed in a well-reasoned manner when seeking
to integrate across microarray datasets using semantic web
technologies. Alan R. and I - possibly others too - on this list
are on the FuGO Coordinators Committee, so hopefully we can help to
keep those lines of communication open.
Sorry to go on so, but this is a topic on which I've labored quite
intensively over the past year. There is a lot being done on this
issue, and I think all efforts will get much further more quickly -
and in a way that will carry more street cred with practicing
neuroscientists - if we all try to work together.
Cheers,
Bill
Bill Bug
Senior Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)