Alan does an excellent job summing up how the issues discussed in
this thread will ultimately need to be brought to bear in asserting
the details of biological reality in such a way that algorithms will
be able to assist us in reliably inferring new, MEANINGFUL relations.
As he states, there are ways in which a protein can end up in a
specific tissue other than being expressed by the cells in that tissue.
In fact, I'd go one step further and say there will be times when the
combination of asserted and inferred relations will need to represent
the location of an instance of protein X - and the process(es) via
which it became located there - at multiple levels of resolution:
e.g., in an instance of a specific tissue A, in an instance of
specific cells in that tissue, in an instance of a specific
sub-cellular compartment in those cells, and in a particular
mereotopological relation to instances of other protein classes in
that sub-cellular compartment.
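As a rough illustration of the multi-resolution point, here is a minimal
Python sketch. Everything in it is an assumption made for illustration:
the Instance class, the "level" labels, the identifiers, and the simple
rule that located_in composes with part_of. It is a toy, not a proposal
for how any existing ontology or reasoner actually models this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One particular thing (an instance, not a class)."""
    name: str      # hypothetical identifier
    of_class: str  # the class this instance belongs to
    level: str     # resolution: "molecule", "compartment", "cell", "tissue"


# Hypothetical instances; the class names are placeholders.
protein_x = Instance("x1", "protein X", "molecule")
compartment = Instance("k1", "sub-cellular compartment C", "compartment")
cell = Instance("b1", "cell type B", "cell")
tissue = Instance("a1", "tissue A", "tissue")

# Asserted relations only: the location is asserted once, at the
# finest resolution known; the anatomy is asserted separately.
located_in = {protein_x: compartment}
part_of = {compartment: cell, cell: tissue}


def locations(entity):
    """Return the asserted location followed by every coarser location
    inferred by walking part_of upward from it."""
    found = []
    container = located_in.get(entity)
    while container is not None:
        found.append(container)
        container = part_of.get(container)
    return found


for place in locations(protein_x):
    print(f"{protein_x.of_class} instance is located in "
          f"{place.of_class} ({place.level} level)")
```

Only one located_in fact is asserted here; the cell-level and
tissue-level locations are inferred. The mereotopological relations to
other protein instances in the same compartment are not modeled in this
sketch.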
There will also be applications where we'll need to represent the
process(es) by which an instance of protein X ended up in a specific
location, the process(es) in which it participated along the way (and
at its final destination), and how the instances of the objects
participating in the instances of those processes evolved through
time.
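A minimal sketch of the time-indexed view described above, again in
Python. The process labels, the time units, and the "location_after"
field are all hypothetical names chosen for illustration; a real
representation would need far richer process and participant models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInstance:
    kind: str            # hypothetical process label
    participant: str     # identifier of the participating instance
    start: float         # arbitrary time units
    end: float
    location_after: str  # where the participant is once the process ends


# A made-up history for one instance of protein X.
history = [
    ProcessInstance("synthesis", "x1", 0.0, 1.0, "compartment C"),
    ProcessInstance("transport", "x1", 1.0, 3.0, "tissue A"),
    ProcessInstance("binding", "x1", 3.0, 4.0, "tissue A"),
]


def location_at(participant, t, processes):
    """Location of the participant at time t, taken from the most
    recently completed process; None if no process has completed yet."""
    done = [p for p in processes
            if p.participant == participant and p.end <= t]
    if not done:
        return None
    return max(done, key=lambda p: p.end).location_after


def participated_in(participant, processes):
    """Process kinds the participant took part in, in start order."""
    mine = [p for p in processes if p.participant == participant]
    return [p.kind for p in sorted(mine, key=lambda p: p.start)]
```

With this history, asking where x1 is and asking how it got there are
two queries over the same set of process instances.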
I know this may seem overly complex, but you could pick up virtually
any research article reporting a novel finding in biomedical science
- from the behavior of some set of organisms in an ecosystem to the
behavior of some set of atoms in a GC/Mass Spec device - and find
that seeming complexity dealt with as a commonplace.
If we expect the application of formal semantic informatic techniques
to yield the manner of novelty that has accrued through use of linear
pattern discovery techniques in the biomolecular informatics
community (e.g., sequence homologies, hydrophobicity profiles, gene
finding, algorithmic probe set construction, restriction fragment re-
assembly, etc.), we'll need to encapsulate this manner of complexity
in our representations of biological reality.
Documenting associated provenance information for the statements -
both the asserted and the inferred statements - is obviously a
critical part of this process (as has been stated often by many on
this list, and has been pursued in systems such as SWAN and others).
It is needed both to accommodate the required disagreement amongst
authorities and to classify the statements in order to perform
further analysis. For example, in examining the binding of ligands to
receptors, there will be situations where one will want to restrict
the inferencing/analysis to those statements derived from
ligand-receptor interactions that lead to functional consequences and
for which there is corroborating evidence from a functional assay. In
other words, not just statements such as "an instance of ligand X
bound to an instance of receptor Y", but "an instance of ligand X
bound to an instance of receptor Y leading to consequence Z" (e.g.,
increased intracellular Ca++, activation of Protein Kinase A, more
frequent opening of I.K.A ion channels, etc.), where the evidence =
some functional assay for consequence Z.
One might also want to restrict one's analysis to statements made
about instances in public data repositories (as opposed to statements
derived from instances in literature databases) to determine
whether the inferable statements match those in the literature based
on analysis of the same collection of experimental results.
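To make the provenance-based restrictions concrete, here is a minimal
Python sketch. The Statement fields, the "functional assay" and
"repository"/"literature" labels, and the example records are all
assumptions invented for illustration; they are not the data model of
SWAN or of any other system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    consequence: Optional[str] = None  # functional consequence Z, if any
    evidence: Optional[str] = None     # kind of supporting evidence
    source: str = "literature"         # "literature" or "repository"
    inferred: bool = False             # asserted (False) or inferred (True)


# Made-up example records.
statements = [
    Statement("ligand X", "bound_to", "receptor Y"),
    Statement("ligand X", "bound_to", "receptor Y",
              consequence="increased intracellular Ca++",
              evidence="functional assay", source="repository"),
    Statement("ligand X", "bound_to", "receptor Y",
              consequence="activation of Protein Kinase A",
              evidence="functional assay", inferred=True),
]


def functionally_corroborated(stmts):
    """Keep statements that name a consequence and are backed by a
    functional assay."""
    return [s for s in stmts
            if s.consequence is not None
            and s.evidence == "functional assay"]


def from_source(stmts, source):
    """Keep statements whose provenance points at the given source."""
    return [s for s in stmts if s.source == source]


corroborated = functionally_corroborated(statements)
repository_only = from_source(corroborated, "repository")
print(len(corroborated), len(repository_only))
```

Because each filter takes and returns a plain list of statements, the
restrictions can be chained in any order before the remaining
statements are handed to whatever does the inferencing.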
Cheers,
Bill
On May 17, 2007, at 11:07 PM, Alan Ruttenberg wrote:
On May 17, 2007, at 6:34 PM, Eric Jain wrote:
There does indeed seem to be an existing has_participant
predicate, but is there also a "protein expression process" class?
This would seem rather contrived, from a biologist's (if not an
ontologist's) point of view (all we want to say, after all, is that
the protein can be found in some tissue)!
If you want to say that the protein is found in some tissue, that's
what should be said. However, in your email you wrote that the
protein is expressed in the tissue. They are not the same, and I
think that in our semweb representations we should take care to not
confuse them, though in language they are easily interchanged and
we still (often) understand what each other is talking about.
If it is known to be found in the tissue I would make the subclass
be the subclass of the protein each instance of which is located
in some instance of the tissue. No processes involved at all.
Using widely used concepts and predicates is no doubt a good
thing. But if you can instead make do with core RDF features,
that's even better -- not everyone uses OBO, no matter how
"foundational" it may be :-)
I don't think we can make do with core RDF features, if we want to
have agents that make reasonable inferences based on what they are
told. RDF is just too weak to do much of anything in this
direction. OTOH, if the RDF is always going to be interpreted by a
human - essentially you are using RDF as an opaque (from a machine
agent point of view) syntax, then there is no problem. I guess I am
hoping for my machines to help me more than that.
Note that the reification "design pattern" allows you to add
attribution information on statements that you did not at first
think would ever need such information, without breaking the data
model.
As long as those statements are single triples. It gets more
involved when statements are more than a single triple, as they
often will be.
-Alan
Bill Bug
Senior Research Analyst/Ontological Engineer
Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)
Please Note: I now have a new email - [EMAIL PROTECTED]