Re: Advancing translational research with the Semantic Web

William Bug Thu, 17 May 2007 23:36:53 -0700

Alan does an excellent job summing up how the issues discussed in this thread will ultimately need to be brought to bear in asserting the details of biological reality in such a way that algorithms will be able to assist us in reliably inferring new, MEANINGFUL relations.

As he states, there are ways in which a protein can end up in a specific tissue other than being expressed by the cells in that tissue.

In fact, I'd go one step further and say there will be times when the combination of asserted and inferred relations will need to represent the location of an instance of protein X - and the process(es) via which it became located there - at multiple levels of resolution: e.g., in an instance of a specific tissue A, in an instance of specific cells in that tissue, in an instance of a specific sub-cellular compartment in those cells, and in a particular mereotopological relation to instances of other protein classes in that sub-cellular compartment.
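To make the "multiple levels of resolution" point concrete, here is a minimal sketch, assuming made-up instance identifiers (proteinX_1, compartment_1, cell_1, tissueA_1) and two rules in the spirit of the OBO relations: part_of is transitive, and located_in(x, y) plus part_of(y, z) yields located_in(x, z). Only the most specific location is asserted; the coarser locations are inferred. It does not attempt the mereotopological relations to other proteins.

```python
# Hypothetical instance data: only the most specific location is asserted.
ASSERTED = {
    ("proteinX_1", "located_in", "compartment_1"),
    ("compartment_1", "part_of", "cell_1"),
    ("cell_1", "part_of", "tissueA_1"),
}


def infer_locations(triples):
    """Forward-chain two rules to a fixed point:
    part_of(a, b) & part_of(b, c)    -> part_of(a, c)     (transitivity)
    located_in(a, b) & part_of(b, c) -> located_in(a, c)  (propagation up part_of)
    Returns only the located_in triples."""
    facts = set(triples)
    while True:
        derived = {
            (a, p, d)
            for (a, p, b) in facts
            if p in ("part_of", "located_in")
            for (c, q, d) in facts
            if q == "part_of" and c == b
        }
        if derived <= facts:  # nothing new: fixed point reached
            break
        facts |= derived
    return {t for t in facts if t[1] == "located_in"}


for triple in sorted(infer_locations(ASSERTED)):
    print(triple)
```

The same protein instance ends up reported as located in the compartment, the cell and the tissue, without any of the coarser statements having been asserted by hand.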

There will also be applications where we'll need to represent the processes by which an instance of protein X ended up in a specific location and the process(es) in which it participated along the way (and at its final destination), and also to express how the instances of the objects participating in the instances of those processes evolved through time.

I know this may seem overly complex, but you could pick up virtually any research article reporting a novel finding in biomedical science (from the behavior of some set of organisms in an ecosystem to the behavior of some set of atoms in a GC/Mass Spec device) and find that this seeming complexity is dealt with as commonplace.

If we expect the application of formal semantic informatics techniques to yield the kind of novelty that has accrued through the use of linear pattern discovery techniques in the biomolecular informatics community (e.g., sequence homologies, hydrophobicity profiles, gene finding, algorithmic probe set construction, restriction fragment re-assembly, etc.), we'll need to encapsulate this kind of complexity in our representations of biological reality.

Documenting the associated provenance information for statements, both asserted and inferred, is obviously a critical part of this process (as has been stated often by many on this list, and as has been pursued in systems such as SWAN and others). It is needed both to accommodate disagreement amongst authorities and to classify the statements in order to perform further analysis. For example, in examining the binding of ligands to receptors, there will be situations where one will want to restrict the inferencing/analysis to those statements derived from ligand-receptor interactions that lead to functional consequences and for which there is corroborating evidence from a functional assay. In other words, not just statements such as "an instance of ligand X bound to an instance of receptor Y", but "an instance of ligand X bound to an instance of receptor Y leading to consequence Z" (e.g., increased intracellular Ca++, activation of Protein Kinase A, more frequent opening of I.K.A ion channels, etc.), where the evidence is some functional assay for consequence Z.
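A minimal sketch of that kind of provenance-based restriction, assuming each binding statement carries a few hypothetical provenance fields (evidence_type, assayed_consequence, source) and an optional consequence Z. This is not how SWAN or any other system stores provenance; it only shows the filter described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BindingStatement:
    ligand: str
    receptor: str
    consequence: Optional[str]          # None when only "X bound Y" is stated
    evidence_type: str                  # hypothetical label, e.g. "functional_assay"
    assayed_consequence: Optional[str]  # what the supporting assay measured, if any
    source: str                         # provenance: who/what asserted or inferred it


def functionally_supported(statements: List[BindingStatement]) -> List[BindingStatement]:
    """Keep only statements of the form 'X bound Y leading to Z' whose
    provenance records a functional assay for that same consequence Z."""
    return [
        s for s in statements
        if s.consequence is not None
        and s.evidence_type == "functional_assay"
        and s.assayed_consequence == s.consequence
    ]


statements = [
    BindingStatement("ligandX", "receptorY", None, "binding_assay", None, "lab_1"),
    BindingStatement("ligandX", "receptorY", "increased intracellular Ca++",
                     "functional_assay", "increased intracellular Ca++", "lab_2"),
    BindingStatement("ligandX", "receptorY", "activation of Protein Kinase A",
                     "inferred", None, "reasoner_run_7"),
]

for s in functionally_supported(statements):
    print(s.source, "->", s.consequence)
```

Only the second statement survives. The bare binding claim has no consequence, and the third has a consequence but no functional assay behind it.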

One might also want to restrict the analysis to statements made about instances in public data repositories (as opposed to statements derived from instances in literature databases) to determine whether the inferable statements match those in the literature based on analysis of the same collection of experimental results.
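One way that comparison could look, as a sketch under stated assumptions: each statement is a triple tagged with a hypothetical provenance kind ("repository" or "literature"), and `infer` stands in for whatever reasoner one would actually run over the repository-derived statements. The default is no inference at all.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def compare_sources(
    statements: Iterable[Tuple[Triple, str]],
    infer: Callable[[Set[Triple]], Set[Triple]] = set,
) -> Dict[str, Set[Triple]]:
    """Split statements by provenance kind, apply `infer` to the repository-derived
    ones only, and compare the result with the literature-derived statements."""
    statements = list(statements)
    repository = {t for t, kind in statements if kind == "repository"}
    literature = {t for t, kind in statements if kind == "literature"}
    inferred = infer(repository)
    return {
        "confirmed": inferred & literature,
        "only_from_repository": inferred - literature,
        "only_in_literature": literature - inferred,
    }


data = [
    (("ligandX", "binds", "receptorY"), "repository"),
    (("ligandX", "binds", "receptorY"), "literature"),
    (("ligandX", "binds", "receptorZ"), "repository"),
    (("ligandW", "binds", "receptorY"), "literature"),
]
for key, triples in compare_sources(data).items():
    print(key, sorted(triples))
```

Keeping the provenance kind on every statement is what makes the split possible in the first place. Without it, the two collections could not be separated after they are merged.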


Cheers,
Bill

On May 17, 2007, at 11:07 PM, Alan Ruttenberg wrote:

On May 17, 2007, at 6:34 PM, Eric Jain wrote:
There does indeed seem to be an existing has_participant predicate, but is there also a "protein expression process" class? This would seem rather contrived from a biologist's (if not an ontologist's) point of view (all we want to say, after all, is that the protein can be found in some tissue)!
If you want to say that the protein is found in some tissue, that's what should be said. However, in your email you wrote that the protein is expressed in the tissue. They are not the same, and I think that in our semweb representations we should take care not to confuse them, though in language they are easily interchanged and we (often) still understand what the other is talking about.
If it is known to be found in the tissue, I would define the subclass as the subclass of the protein each instance of which is located in some instance of the tissue. No processes involved at all.
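The class described here is an existential restriction. Below is a rough sketch of it, assuming hypothetical class names (ProteinX, TissueA) and a located_in relation, with the OWL form shown only as a comment. Note that the Python check is closed-world: it finds only instances with an explicitly asserted location, whereas an OWL reasoner works under the open-world assumption.

```python
# OWL (Manchester syntax), roughly:
#   Class: ProteinX_in_TissueA
#       EquivalentTo: ProteinX and (located_in some TissueA)

instance_of = {
    "px_1": "ProteinX",
    "px_2": "ProteinX",
    "tissue_1": "TissueA",
    "tissue_2": "TissueB",
}
located_in = {("px_1", "tissue_1"), ("px_2", "tissue_2")}


def members(protein_class: str, tissue_class: str) -> set:
    """Instances of protein_class with an asserted located_in link to some
    instance of tissue_class. No process (e.g. expression) is consulted."""
    return {
        p for p, cls in instance_of.items()
        if cls == protein_class
        and any(q == p and instance_of.get(t) == tissue_class for q, t in located_in)
    }


print(members("ProteinX", "TissueA"))
```

As the message says, nothing about expression is needed to decide membership; only the location relation is consulted.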
Using widely used concepts and predicates is no doubt a good thing. But if you can instead make do with core RDF features, that's even better -- not everyone uses OBO, no matter how "foundational" it may be :-)
I don't think we can make do with core RDF features if we want to have agents that make reasonable inferences based on what they are told. RDF is just too weak to do much of anything in this direction. OTOH, if the RDF is always going to be interpreted by a human (essentially using RDF as a syntax that is opaque from a machine agent's point of view), then there is no problem. I guess I am hoping for my machines to help me more than that.
Note that the reification "design pattern" allows you to add attribution information to statements that you did not at first think would ever need such information, without breaking the data model.
As long as those statements are single triples. It gets more involved when statements are more than a single triple, as they often will be.
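To see why multi-triple statements are "more involved", here is a sketch using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object), with triples written as prefixed-name strings. The attribution predicate dc:creator assumes the usual Dublin Core prefix. ex:hasMember and ex:leads_to are hypothetical predicates invented for the example, as are all instance names. One attributed triple costs five triples. A three-triple claim needs each triple reified plus a grouping node to carry the attribution once.

```python
import itertools

_fresh = itertools.count(1)  # source of fresh blank-node labels


def reify(triple):
    """Standard RDF reification: describe one triple with an rdf:Statement node."""
    s, p, o = triple
    node = f"_:stmt{next(_fresh)}"
    return node, [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]


def attribute_single(triple, who):
    node, out = reify(triple)
    return out + [(node, "dc:creator", who)]


def attribute_group(triples, who):
    """A claim spanning several triples: reify each one, then tie them together
    under a group node (ex:hasMember is hypothetical) carrying the attribution once."""
    group = f"_:claim{next(_fresh)}"
    out = [(group, "dc:creator", who)]
    for t in triples:
        node, reified = reify(t)
        out += reified
        out.append((group, "ex:hasMember", node))
    return out


single = attribute_single(("ligandX_1", "binds", "receptorY_1"), "lab_2")
claim = attribute_group(
    [
        ("binding_1", "has_participant", "ligandX_1"),
        ("binding_1", "has_participant", "receptorY_1"),
        ("binding_1", "ex:leads_to", "calcium_rise_1"),
    ],
    "lab_2",
)
print(len(single), len(claim))  # prints: 5 16
```

The grouping convention is not part of RDF itself, so any consumer has to know about it in advance. That is one concrete sense in which attribution "gets more involved" beyond single triples.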
-Alan




Bill Bug
Senior Research Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - [EMAIL PROTECTED]
