Alan does an excellent job of summing up how the issues discussed in this thread will ultimately bear on how we assert the details of biological reality, so that algorithms can help us reliably infer new, MEANINGFUL relations.

As he states, there are ways in which a protein can end up in a specific tissue other than being expressed by the cells in that tissue.

In fact, I'd go one step further and say there will be times when the combination of asserted and inferred relations will need to represent the location of an instance of protein X, and the process(es) by which it came to be located there, at multiple levels of resolution: e.g., in an instance of a specific tissue A, in an instance of specific cells in that tissue, in an instance of a specific subcellular compartment in those cells, and in a particular mereotopological relation to instances of other protein classes in that subcellular compartment.
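To make the multi-resolution point concrete, here is a minimal sketch in plain Python. It assumes a single hypothetical rule, that if something is located in a part it is also located in the whole, and uses hypothetical instance names (`proteinX_1`, `er_1`, `cell_1`, `tissueA_1`) and relation names (`located_in`, `part_of`). Location is asserted only at the finest grain; the coarser locations are inferred.

```python
# Hypothetical instance-level assertions; all names are illustrative.
part_of = {
    "er_1": "cell_1",       # a subcellular compartment instance within a cell instance
    "cell_1": "tissueA_1",  # that cell instance within an instance of tissue A
}
located_in = {"proteinX_1": "er_1"}  # asserted only at the finest resolution


def inferred_locations(entity: str) -> list:
    """Return the asserted location plus every whole it is transitively part of,
    assuming the rule: located_in(x, part) and part_of(part, whole) => located_in(x, whole)."""
    locations = []
    place = located_in.get(entity)
    while place is not None and place not in locations:  # guard against part_of cycles
        locations.append(place)
        place = part_of.get(place)
    return locations


print(inferred_locations("proteinX_1"))  # ['er_1', 'cell_1', 'tissueA_1']
```

A real ontology would express the rule as a property chain or transitivity axiom and let a reasoner do the walk; the loop just shows what such an inference yields.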

There will also be applications where we'll need to represent the processes by which an instance of protein X ended up in a specific location and the process(es) in which it participated along the way (and at its final destination), and to express how the instances of the objects participating in the instances of those processes evolved through time.
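One hedged way to picture this is to give each process instance a time interval and a set of participants, and to keep time-stamped location records for the protein instance. The sketch below does that in plain Python; the process names, times (arbitrary units) and locations are all hypothetical, and this is only one of many possible temporal representations.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInstance:
    """Hypothetical process instance: a time interval plus its participants."""
    name: str
    start: float  # arbitrary time units (assumption)
    end: float
    participants: frozenset


processes = [
    ProcessInstance("translation_1", 0.0, 1.0, frozenset({"proteinX_1", "ribosome_1"})),
    ProcessInstance("vesicle_transport_1", 1.0, 3.0, frozenset({"proteinX_1", "vesicle_1"})),
    ProcessInstance("ligand_binding_1", 3.0, 4.0, frozenset({"proteinX_1", "ligandY_1"})),
]

# Time-stamped location records for one protein instance, sorted by time.
location_history = [(0.0, "cytosol_1"), (1.0, "vesicle_membrane_1"), (3.0, "plasma_membrane_1")]


def processes_of(entity):
    """Names of the processes the entity participated in, in temporal order."""
    return [p.name for p in sorted(processes, key=lambda p: p.start) if entity in p.participants]


def location_at(t):
    """Most recent recorded location at or before time t (None if nothing recorded yet)."""
    times = [time for time, _ in location_history]
    i = bisect_right(times, t)
    return location_history[i - 1][1] if i > 0 else None


print(processes_of("proteinX_1"))  # ['translation_1', 'vesicle_transport_1', 'ligand_binding_1']
print(location_at(2.0))            # vesicle_membrane_1
```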

I know this may seem overly complex, but you could pick up virtually any research article reporting a novel finding in biomedical science, from the behavior of some set of organisms in an ecosystem to the behavior of some set of atoms in a GC/Mass Spec device, and find that this seeming complexity is dealt with as commonplace.

If we expect the application of formal semantic informatics techniques to yield the kind of novelty that has accrued through the use of linear pattern discovery techniques in the biomolecular informatics community (e.g., sequence homologies, hydrophobicity profiles, gene finding, algorithmic probe set construction, restriction fragment reassembly, etc.), we'll need to encapsulate this kind of complexity in our representations of biological reality.

Documenting associated provenance information for the statements, both asserted and inferred, is obviously a critical part of this process (as has often been stated by many on this list, and as has been pursued in systems such as SWAN and others). It is needed both to accommodate disagreement among authorities and to classify the statements for further analysis. For example, in examining the binding of ligands to receptors, there will be situations where one will want to restrict the inferencing/analysis to statements derived from ligand-receptor interactions that lead to functional consequences and for which there is corroborating evidence from a functional assay. In other words, not just statements such as "an instance of ligand X bound to an instance of receptor Y", but "an instance of ligand X bound to an instance of receptor Y, leading to consequence Z" (e.g., increased intracellular Ca++, activation of Protein Kinase A, more frequent opening of I.K.A ion channels, etc.), where the evidence is some functional assay for consequence Z.
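As a rough illustration of that kind of restriction, the sketch below attaches provenance fields (evidence type, source, asserted vs. inferred) to each binding statement and filters for those with a stated consequence backed by a functional assay. The field names, the evidence vocabulary and the example records are hypothetical; systems like SWAN use their own, richer models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BindingStatement:
    """Hypothetical provenance-annotated statement about a ligand-receptor binding."""
    ligand: str
    receptor: str
    consequence: Optional[str]  # None when only binding was reported
    evidence: str               # evidence type; this vocabulary is an assumption
    source: str                 # attribution: who asserted or inferred it
    inferred: bool = False      # asserted (False) vs. inferred (True)


statements = [
    BindingStatement("ligandX_1", "receptorY_1", None, "binding assay", "lab A"),
    BindingStatement("ligandX_2", "receptorY_2", "increased intracellular Ca++",
                     "functional assay", "lab B"),
    BindingStatement("ligandX_3", "receptorY_3", "activation of Protein Kinase A",
                     "binding assay", "lab C"),
    BindingStatement("ligandX_4", "receptorY_4", "activation of Protein Kinase A",
                     "functional assay", "lab D"),
    BindingStatement("ligandX_5", "receptorY_5", "increased intracellular Ca++",
                     "inferred from pathway model", "reasoner", inferred=True),
]


def functionally_corroborated(stmts, include_inferred=False):
    """Keep bindings with a stated consequence Z backed by a functional assay for Z."""
    return [
        s for s in stmts
        if s.consequence is not None
        and s.evidence == "functional assay"
        and (include_inferred or not s.inferred)
    ]


for s in functionally_corroborated(statements):
    print(s.ligand, "->", s.receptor, ":", s.consequence, f"({s.source})")
```

Keeping the asserted/inferred flag and the source on every statement is what allows the same filter to be rerun per authority when sources disagree.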

One might also want to restrict one's analysis to statements made about instances in public data repositories (as opposed to statements derived from instances in literature databases), to determine whether the statements inferable from them match those in the literature based on analysis of the same collection of experimental results.
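Assuming inference over the repository-derived statements has already been run elsewhere, the comparison itself can be as simple as set operations over (subject, predicate, object) triples. The sketch below is only that final step; all triples are hypothetical placeholders.

```python
# Hypothetical (subject, predicate, object) statements; all names are illustrative.
# Assumption: inference over the repository data has already been run elsewhere.
inferred_from_repository = {
    ("ligandX", "activates", "receptorY"),
    ("ligandZ", "inhibits", "receptorY"),
}
asserted_in_literature = {
    ("ligandX", "activates", "receptorY"),
    ("ligandW", "activates", "receptorV"),
}


def compare(inferred: set, literature: set) -> dict:
    """Partition statements by whether the two sources agree on them."""
    return {
        "agree": inferred & literature,
        "only_inferred": inferred - literature,
        "only_literature": literature - inferred,
    }


for label, triples in compare(inferred_from_repository, asserted_in_literature).items():
    print(label, sorted(triples))
```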

Cheers,
Bill

On May 17, 2007, at 11:07 PM, Alan Ruttenberg wrote:


On May 17, 2007, at 6:34 PM, Eric Jain wrote:

There does indeed seem to be an existing has_participant predicate, but is there also a "protein expression process" class? This would seem rather contrived from a biologist's (if not an ontologist's) point of view (all we want to say, after all, is that the protein can be found in some tissue)!

If you want to say that the protein is found in some tissue, that's what should be said. However, in your email you wrote that the protein is expressed in the tissue. They are not the same, and I think that in our semweb representations we should take care to not confuse them, though in language they are easily interchanged and we still (often) understand what each other is talking about.

If it is known to be found in the tissue, I would make the subclass be the subclass of the protein each instance of which is located in some instance of the tissue. No processes involved at all.
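In OWL Manchester syntax, a class of that shape would read roughly as `ProteinX and (located_in some Liver)`, where `ProteinX`, `located_in` and `Liver` are hypothetical names. The plain-Python sketch below evaluates that membership condition over a toy set of instances. Note the assumption: it is a closed-world check, whereas an OWL reasoner works under the open-world assumption and would not conclude non-membership just because a location is not stated.

```python
# Hypothetical instance data; class and relation names are illustrative assumptions.
instance_of = {
    "p1": "ProteinX", "p2": "ProteinX", "p3": "ProteinY",
    "liver_1": "Liver", "kidney_1": "Kidney",
}
located_in = {("p1", "liver_1"), ("p2", "kidney_1"), ("p3", "liver_1")}


def members_of_restriction(base_class, filler_class):
    """Instances of base_class that are located_in some instance of filler_class,
    i.e. a closed-world reading of: base_class and (located_in some filler_class)."""
    return {
        x for x, cls in instance_of.items()
        if cls == base_class
        and any(instance_of.get(place) == filler_class
                for (subj, place) in located_in if subj == x)
    }


print(members_of_restriction("ProteinX", "Liver"))  # {'p1'}
```

No expression process appears anywhere: membership depends only on a location relation between instances.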

Using widely used concepts and predicates is no doubt a good thing. But if you can instead make do with core RDF features, that's even better -- not everyone uses OBO, no matter how "foundational" it may be :-)

I don't think we can make do with core RDF features if we want agents that make reasonable inferences based on what they are told. RDF is just too weak to do much of anything in this direction. OTOH, if the RDF is always going to be interpreted by a human, essentially using RDF as a syntax that is opaque from a machine agent's point of view, then there is no problem. I guess I am hoping for my machines to help me more than that.

Note that the reification "design pattern" allows you to add attribution information on statements that you did not at first think would ever need such information, without breaking the data model.

As long as those statements are single triples. It gets more involved when statements are more than a single triple, as they often will be.
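The sketch below shows why. It uses the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) but plain Python tuples rather than any RDF library. The `ex:` names, the attribution property `ex:assertedBy` and the grouping property `ex:partOfClaim` are hypothetical. A single triple reifies cleanly into one statement node; a three-triple claim yields three unrelated statement nodes, so something extra is needed to say they form one claim.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
_ids = itertools.count(1)


def reify(triple, attribution):
    """Describe one (s, p, o) triple with an rdf:Statement node plus attribution.
    Returns (statement_node, list_of_new_triples)."""
    node = f"_:stmt{next(_ids)}"
    s, p, o = triple
    return node, [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
        (node, "ex:assertedBy", attribution),  # hypothetical attribution property
    ]


# A single-triple statement: reification works cleanly.
node, triples = reify(("ex:ligandX_1", "ex:binds", "ex:receptorY_1"), "ex:labA")

# A three-triple claim: each triple gets its own, unrelated statement node ...
claim = [
    ("ex:binding_1", "ex:has_participant", "ex:ligandX_1"),
    ("ex:binding_1", "ex:has_participant", "ex:receptorY_1"),
    ("ex:binding_1", "ex:has_consequence", "ex:ca_increase_1"),
]
claim_nodes, claim_triples = [], []
for t in claim:
    n, ts = reify(t, "ex:labB")
    claim_nodes.append(n)
    claim_triples.extend(ts)

# ... so an extra, hypothetical grouping property is needed to tie them into one claim.
claim_triples += [(n, "ex:partOfClaim", "ex:claim_1") for n in claim_nodes]

print(len(triples), len(claim_triples))  # 5 18
```

Named graphs (later standardized as RDF 1.1 datasets) are one common alternative for attaching provenance to a group of triples as a unit.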

-Alan




Bill Bug
Senior Research Analyst/Ontological Engineer

Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - [EMAIL PROTECTED]



