Re: blog: semantic dissonance in uniprot

Pat Hayes Wed, 01 Apr 2009 13:57:52 -0700


On Mar 30, 2009, at 3:53 PM, Oliver Ruebenacker wrote:

    Hello Pat, All,

Let me try to take a step back and summarize what I think I learned so far:


 The Ontologists have gained impressive mastery over what I would
call the World of Discrete Particulars. They know how to deal with
particulars, classes of particulars and cardinality restrictions, in
other words, integers and intervals of integers.

 They know how to say, for example, that a particular petri dish has
2000-3000 cells, each of which has 150-200 mitochondria, each of which
has 30-40 ATP molecules.

Actually no. What we can do very neatly is to say things like: every Petri dish with such-and-such a relationship to certain substances and organisms (those being defined possibly by the number of mitochondria their cells contain) will be a container which contains a substance which has at least a 0.03 percent concentration of ATP (or whatever: I don't know enough biochemistry to make the numbers plausible). But my point is, we are not obliged to be constantly talking about molecules or cells. (This obsession with very small things and very large numbers seems to be quite common, I have no idea why.)

 However, the Science community typically deals with what I would
call the World of Reproducible Fluxes. They think in terms of
reproducible scenarios, expectation values, variances, continuous
change, Gaussian distribution and differential equations.

Do they, indeed? Not the ones I talk to, but maybe I am in an unusually restricted community.

(Literally,
this is what I learned in the first meeting of the Physics 101 class
when I went to College.)

Yes, it sounds like Philosophy of Science 101 For Physicists, a prerequisite course for Quantum Physics 102, no doubt.

 They know how to say, for example, that if you dump a typical human
liver cell into a 0.2 percent solution of some drug, the ATP
concentration in the mitochondria will drop by a relative rate of 1.2
percent per second during the first ten minutes.
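
As a rough illustration of the statement above (an editorial sketch, not part of the original message): if "a relative rate of 1.2 percent per second" is read as a constant continuous rate k = 0.012 per second, which is an assumption, then the concentration follows c(t) = c0 * exp(-k * t). The function name below is made up for the example.

```python
import math

def remaining_fraction(rate_per_second: float, seconds: float) -> float:
    """Fraction of the initial concentration left after `seconds`,
    assuming a constant continuous relative decay rate (an assumption)."""
    return math.exp(-rate_per_second * seconds)

# 1.2 percent per second, sustained over the first ten minutes (600 s).
fraction_left = remaining_fraction(0.012, 600)
print(f"fraction of ATP concentration left: {fraction_left:.2e}")
```

Under that reading, well under 0.1 percent of the initial concentration would remain after ten minutes. A discrete per-second reading (multiplying by 0.988 each second) would give a slightly different figure.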

OK, that sounds like something that one might tackle using the OBO ontologies (apart from the 'typical' word, a well-known issue. If this is really critical, then one might have to resort to one of the known workarounds for nonmonotonic reasoning, such as having a class of 'typical human liver cells' or a category of 'typicality' which is assumed to be true unless proven false.)


 Asking a scientist to define "expectation value" is akin to asking
an Ontologist to define "class".

 When the Ontologists say, they have "solved mereology long time
ago", they seem to mean: In the World of Discrete Particulars. Not in
the World of Reproducible Fluxes.

 As it happens, Scientists discover OWL to produce ontologies that
live in the World of Reproducible Fluxes, such as BioPAX. Even
Scientists agree that it has weaknesses.

 Ontologist: BioPAX is nonsense! BioPAX is ill-defined! It is not an
ontology. It is not OWL.

 Scientist: I know it has weaknesses, but is it really that bad?

 Ontologist: Totally! We need to rebuild it from scratch.

 Scientist: OK, if you say so.

 (Ontologist goes and much later comes back with a new BioPAX, living
entirely in the World of Discrete Particulars)

 Ontologist: Here! A wonderful new BioPAX. Perfect ontology, perfect
OWL. Everything is clear and well-defined.

 Scientist: I am sorry, I can not use that.

 Ontologist: Why not?

 Scientist: It lacks the most basic terms Science is based on, such
as reproducible scenarios, expectation values and rate of change.

 Ontologist: You need to translate those terms into the World of
Discrete Particulars, and then I will include them.

 Scientist: How do I do that?

 Ontologist: I have no clue.

 But one thing both the Ontologist and the Scientist would agree:
Mereology in the World of Discrete Particulars is not Rocket Science.

Sounds like you should be doing something other than taking up time on ontology discussion lists. Good luck with whatever it is that you Scientists do.


Pat Hayes

    Take care
    Oliver

On Mon, Mar 30, 2009 at 12:35 PM, Pat Hayes <pha...@ihmc.us> wrote:
On Mar 30, 2009, at 9:59 AM, Oliver Ruebenacker wrote:
   Hello Pat, All,

On Sun, Mar 29, 2009 at 11:35 PM, Pat Hayes <pha...@ihmc.us> wrote:
On Mar 29, 2009, at 11:15 AM, Oliver Ruebenacker wrote:
 I am assuming that these classes all make a commitment about what
their instances mean, so users could declare instances and rely on
that commitment to be useful, right?
As I have been taken to task (offline) for agreeing with you, allow me
to intercede. On this point, I think everyone is right. Yes, classes
are things that have, or can have, instances, and that is all that a
class is, in effect. (RDFS makes this quite explicit by _defining_
classes to be things in the range of the rdf:type property.) On the
other hand, it is certainly correct that ontologies can say a lot about
classes without ever mentioning instances. On the other hand, it is
also the case that, were someone to (not unreasonably) wish to connect
such classes with their instances, the resulting conclusions should be
correct, and if they were not, then this
would be a serious critique of the ontology.
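
The RDFS remark above can be sketched in a few lines of plain Python (an editorial illustration; the `ex:` names and both helper functions are hypothetical, and no RDF library is used). Anything that appears as the object of an rdf:type statement is treated as a class, and its instances are whatever points at it.

```python
RDF_TYPE = "rdf:type"

# Hypothetical (subject, predicate, object) triples; names are illustrative.
triples = [
    ("ex:egfr_molecule_1", RDF_TYPE, "ex:EGFR"),
    ("ex:EGFR", "rdfs:subClassOf", "ex:Protein"),
    ("ex:atp_molecule_7", RDF_TYPE, "ex:ATP"),
]

def classes_by_use(triples):
    """Everything that occurs as the object of rdf:type counts as a class."""
    return {obj for _, pred, obj in triples if pred == RDF_TYPE}

def instances_of(cls, triples):
    """Direct instances only; no subclass inference is attempted here."""
    return {subj for subj, pred, obj in triples
            if pred == RDF_TYPE and obj == cls}

found = classes_by_use(triples)
print(sorted(found))
print(instances_of("ex:EGFR", triples))
```

Note that `ex:Protein` is talked about without any instance being mentioned. This sketch applies no RDFS entailment; a reasoner would also classify it as a class because of the rdfs:subClassOf statement.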
 I have no idea what the person who took you to task was thinking,
but it seems related to the ongoing controversy over whether
substances should be instances or classes.
Which seems like a much more interesting topic, indeed.
 In BioPAX, we have a class physical entity with subclasses such as
protein. EGFR would be an instance of protein, and we could say in
BioPAX that EGFR has a sequence. (I would argue it should rather say
that EGFR matches a sequence pattern, but that is another story.)
The problem is that there are certain assumptions which BioPAX users
are encouraged to follow, such as (1) if two physical entities refer
to the same record, they are identical (2) if two physical entities
refer to different records in the same source, they are not identical
(3) if they are not identical, they have no overlap.
I don't think these assumptions are even asserted in the ontology or
the documentation, but some BioPAX developers actively encourage users
to rely on them ("Come on, they are true at least 95 percent of the
time"), and BioPAX lacks support for cases where they break down.
The fix I advocate is straight-forward: let the language be explicit
about whether the above assumptions are met or not and add support for
cases where they are not.
Hard to disagree with that.
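
To make assumptions (1) to (3) concrete, here is a minimal Python sketch (an editorial illustration, not BioPAX code; `EntityRef`, the source name and the record identifiers are all hypothetical). It returns None where the assumptions say nothing, which is one way of being explicit about whether they apply.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EntityRef:
    """Hypothetical stand-in for a physical entity and the record it cites."""
    name: str
    source: str      # name of the database the record lives in
    record_id: str   # identifier within that source

def assumed_identical(a: EntityRef, b: EntityRef) -> Optional[bool]:
    """Assumptions (1) and (2); None when the sources differ."""
    if a.source != b.source:
        return None  # the assumptions are silent across sources
    return a.record_id == b.record_id

def assumed_overlap(a: EntityRef, b: EntityRef) -> Optional[bool]:
    """Assumption (3): entities that are not identical share nothing."""
    return assumed_identical(a, b)

egfr = EntityRef("EGFR", "SourceA", "rec-1")
human_egfr = EntityRef("human EGFR", "SourceA", "rec-2")
print(assumed_overlap(egfr, human_egfr))
```

The printed answer is False, although EGFR is meant to include human EGFR. That is exactly the kind of case where the assumptions break down.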
This would include a property that expresses
that EGFR includes human EGFR, but since EGFR is not a class, it would
not be rdfs:subClassOf.

 Others advocate a different approach: make all reference to
substances references to classes
Classes of what? That is, what would be the ultimate elements of these
classes and subclasses? I think it is vital to get this straight before
proceeding. Possible answers include: molecules (so EGFR is the class
of all molecules that would be classified as an EGFR molecule); pieces
of 'stuff' in the mereological sense ("aggregates" as someone called
them in this thread); protein-types, where a type is something that can
always be subdivided into subtypes according to some criterion,
possibly one yet to be discovered; kinds of substance, where a
substance is something that can partake in mixtures or compounds to
create other kinds of substance, and pieces of which occupy space. And
no doubt there are others, also. The point, I should perhaps emphasize,
is not to refer to individuals of these various kinds, but to pin down
a particular way of thinking that can be used
consistently to justify ontological design decisions.
, e.g. EGFR would be a subclass of
protein. This, they say, is "more natural".
It fits with the first model, above, in which we are always talking
about classes of molecule. Not so well with the 'substances' view.
 The obvious benefit is that it makes it clear that two distinct
substances may have overlap, since two distinct classes may have a
non-empty intersection, e.g. human EGFR and phospho-EGFR would have the
intersection human phospho-EGFR (assuming EGFR to be defined to
include phospho-EGFR).
That suggests the 'type/subtype' way of thinking.
 The obvious drawback is that everything becomes more complicated
since instead of properties of instances we would have property
restrictions over classes (e.g. instead of "EGFR matches EGFRSequence"
we would say "Every element of EGFR matches EGFRSequence"). That alone
is a serious issue, since typical users like it as simple as possible.
I don't think this is a serious issue, in fact. It is as easy to state the property restriction as the property, in OWL; and in any case, simplicity in this sense is largely a matter of good human interface design, and it is very bad engineering to base ontological decisions on interface design.
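
The class reading discussed above can be sketched with plain Python sets (an editorial illustration under the 'classes of molecules' model; the molecule identifiers and the matching set are made up). It shows two distinct classes with a non-empty intersection, and the restriction "Every element of EGFR matches EGFRSequence" as a check over members.

```python
# Hypothetical molecule identifiers; classes are modelled as plain sets.
egfr = {"m1", "m2", "m3", "m4"}      # assumed to include phospho-EGFR
human_egfr = {"m1", "m2"}
phospho_egfr = {"m2", "m3"}

# Two distinct classes may still share members.
human_phospho_egfr = human_egfr & phospho_egfr

# Made-up stand-in for "the things that match EGFRSequence".
matches_egfr_sequence = {"m1", "m2", "m3", "m4", "m9"}

def every_element_matches(cls, matching):
    """Universal restriction: every element of cls has the property."""
    return all(member in matching for member in cls)

restriction_holds = every_element_matches(egfr, matches_egfr_sequence)
print(human_phospho_egfr, restriction_holds)
```

Stating the restriction takes one line more than stating a property of a single instance, which is roughly the point made in the reply above.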
 But that is not the most serious issue.
OK
The most severe problem seems to be that the class approach seems to
be incompatible with (1) observables being about statistical ensembles
and (2) populations being defined by location, not individual
membership - at least, if we want to avoid extreme complexity.

 (note: in what follows, all numbers are made up and probably not
realistic)
For example, how would we describe that "the concentration of ATP in
the mitochondrion is (3.2 +/- 0.7) mol per liter"? What does the
concentration inhere in?
I'm not sure what this means. Are we talking about a particular
mitochondrion, or mitochondria in general? I guess the latter. In
which case, the answer to the question is, it inheres in the class of
Mitochondria, which is presumably a subclass of CellularStructures or
some such.
 Maybe the concentration is just a proxy for the particle number?
Say, the particle number of ATP in the mitochondrion is 24.7 +/- 1.6.
What does that number inhere in?
Same answer, I guess. Though I don't know what a particle number is,
so this really is a guess.
 Can we restrict ourselves to cases of definite particle numbers? In
Systems Biology, we often use differential equations to model how
things change over time, and that assumes they change gradually. But
nevertheless, let us say that the number of ATP in the mitochondrion
is 23. What does that number inhere in? One particular set of 23
molecules? But are we talking only about one particular cell, or are
we making a more general statement that applies to many cells?
I don't know, what are you wanting to say? General statements are made
(in OWL) by relating properties to classes (everything in this class
has this value of this property...)
 A new ATP molecule is created, increasing the number of ATP
molecules to 24. The original set of 23 molecules still exists, and
its number is still 23. But that's not the number of ATP molecules in
the mitochondrion any more. Also, what happens when an ATP molecule is
destroyed, or wanders off to some place else?
Well, these are issues of describing change and time. That is a whole
ontological area that has been fairly extensively explored. But if you
want to be able to describe change and dynamics, you will have to
introduce time explicitly into your ontological framework one way or
another. There are no magic bullets for avoiding the resulting
complications.
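
One simple way to picture "introducing time explicitly" is to index the contents of the mitochondrion by time (an editorial sketch; the molecule names, time points and `atp_count` are hypothetical). The original set of 23 molecules never changes, but the count is read off the mitochondrion at a given time rather than off that fixed set.

```python
# The particular set of 23 molecules present at time 0 (made-up names).
original = frozenset(f"atp{i}" for i in range(23))

# Hypothetical snapshots: time in seconds -> ATP molecules present.
contents_at = {
    0: original,
    1: original | {"atp23"},                # one molecule created
    2: (original | {"atp23"}) - {"atp0"},   # one destroyed or wandered off
}

def atp_count(t):
    """Number of ATP molecules in the mitochondrion at time t."""
    return len(contents_at[t])

print(len(original), [atp_count(t) for t in sorted(contents_at)])
```

At time 2 the count is 23 again, yet the molecules present are not the original 23, so the number cannot be a property of any one fixed set.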
 Finally, what happens when the number of ATP molecules in the
mitochondrion drops to zero? What does the zero inhere in - in the
empty set?
No, in the mitochondrion (or mitochondria) which have no ATP in them.
This is an old issue, thoroughly explored. (What kind of flock does a
shepherd have who has sold all his sheep?)
What if the number of ADP in the mitochondrion also drops
to zero, does the zero also inhere in the empty set? How many empty
sets are there?
There is only one empty set. But in the example under discussion, this
would be an issue only if there were no mitochondria in the universe, a
case I assume we can safely ignore. (And, BTW, in many ontology
languages - though not, regrettably, OWL-DL - there can be a number of
distinct empty _classes_.)
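
The difference between one empty set and several distinct empty classes can be illustrated in Python (an editorial sketch; `NamedClass` is a hypothetical helper, not any ontology language's API). Two named classes can have the same empty extension while remaining different classes.

```python
class NamedClass:
    """Hypothetical class with a name and an extension (its set of members)."""

    def __init__(self, name, members=()):
        self.name = name
        self.extension = frozenset(members)

atp = NamedClass("ATP in this mitochondrion")   # currently no members
adp = NamedClass("ADP in this mitochondrion")   # currently no members

same_extension = atp.extension == adp.extension  # both are the empty set
same_class = atp is adp                          # still two named classes

print(same_extension, same_class)
```

In a purely extensional reading, equal extensions would make the two classes the same; keeping the names apart is what allows several empty classes to coexist.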
 Maybe in principle, it is possible to reformulate the problem and
build up a description from scratch, relying on terms such as
molecule, that would allow to express the above scenarios accurately.
But that approach would make a complex system of related restrictions
necessary to make even the most simple assertions used in Systems
Biology.
Indeed, I suspect that Systems Biology would be an extremely complex
ontology, if formalized adequately. (Even supposing the state of the
formalizing art is up to the task, which I doubt.) Note however that
this does not mean that every assertion made using the concepts of the
ontology need be complex, only that the defining ontology for the
concepts will be.
Fortunately, the defining ontology only has to be created once.
 What I think we need instead is a term that refers to "ATP in the
mitochondrion", a term that refers to "(24.7 +/- 1.6)", and a simple
property to connect these two in one statement.
In other words, an equation without any definitions of the terms used in it.
Sure, go ahead, but please don't call it an ontology.

Pat
   Take care
   Oliver

--
Oliver Ruebenacker, Computational Cell Biologist
BioPAX Integration at Virtual Cell (http://vcell.org/biopax)
Center for Cell Analysis and Modeling
http://www.oliver.curiousworld.org
------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes