As usual, I agree with Scott (this is becoming a habit! LOL! Scott, we should really try to work together more closely!)

It speaks to a conversation that I had with my review committee this morning about how The Web was built by simply being completely open. Anyone could (can) publish anything in any way they want, so long as they adhere to the simple rules of HTML. I am very concerned that the Semantic Web is not learning its lessons from the WWW. We are trying to institutionalize everything, and that simply doesn't work (it doesn't scale!).

If we, as the HCLS community, simply say "it is best practice to be explicit about your units when you publish a value", and create the ontologies that make it easy and obvious how to do so, then we let the semantic web grow organically... people will eventually see the wisdom of doing it "our way". Moreover, we can add a semantic layer on top of that ontology that easily allows us to switch between different units. (E.g. we do this in SADI when calculating Body Mass Index through a SADI Web Service. Luke pointed out to the review committee this morning that we even accept the British "Stone" as a unit of weight when doing our calculations... because any unit is legitimate and we have to accept that different people have different preferred units! The semantic layer ensures that we don't make Mars Lander kinds of errors.)
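To make the "explicit units plus a conversion layer" idea concrete, here is a minimal Python sketch of a BMI calculation that accepts any declared unit, including stone. It is an illustration only, not the actual SADI service: the names `TO_KG`, `TO_M`, `to_base` and `bmi` are hypothetical, and the conversion tables are hard-coded here where a real semantic layer would presumably derive them from a unit ontology.

```python
# Hypothetical conversion tables (assumption: a real service would derive
# these factors from a unit ontology rather than hard-coding them).
TO_KG = {"kg": 1.0, "lb": 0.45359237, "stone": 6.35029318}  # 1 stone = 14 lb
TO_M = {"m": 1.0, "cm": 0.01, "in": 0.0254}


def to_base(value, unit, table):
    """Convert a value to the base unit of `table`; reject undeclared units."""
    try:
        return value * table[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


def bmi(weight, weight_unit, height, height_unit):
    """Body Mass Index in kg/m^2, whatever units the publisher declared."""
    kg = to_base(weight, weight_unit, TO_KG)
    m = to_base(height, height_unit, TO_M)
    return kg / (m * m)


if __name__ == "__main__":
    # The same person, published with two different sets of units.
    print(bmi(69.85322498, "kg", 1.75, "m"))
    print(bmi(11, "stone", 175, "cm"))
```

The point of the sketch is that a value without a unit cannot be processed at all (it raises an error), while any explicitly declared unit is accepted and normalized before the calculation.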

Trying to impose rules on a global population is simply a non-starter. We really need to be prepared to deal with (in our semantic layer) any possibility... the only "rule" on the Semantic Web (IMO) is that you *must* be explicit about everything you publish.

...but as Chris Mungall said to me at ISMB - I am a Centralization Skeptic... and he's right! :-) (he actually said that I am a "big ontology skeptic", but I have generalized his statement)

I'm a Semantic Web Libertarian!  ...and I agree with Scott!

M



On Fri, 10 Sep 2010 14:42:02 -0700, M. Scott Marshall <mscottmarsh...@gmail.com> wrote:

Hi Eric,

The business of standardizing units reminds me of:

http://science.nasa.gov/science-news/science-at-nasa/2007/08jan_metricmoon/
followed by:
http://news.bbc.co.uk/2/hi/science/nature/462264.stm

For me, the story of losing an orbiter because of an accidental clash
between imperial and metric units was a poster child for Semantic Web,
as well as the problem you describe. You see, the machines will never
know what the numbers mean unless we use a Semantic layer as well as a
syntactic layer. The problem with units is that they seem to be
somehow both semantic and syntactic, somewhere in between.

Hard as I try, I don't understand why you want to change the way that
you describe data to constrain the data that is being described. Well,
actually I do. You want to force anyone annotating or publishing data
in the TMO vocabulary to use a single set of units (right?). It could
be an effective way to achieve the goal but it seems rather
heavy-handed. Overloading a predicate and adding English parameters to
it might make it obvious to people that they're only supposed to use
your units (because you provide no others) when they use your
ontology, but it doesn't solve the problem. Yes,
normalization of units is necessary in order to integrate data. But
the problem of normalization won't go away if you glob two semantic
aspects together in the *description of the data* (i.e. blood pressure
measurement type and units). I see from your language that you think
that it will force users to "inject" data into the data model with the
preferred units when publishing data in the TMO vocabulary but doesn't
this just point to the processing that is unavoidable for
integrating/comparing data? We will always need to get data into the
same units in order to integrate it. I feel your pain as you try to
solve it in SPARQL (and I see that it can be a very real problem), but
I think there must be a better way than to overload a predicate and
thereby obfuscate the data model. If nothing else, let's depend on
consistency checks and good documentation, as already suggested. We
can't expect to accomplish *everything* in SPARQL.

Actually, isn't this a data publishing issue? If someone publishes
systolic blood pressure values as linked data using TMO, shouldn't
they refer to the TMO ontology and the units that they used in the
provenance of the named graph containing it? If we know from the
provenance about the named graph that it uses TMO [<graphURI>
void:usesVocabulary TMO] and MmHg [bloodPressureMeasurements hasUnits
MmHg] to describe blood pressure, then we can use that information in
order to pre-select the graph during federation (in a world of
abundance and sloppy units). In this way, we could automatically
convert values as needed, presumably based on conversions that derive
from the unit ontology (non?). Although such a software feat might
require coding or reasoning outside SPARQL, it already does.
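A rough Python sketch of that idea, outside SPARQL: each named graph carries provenance stating its vocabulary and units, and the integration step uses that to pre-select graphs and convert values. Everything here is an assumption for illustration: the dictionary layout, the example.org URIs, the sample values, and the `convert` and `integrate` helpers are hypothetical, and the conversion factors are hard-coded rather than derived from a unit ontology.

```python
# Pascals per unit (assumption: a real system would look these up in a
# unit ontology; 1 mmHg is conventionally 133.322387415 Pa).
PA_PER_UNIT = {"mmHg": 133.322387415, "kPa": 1000.0, "Pa": 1.0}


def convert(value, from_unit, to_unit):
    """Convert a pressure value between two known units via pascals."""
    return value * PA_PER_UNIT[from_unit] / PA_PER_UNIT[to_unit]


# Hypothetical provenance records, one per named graph.
graphs = [
    {"uri": "http://example.org/graph/a", "vocabulary": "TMO",
     "unit": "mmHg", "values": [120.0, 135.0]},
    {"uri": "http://example.org/graph/b", "vocabulary": "TMO",
     "unit": "kPa", "values": [16.0]},
    # No declared unit: cannot be integrated safely, so it is skipped.
    {"uri": "http://example.org/graph/c", "vocabulary": "TMO",
     "unit": None, "values": [118.0]},
]


def integrate(graphs, vocabulary, target_unit):
    """Pre-select graphs by provenance, then normalize values to one unit."""
    merged = []
    for g in graphs:
        if g["vocabulary"] != vocabulary or g["unit"] not in PA_PER_UNIT:
            continue  # wrong vocabulary, or units missing/unknown
        merged.extend(convert(v, g["unit"], target_unit) for v in g["values"])
    return merged


if __name__ == "__main__":
    print(integrate(graphs, "TMO", "mmHg"))
```

Graphs that do not declare their units are left out rather than guessed at, which is where the "clear tagging" best practice pays off.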

Clear tagging of the data with units should be a best practice in and
outside the Semantic Web. I am in favor of a two component approach,
complemented by good provenance practice.

-Scott

On Fri, Sep 10, 2010 at 10:30 PM, Michel_Dumontier
<michel_dumont...@carleton.ca> wrote:

But then anyone merging two TMO documents with different units has the
normalization burden. If we pick a unit and annotate the predicates,
then the folks who would have to do the work of merging with non-TMO
documents (who would have to introduce some rules/canonicalization
pipeline anyways) have the OWL hooks to automate that merging.

Again, if we are considering TMO, then we can impose a restriction to specify the unit - we can also make this clear in documentation relating to the measurements with units.

> Also, having domain-independent predicates makes it easier to render a view
> of the data (for human consumption) that includes visual cues regarding the
> units of measures associated with values directly from the data since such
> tools will always expect the same set of terms to capture a value and its
> unit of measurement.

If you've bought the argument for early normalization, isn't it
needlessly dangerous to offer the freedom to express BP in mmHg in an
ontology that's required to have BP in MPa? It does put more burden on
the use of generic data browsers (they'd have to read the OWL in order
to present the user with units), but I think that use case is small
compared to the cost of data consumption.

I don't think we should tailor our data model to generic data browsers - they are far too simple for the complex knowledge that we have to represent.

m.
