I guess we should keep in mind that this discussion was (at least
originally) not about how units are represented on the Semantic Web, but
about how they should be represented for a specific project: the TMO. Different
people, projects and communities will have different needs, and we will not
be able to achieve a consensus that will make everyone happy. Therefore, it
might be reasonable to focus on the specific case of TMO -- and maybe some
of the consensus we reach there can be generalized to other areas.
David wrote:
> the Mars Climate Orbiter was famously lost because one team assumed Metric
> units and another team assumed English units
It is silly not to include explicit information about units, but it might be
equally silly not to use SI units in a science or technology environment. I
guess it might be easy to say this as a continental European, but non-SI
units should be eradicated from sci/tech data. That might have more impact
on interoperability than any standardized vocabularies, mapping algorithms
etc., and it might be simpler to implement in the long run.
However, I see one problem with requiring data providers to convert their
units to standard units (besides the extra effort involved): in some
settings it might be important to capture the _original_ value and unit of
the measurement, just for the sake of knowing the original datum. This might
even be a legal requirement in some clinical settings. In my understanding,
the goal of TMO is to be used in translational research, not clinical
practice, and therefore this will probably not be an issue.
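To make the point about keeping the original datum concrete, here is a
minimal Python sketch of a record that stores the SI value next to the value
and unit as originally reported. It is only an illustration under my own
assumptions: the names Measurement, normalize and TO_SI are hypothetical, the
conversion table covers just a few units, and none of this is taken from the
TMO.

```python
from dataclasses import dataclass

# Hypothetical lookup table: unit label -> (SI unit, multiplicative factor).
# Only a handful of units are listed; a real project would need a full table.
TO_SI = {
    "lb": ("kg", 0.45359237),  # international avoirdupois pound
    "in": ("m", 0.0254),       # international inch
    "kg": ("kg", 1.0),
    "m": ("m", 1.0),
}


@dataclass(frozen=True)
class Measurement:
    """A value in SI units that still remembers how it was reported."""
    original_value: float
    original_unit: str
    si_value: float
    si_unit: str


def normalize(value: float, unit: str) -> Measurement:
    """Convert to SI while keeping the original value and unit."""
    if unit not in TO_SI:
        # Refuse to guess: an unknown unit is an error, not a silent default.
        raise ValueError(f"no SI conversion known for unit {unit!r}")
    si_unit, factor = TO_SI[unit]
    return Measurement(value, unit, value * factor, si_unit)
```

With something like this, consumers could query on si_value alone, while the
original value and unit remain available wherever provenance matters.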
Mark wrote:
> It speaks to a conversation that I had with my review committee this
> morning about how The Web was built by simply being completely open.
> Anyone could (can) publish anything in any way they want, so long as they
> adhere to the simple rules of HTML. I am very concerned that the Semantic
> Web is not learning its lessons from the WWW. We are trying to
> institutionalize everything, and that simply doesn't work (it doesn't
> scale!).
I guess the classic web and its tremendous global success are a good
inspiration, but I am not sure how easily the principles of the web can be
translated into principles of the web of data. The 'anything goes' approach
might just shift the problem from the data publishing phase to the data
consumption phase, which could create a temporary belief that the problem
has been solved.
Let me make a bold statement: there is no lack of biomedical RDF data
anymore. In fact, we are now in a situation where the same open dataset is
often RDFized several times by different groups. This growing number of
duplicated efforts is an interesting new development, and I might try to
document and analyze this trend when I find the time.
Still, it is far from trivial to actually query these datasets, because of
their heterogeneity. The answer is not to institutionalize everything, but
simply to make RDF publishers more aware of the problems caused by excessive
heterogeneity and lack of transparency. And it could be a good reason to
reduce sources of heterogeneity in a project that is under our control, such
as the TMO.
Cheers,
Matthias Samwald
// DERI Galway, Ireland
// Konrad Lorenz Institute for Evolution and Cognition Research, Austria
// http://samwald.info