I often like to put together proposals as "strawmen", in order to
provoke discussion while not needing to push a personal agenda
through...
Based on the URI discussion we are having, and Alan's request to
distinguish data records (Entrez, ENSEMBL, Uniprot) clearly from the
"biological or chemical things" they are about, I have listed a few
possible best practices to consider and the reasons why:
1) Dereferencing: Dereferencing the URI of a data record returns all
the "authority managed" information about it (locally curated data) in
the form of an RDF graph. Outside annotations
would not be included unless the authority provided an open annotative
service. This is what you get back when you query sources such as NCBI
or EBI.
2) Versioning: A few useful pieces of metadata for changeable (mutable)
URI-referenced RDF graphs (dereferenced) are which version is current,
when it was assigned or created (date and time, UTC), and a reference
to the sorted list of all earlier versions. This would allow precise
rolling back to any version, so that an analysis can be re-run on the
information as it stood at an earlier time.
3) Signifiers: Life science data records of bio or chem entities
(genes, SNPs, proteins, chemicals, agents, diseases, pathways,
anatomical parts) should always reference a community-agreed,
conceptualized bio/chem-entity, i.e., what scientists commonly and
collectively have in mind when they hear "human GSK3 beta". These
could have ontologies layered on them once such ontologies become
available. These entities represent the 'signifiers or signs' for the
'signified or real-world objects' such as "Hu GSK3b" or "Mus MAP12"
(for the curious, see http://en.wikipedia.org/wiki/Sign_(semiotics);
btw, the full RDF graph around an entity would be equivalent to Peirce's
'interpretant'). They would exist as non-data objects, more like
scientific placeholders, but can use rdfs:seeAlso to point to real data
records of them. Data records by themselves WOULD NOT be of this
special meta-class. If this sounds fuzzy to you, consider what it took
to align most of the gene synonym names to one agreed symbol;
sociologically this is no different.
4) Covering Mapping: Propose an initial set of properties to support
the above model. As a starter, define an equivalent of rdfs:isDefinedBy
for life science that would specifically map an instance graph of the
data record to the singular conceptualized bio/chem-entity, using
something on the order of hcls:isDefinedAs:
<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&list_uids=2932>
    hcls:isDefinedAs
    <http://purl.org/hcls/bioentity/hu_gsk3b>
In line with what Chimezie proposed, rdfs:seeAlso could be used to
declare the inverse relation for a select set of data records; not sure
if any new relation is needed here.
In the absence of any formal ontology that could cover all life
sciences data records (e.g., genes), a relational instance model might
be more practical and appealing. A transitive rule could be proposed
stating that all data records referencing the same bio/chem-entity
would be viewed as "bio/chem entity" equivalent, regardless of which
ontology or RDF schema was used to define each of them:
(?data1 hcls:isDefinedAs ?ent) AND (?data2 hcls:isDefinedAs ?ent) ->
(?data1 hcls:sameEntityAs ?data2)
This is an example of what I had suggested as a "Covering", since there
is no explicit need to use ontologies to map data records to common
class-based concepts. owl:sameAs could be used here, but the
'sameEntityAs' relation could carry a more selective meaning for this
community in terms of data records and 'things'. I leave it open for
discussion...
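
To make the rule in point 4 a bit more concrete, here is a minimal
sketch in plain Python (no RDF library) of how it might be applied to a
small set of triples. It is only an illustration under some
assumptions: hcls:isDefinedAs and hcls:sameEntityAs are the proposed
names from this message and not existing vocabulary terms, the
"urn:example:..." URIs are made-up placeholders, and the function name
infer_same_entity is hypothetical.

```python
from collections import defaultdict
from itertools import permutations

# Proposed property names from this message; neither is an existing term.
IS_DEFINED_AS = "hcls:isDefinedAs"
SAME_ENTITY_AS = "hcls:sameEntityAs"


def infer_same_entity(triples):
    """Apply: (?d1 isDefinedAs ?e) AND (?d2 isDefinedAs ?e) -> (?d1 sameEntityAs ?d2).

    `triples` is an iterable of (subject, predicate, object) strings.
    Returns the set of inferred sameEntityAs triples, in both directions.
    """
    # Group data records by the bio/chem-entity they reference.
    records_by_entity = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == IS_DEFINED_AS:
            records_by_entity[obj].add(subject)

    # Link every ordered pair of distinct records that share an entity.
    inferred = set()
    for records in records_by_entity.values():
        for data1, data2 in permutations(records, 2):
            inferred.add((data1, SAME_ENTITY_AS, data2))
    return inferred


# The Entrez URI and the entity URI are the ones used in the example above.
ENTREZ_GSK3B = ("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?"
                "db=gene&cmd=Retrieve&list_uids=2932")
HU_GSK3B = "http://purl.org/hcls/bioentity/hu_gsk3b"
# Hypothetical placeholders, for illustration only.
OTHER_RECORD = "urn:example:other-gsk3b-record"
UNRELATED_RECORD = "urn:example:unrelated-record"
UNRELATED_ENTITY = "urn:example:bioentity/unrelated"

triples = [
    (ENTREZ_GSK3B, IS_DEFINED_AS, HU_GSK3B),
    (OTHER_RECORD, IS_DEFINED_AS, HU_GSK3B),
    (UNRELATED_RECORD, IS_DEFINED_AS, UNRELATED_ENTITY),
]

for triple in sorted(infer_same_entity(triples)):
    print(triple)
```

Read literally, the rule also matches when ?data1 and ?data2 are bound
to the same record; the sketch leaves out those trivial self-links. No
ontology or class hierarchy is consulted, only the shared reference to
the entity, which is the point of the "Covering" idea.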
I'd be interested to hear how important and practical the points raised
here are. The main objective I have is to try and get our common
discussion to focus on some basic, agreeable points that we can work
together on over the next (hopefully) few weeks.
cheers,
Eric
Eric Neumann, PhD
co-chair, W3C Healthcare and Life Sciences,
and Senior Director Product Strategy
Teranode Corporation
83 South King Street, Suite 800
Seattle, WA 98104
+1 (781)856-9132
www.teranode.com