I often like to put together proposals as "strawmen", in order to provoke discussion while not needing to push a personal agenda through...

Based on the URI discussion we are having, and Alan's request to distinguish data records (Entrez, ENSEMBL, Uniprot) clearly from the "biological or chemical things" they are about, I have listed a few possible best practices to consider and the reasons why:

1) Dereferencing: Dereferencing the URI of a data record returns all the "authority managed" information about it (locally curated data) in the form of an RDF graph. Outside annotations would not be included unless the authority provided an open annotation service. This is what you get back when you query sources such as NCBI or EBI.

2) Versioning: A few useful pieces of metadata for changeable (mutable), dereferenceable URI-referenced RDF graphs are which version is current, when it was assigned or created (date and time, UTC), and a reference to the sorted list of all earlier versions. This would allow precise rolling back to any version, in order to re-run an analysis on information from an earlier time.

3) Signifiers: Life science data records of bio or chem entities (genes, SNPs, proteins, chemicals, agents, diseases, pathways, anatomical parts) should always reference a community agreed upon conceptualized bio/chem-entity, i.e., what scientists commonly and collectively have in mind when they hear "human GSK3 beta". These could have ontologies layered on them when they become available. These entities represent the 'signifiers or signs' for the 'signified or real-world objects' such as "Hu GSK3b" or "Mus MAP12" (for the curious, see http://en.wikipedia.org/wiki/Sign_(semiotics); btw, the full RDF graph around an entity would be equivalent to Peirce's 'interpretant'). They would exist as non-data objects, more like scientific placeholders, but can use rdfs:seeAlso to point to real data records of them. Data records by themselves WOULD NOT be of this special meta-class. If this sounds fuzzy to you, consider what it took to align most of the gene synonym names to one agreed symbol; sociologically this is no different.

4) Covering Mapping: Propose an initial set of properties to support the above model. As a starter, define an equivalent of rdfs:isDefinedBy for life science that would specifically map an instance graph of the data record to the singular conceptualized bio/chem-entity, using something on the order of hcls:isDefinedAs :

<http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&list_uids=2932> <hcls:isDefinedAs> <http://purl.org/hcls/bioentity/hu_gsk3b>

In line with what Chimezie proposed, rdfs:seeAlso could be used to declare the inverse relation for a select set of data records; not sure if any new relation is needed here.

In the absence of any formal ontology that could cover all life sciences data records (e.g., Genes), a relational instance model might be more practical and appealing. A transitive rule could be proposed stating that all data records referencing the same bio/chem-entity would be viewed as "bio/chem entity" equivalent, regardless of which ontology or RDF Schema was used to define each of them:

(?data1 hcls:isDefinedAs ?ent) AND (?data2 hcls:isDefinedAs ?ent) -> (?data1 hcls:sameEntityAs ?data2)
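To make the rule concrete, here is a minimal sketch in plain Python, with triples held as simple (subject, predicate, object) tuples rather than in a real RDF store. The second record URI (on example.org) is made up for illustration, and the function name infer_same_entity is hypothetical. One assumption to flag: read literally, the rule would also relate every record to itself; this sketch leaves those reflexive pairs out.

```python
from collections import defaultdict
from itertools import permutations

IS_DEFINED_AS = "hcls:isDefinedAs"
SAME_ENTITY_AS = "hcls:sameEntityAs"

# The Entrez record and entity URI come from the example above;
# OTHER_RECORD is a made-up placeholder for a second data record.
ENTREZ_RECORD = ("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
                 "?db=gene&cmd=Retrieve&list_uids=2932")
OTHER_RECORD = "http://example.org/records/protein-record-1"
ENTITY = "http://purl.org/hcls/bioentity/hu_gsk3b"


def infer_same_entity(triples):
    """Apply the proposed rule to a collection of (s, p, o) tuples.

    Returns the set of inferred hcls:sameEntityAs triples, in both
    directions, between distinct records defined as the same entity.
    """
    # Group data records by the bio/chem-entity they are defined as.
    records_by_entity = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == IS_DEFINED_AS:
            records_by_entity[obj].add(subject)

    # Every ordered pair of distinct records sharing an entity is related.
    inferred = set()
    for records in records_by_entity.values():
        for left, right in permutations(sorted(records), 2):
            inferred.add((left, SAME_ENTITY_AS, right))
    return inferred


if __name__ == "__main__":
    data = [
        (ENTREZ_RECORD, IS_DEFINED_AS, ENTITY),
        (OTHER_RECORD, IS_DEFINED_AS, ENTITY),
    ]
    for triple in sorted(infer_same_entity(data)):
        print(triple)
```

Note that nothing here consults an ontology: the grouping relies only on the shared entity URI, which is the point of the "Covering" idea.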

This is an example of what I had suggested as a "Covering", since there is no explicit need to use ontologies to map data records to common class-based concepts. owl:sameAs could be used here, but the 'sameEntityAs' relation could have more selective meaning for this community in terms of data records and 'things'. I leave it open for discussion...
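Returning briefly to point 2 (Versioning): the metadata listed there is enough to support an "as of" lookup. The following is only a rough sketch, again in plain Python; the VersionInfo fields, the version_as_of function, and the example.org graph URIs are all hypothetical names chosen for illustration, and the version list is assumed to be sorted oldest first with UTC timestamps.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import NamedTuple, Optional, Sequence


class VersionInfo(NamedTuple):
    version: int           # version number assigned by the authority
    created_utc: datetime  # when this version was created (UTC)
    graph_uri: str         # where this version of the RDF graph lives


def version_as_of(history: Sequence[VersionInfo],
                  when: datetime) -> Optional[VersionInfo]:
    """Return the version that was current at time `when`.

    `history` must be sorted by created_utc, oldest first.
    Returns None if `when` predates the first version.
    """
    stamps = [info.created_utc for info in history]
    index = bisect_right(stamps, when)  # versions created at or before `when`
    return history[index - 1] if index else None


if __name__ == "__main__":
    history = [
        VersionInfo(1, datetime(2006, 1, 10, tzinfo=timezone.utc),
                    "http://example.org/record/2932/v1"),
        VersionInfo(2, datetime(2006, 6, 1, tzinfo=timezone.utc),
                    "http://example.org/record/2932/v2"),
    ]
    print(version_as_of(history, datetime(2006, 3, 1, tzinfo=timezone.utc)))
```

The last entry in the sorted list doubles as the "current version", so no separate pointer is strictly needed in this sketch.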

I'd be interested to hear how important and practical the points raised here are. The main objective I have is to try and get our common discussion to focus on some basic, agreeable points that we can work together on over the next (hopefully) few weeks.

cheers,
Eric


Eric Neumann, PhD
co-chair, W3C Healthcare and Life Sciences,
and Senior Director Product Strategy
Teranode Corporation
83 South King Street, Suite 800
Seattle, WA 98104
+1 (781)856-9132
www.teranode.com

