RE: HCLS IG Note on mapping and publishing life sciences RDF
hi scott,

finally got a chance to go through the note and, yes, it is well put together. being naive on this subject, some of my comments may safely be ignored.

Introduction:

* instead of being in the body of the text, shouldn't the explanation for Figure 1 be a caption?

Section 2:

* what is a "Linked Data interface"? it doesn't seem to be a defined standard, rather it seems like each RDF dataset would define its own interface. some clarity on what is meant by this term would help.

* Q2 grammar: "Also, it is often unnecessary to convert every table into a class and can create scaling problems."

  these points are mentioned but i didn't see any discussion about how they affect the DB to RDF mapping (the specific case of data warehousing is covered but that is but one way to denormalize): "RDB schemas can vary in their level of normalization as quantified by normalized forms (Date 2009)." and "In practice, many databases are not normalized because the overhead of working with the schema is not worth the extra reliability and space savings that may result."

* Q3 perhaps a comment on what in the original non-relational information affects the quality of the RDF would be nice

* Q5 doesn't multiple FROM clauses also allow combining datasets but from different graphs?

  This sentence implies that "Structure descriptors" always link datasets containing drugs and small molecules, i think this is supposed to be more general: "Structure descriptors, such as SMILES strings, and InChi identifiers may be used to establish links between datasets containing drugs and small molecules." should be: "Structure descriptors, such as SMILES strings and InChi identifiers, may be used to establish links between datasets."?

* Q7 not a sentence: "Use of the BioPortal for matching entities and their URIs (including ontologies from Open Biomedical Ontology (OBO) Foundry (OBO 2011))."

* Q12 since this is a note on "Mapping and linking life science data using RDF", how does the following help one map their RDF data to the web (it's an important point but seems a little off target in this note, maybe the emphasis should be how one can use these tools in publishing their data)? "An important part of improving the utility of the Web is by documenting the reliability and performance of information services. In the area of biomedical information services,..."

* Q14 grammar (delete 'a'?): "... and to use classes as a values in the metadata for a graph;"

Section 4:

perhaps change "reflect the state of the art" to "reflect the current state of the art"?

cheers,
michael

Michael Miller
Software Engineer
Institute for Systems Biology

> -Original Message-
> From: M. Scott Marshall [mailto:mscottmarsh...@gmail.com]
> Sent: Tuesday, March 13, 2012 2:31 PM
> To: David Booth; Erich Gombocz
> Cc: HCLS; biohackat...@googlegroups.com; linkedlifedatapracticesn...@googlegroups.com; public-lod@w3.org
> Subject: Re: Fwd: HCLS IG Note on mapping and publishing life sciences RDF
>
> On Tue, Mar 13, 2012 at 10:17 PM, David Booth wrote:
> > On Tue, 2012-03-13 at 21:16 +0100, M. Scott Marshall wrote:
> > [ . . . ]
> >> IG Note (Draft) HCLS IG Note on mapping and publishing life sciences RDF
> >> [1] https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit?hl=en_US
> >
> > Nice work on this! A couple of small editorial suggestions:
>
> Thanks for the encouragement from you and Erich.
>
> About the use of a priori , a posteriori - I will mull that over. I was pretty happy with the way it seemed to communicate our thoughts, a little attached actually.. :(
>
> > 2. The intro mentions that "a query for Homo sapiens gene label "Alg2" in Entrez Gene (http://www.ncbi.nlm.nih.gov/gene) returns multiple results. Among them is one gene located in chromosome 5 (Entrez ID:85365) and the other in chromosome 9 (Entrez ID:313231), each with multiple aliases". But the results that I see show ID:85365 as the ID for the one on chromosome 9, and the other one (maybe?) has ID 10016:
> > http://www.ncbi.nlm.nih.gov/gene?term=Alg2[sym]%20homo%20sapiens
>
> Oops! Thanks for catching that. We had corrected id mixup in the article but forgot to correct it in the note.
>
> Thanks!,
> Scott

Re: HCLS IG Note on mapping and publishing life sciences RDF
I read the document and I really enjoyed it. Nice work.

One comment:

Under Q4. How should the RDF representation be mapped to global ontologies or reference terminologies?

In relation to MIREOT: it may be worth mentioning Ontofox (http://ontofox.hegroup.org/) as a reference implementation of the MIREOT principle working with a large set of ontologies commonly used in HCLS.

Cheers,
Carlo

On Tue, Mar 13, 2012 at 1:16 PM, M. Scott Marshall wrote:
> Here is another request for comments before we move the HCLS IG Interest Group note below into html (still fluid but more viscous). I am requesting comments from the LOD mailing list now as well, where there have been several related discussions.
>
> In the draft document below, we attempt to supply a guide for those who would like to produce and publish data in RDF based on the experiences of several of the LODD members. We would like you to lend us your extensive expertise and would very much appreciate and carefully consider your candid comments, questions or suggestions to improve the note. Keep in mind that it is not 'all encompassing' but meant to provide a good starting point.
>
> IG Note (Draft) HCLS IG Note on mapping and publishing life sciences RDF
> [1] https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit?hl=en_US
>
> The note above is based on a number of use cases described in an article accepted to the Journal of Web Semantics. With the publisher's permission, we have created a more 'W3C Note'-like version of the same material and edited sections based on a wide range of comments. The original use cases have been removed. If you are interested, I will send you a pre-print.
>
> We have attempted to frame the discussion in terms of applications that make use of SPARQL queries (minimally), but also with (OWL) reasoning and resolvable URIs - three separate yet interdependent sets of concerns that seem to strongly influence opinions about appropriate design in the community.
>
> Note that we also suggest that metadata be made available in statements about the graph URI *in the graph itself*, in addition to a location specified in SPARQL-SD, and in the RDF returned by the graph URI.
>
> Cheers,
> Scott
>
> -- Forwarded message --
> From: M. Scott Marshall
> Date: Fri, Dec 16, 2011 at 4:13 PM
> Subject: HCLS IG Note on mapping and publishing life sciences RDF
> To: HCLS , linkedlifedatapracticesn...@googlegroups.com, biohackat...@googlegroups.com
>
> Dear Colleagues,
>
> With data sharing becoming more widely known and accepted, the need for the means to accomplish data sharing *in practice* is an important technical challenge. The Linked Open Drug Data task force in HCLS has attempted to address this need by developing a DRAFT IG Note regarding practices for mapping and linking life science data using RDF. The document, largely based on a recently submitted article, is being staged as a Google Doc for your review and comment[1].
>
> In the draft document above, we attempt to supply a guide for those who would like to produce and publish data in RDF. We would like you to lend us your extensive expertise and would very much appreciate and carefully consider your candid comments, questions or suggestions to improve the note.
>
> Ideally, someone with basic knowledge of the Semantic Web stack and the desire to 'publish' linked data will be able to get started from this online document. We have removed the use case descriptions to make it more 'W3C note-like' (concise). The use case descriptions will be available in an article (in review) that covers much of the same material (pre-prints available on request).
>
> Kind regards,
>
> M. Scott Marshall
> LODD Chair, on behalf of the LODD Editors and Contributors
>
> IG Note (Draft)
> [1] https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit?hl=en_US
>
> P.S. Lee Harland just alerted me to a relevant resource that we will probably cite or otherwise integrate into the above note:
> Looks very interesting:
> Interactively Mapping Data Sources into the Semantic Web (presented at ISWC)
> http://ceur-ws.org/Vol-783/paper2.pdf
>
>

--
Carlo