RE: HCLS IG Note on mapping and publishing life sciences RDF

2012-03-18 Thread Michael Miller
hi scott,

finally got a chance to go through the note and, yes, it is well put
together.  being naive on this subject, some of my comments may safely be
ignored.

Introduction:
  * instead of being in the body of the text, shouldn't the explanation
for Figure 1 be a caption?
Section 2:
 * what is a "Linked Data interface"?  it doesn't seem to be a defined
standard, rather it seems like each different RDF data would define its
own interface.  some clarity on what is meant by this term would help.
 * Q2
grammar: "Also, it is often unnecessary to convert every table into a
class and can create scaling problems. "
these points are mentioned but i didn't see any discussion about how
they affect the DB to RDF mapping (the specific case of data warehousing
is covered but that is but one way to denormalize): "RDB schemas can vary
in their level of normalization as quantified by normalized forms (Date
2009). " and "In practice, many databases are not normalized because the
overhead of working with the schema is not worth the extra reliability and
space savings that may result. "
  * Q3
perhaps a comment on what in the original non-relational information
affects the quality of the RDF would be nice
  * Q5
don't multiple FROM clauses also allow combining datasets, but from
different graphs?
This sentence implies that "Structure descriptors" always link
datasets containing drugs and small molecules; i think this is supposed to
be more general: "Structure descriptors, such as SMILES strings, and InChi
identifiers may be used to establish links between datasets containing
drugs and small molecules." should be: "Structure descriptors, such as
SMILES strings and InChi identifiers, may be used to establish links
between datasets."?
  * Q7
not a sentence: " Use of the BioPortal for matching entities and their
URIs (including ontologies from Open Biomedical Ontology (OBO) Foundry
(OBO 2011))."
  * Q12
since this is a note on "Mapping and linking life science data using
RDF", how does the following help one map their RDF data to the web (it's
an important point but seems a little off target in this note, maybe the
emphasis should be how one can use these tools in publishing their data)?
"An important part of improving the utility of the Web is by documenting
the reliability and performance of information services. In the area of
biomedical information services,..."
  * Q14
grammar (delete 'a'?): "... and to use classes as a values in the
metadata for a graph;"
Section 4:
perhaps change "reflect the state of the art" to "reflect the current
state of the art"?

cheers,
michael

Michael Miller
Software Engineer
Institute for Systems Biology

> -----Original Message-----
> From: M. Scott Marshall [mailto:mscottmarsh...@gmail.com]
> Sent: Tuesday, March 13, 2012 2:31 PM
> To: David Booth; Erich Gombocz
> Cc: HCLS; biohackat...@googlegroups.com;
> linkedlifedatapracticesn...@googlegroups.com; public-lod@w3.org
> Subject: Re: Fwd: HCLS IG Note on mapping and publishing life sciences RDF
>
> On Tue, Mar 13, 2012 at 10:17 PM, David Booth wrote:
> > On Tue, 2012-03-13 at 21:16 +0100, M. Scott Marshall wrote:
> > [ . . . ]
> >> IG Note (Draft) HCLS IG Note on mapping and publishing life sciences RDF
> >> [1]
> >> https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit?hl=en_US
> >
> > Nice work on this!  A couple of small editorial suggestions:
>
> Thanks for the encouragement from you and Erich.
>
> About the use of a priori, a posteriori - I will mull that over. I
> was pretty happy with the way it seemed to communicate our thoughts, a
> little attached actually.. :(
>
> > 2. The intro mentions that "a query for Homo sapiens gene label "Alg2"
> > in Entrez Gene (http://www.ncbi.nlm.nih.gov/gene) returns multiple
> > results. Among them is one gene located in chromosome 5 (Entrez
> > ID:85365) and the other in chromosome 9 (Entrez ID:313231), each with
> > multiple aliases".  But the results that I see show ID:85365 as the ID
> > for the one on chromosome 9, and the other one (maybe?) has ID 10016:
> > http://www.ncbi.nlm.nih.gov/gene?term=Alg2[sym]%20homo%20sapiens
>
> Oops! Thanks for catching that. We had corrected the ID mixup in the
> article but forgot to correct it in the note.
>
> Thanks!,
> Scott



Re: HCLS IG Note on mapping and publishing life sciences RDF

2012-03-13 Thread Carlo Torniai
I read the document and I really enjoyed it.
Nice work.

One comment:
Under Q4. How should the RDF representation be mapped to global ontologies
or reference terminologies?

In relation to MIREOT: it may be worth mentioning Ontofox (
http://ontofox.hegroup.org/) as a reference implementation of the MIREOT
principle working with a large set of ontologies commonly used in HCLS.

Cheers,
Carlo


On Tue, Mar 13, 2012 at 1:16 PM, M. Scott Marshall wrote:

> Here is another request for comments before we move the HCLS
> Interest Group note below into html (still fluid but more viscous). I
> am requesting comments from the LOD mailing list now as well, where
> there have been several related discussions.
>
> In the draft document below, we attempt to supply a guide for those
> who would like to produce and publish data in RDF based on the
> experiences of several of the LODD members. We would like you to lend
> us your extensive expertise and would very much appreciate and
> carefully consider your candid comments, questions or suggestions to
> improve the note. Keep in mind that it is not 'all encompassing' but
> meant to provide a good starting point.
>
> IG Note (Draft) HCLS IG Note on mapping and publishing life sciences RDF
> [1]
> https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit?hl=en_US
>
> The note above is based on a number of use cases described in an
> article accepted to the Journal of Web Semantics. With the publisher's
> permission, we have created a more 'W3C Note'-like version of the same
> material and edited sections based on a wide range of comments. The
> original use cases have been removed. If you are interested, I will
> send you a pre-print.
>
> We have attempted to frame the discussion in terms of applications
> that make use of SPARQL queries (minimally), but also with (OWL)
> reasoning and resolvable URIs - three separate yet interdependent sets
> of concerns that seem to strongly influence opinions about appropriate
> design in the community.
>
> Note that we also suggest that metadata be made available in
> statements about the graph URI *in the graph itself*, in addition to a
> location specified in SPARQL-SD, and in the RDF returned by the graph
> URI.
>
> Cheers,
> Scott
>
> ---------- Forwarded message ----------
> From: M. Scott Marshall 
> Date: Fri, Dec 16, 2011 at 4:13 PM
> Subject: HCLS IG Note on mapping and publishing life sciences RDF
> To: HCLS ,
> linkedlifedatapracticesn...@googlegroups.com,
> biohackat...@googlegroups.com
>
>
> Dear Colleagues,
>
> With data sharing becoming more widely known and accepted, the need
> for the means to accomplish data sharing *in practice* is an important
> technical challenge. The Linked Open Drug Data task force in HCLS has
> attempted to address this need by developing a DRAFT IG Note regarding
> practices for mapping and linking life science data using RDF.  The
> document, largely based on a recently submitted article, is being
> staged as a Google Doc for your review and comment[1].
>
> In the draft document above, we attempt to supply a guide for those
> who would like to produce and publish data in RDF. We would like you
> to lend us your extensive expertise and would very much appreciate and
> carefully consider your candid comments, questions or suggestions to
> improve the note.
>
> Ideally, someone with basic knowledge of the Semantic Web stack and
> the desire to 'publish' linked data will be able to get started from
> this online document. We have removed the use case descriptions to
> make it more 'W3C note-like' (concise). The use case descriptions will
> be available in an article (in review) that covers much of the same
> material (pre-prints available on request).
>
> Kind regards,
>
> M. Scott Marshall
> LODD Chair,  on behalf of the LODD Editors and Contributors
>
> IG Note (Draft)
> [1]
> https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit?hl=en_US
>
> P.S. Lee Harland just alerted me to a relevant resource that we will
> probably cite or otherwise integrate into the above note:
> Looks very interesting:
> Interactively Mapping Data Sources into the Semantic Web (presented at
> ISWC)
> http://ceur-ws.org/Vol-783/paper2.pdf
>
>


-- 
Carlo