RE: Linked Data Glossary is published!

2013-07-02 Thread Michael Miller
hi all,



XML takes on many levels of machine readability.  i would argue that if XML
comes with a DTD/XML schema it is at least 3 star and possibly 4 star.
that at least was my experience with MAGE-ML (i'd say 3 star) and the
clinical XML for the TCGA project (4 star).



cheers,

michael



Michael Miller

Software Engineer

Institute for Systems Biology





*From:* KANZAKI Masahide [mailto:mkanz...@gmail.com]
*Sent:* Monday, July 01, 2013 7:19 PM
*To:* John Erickson
*Cc:* Bernadette Hyland; W3C public GLD WG WG; Linked Data community;
egov-ig mailing list; HCLS
*Subject:* Re: Linked Data Glossary is published!



Hello John, thanks for reply, very much appreciated.



2013/7/2 John Erickson 

Thus, I think we should distinguish between "plain old XML" and Office
Open XML/OOXML/OpenXML; based on my understanding and what I read,
OpenXML could be listed as an example three-star format.



Well, that's true. I hope this distinction will be incorporated into this
glossary, rather than simply showing "XML" as a 2-star example (which is
misleading not only for me, but also for others around me).





* I think the POINT is that the data should be published in a way
suited for machine consumption. A format should NOT be considered
"machine readable" simply because someone cooked up a hack on
Scraperwiki for getting the data out of an otherwise opaque data dump
on a site



Yes, it is desirable that data is published for machine "consumption" in
Linked Data space, though my point was that the term "Machine Readable" is
too general to be redefined from an LD perspective.





* The argument against having a separate term is simply that
(arguably) the common case for publishing "machine readable" data *is*
structured data, and adding a special "structured" category merely
confuses adopters.
* The argument for a new term is, if the reason we want "machine
readable data" is because we expect (and usually get) structured data,
then we should specify that what we REALLY want is "machine readable
structured data..." (and explain what that means)



Well, "machine readable" data is *not necessarily* structured in general,
so the second argument seems more reasonable, although I'm not arguing to
add a separate term; rather, I think it is not a good idea to redefine the
term "machine readable" just for a specific community.





Thank you very much for the discussion.



cheers,





-- 
@prefix : <http://www.kanzaki.com/ns/sig#> . <> :from [:name
"KANZAKI Masahide"; :nick "masaka"; :email "mkanz...@gmail.com"].


RE: HCLS IG Note on mapping and publishing life sciences RDF

2012-03-18 Thread Michael Miller
hi scott,

finally got a chance to go through the note and, yes, it is well put
together.  being naive on this subject, some of my comments may safely be
ignored.

Introduction:
  * instead of being in the body of the text, shouldn't the explanation
for Figure 1 be a caption?
Section 2:
 * what is a "Linked Data interface"?  it doesn't seem to be a defined
standard, rather it seems like each RDF dataset would define its
own interface.  some clarity on what is meant by this term would help.
 * Q2
grammar: "Also, it is often unnecessary to convert every table into a
class and can create scaling problems. "
these points are mentioned but i didn't see any discussion about how
they affect the DB to RDF mapping (the specific case of data warehousing
is covered but that is but one way to denormalize): "RDB schemas can vary
in their level of normalization as quantified by normalized forms (Date
2009). " and "In practice, many databases are not normalized because the
overhead of working with the schema is not worth the extra reliability and
space savings that may result. "
  * Q3
perhaps a comment on what in the original non-relational information
affects the quality of the RDF would be nice
  * Q5
don't multiple FROM clauses also allow combining datasets, but from
different graphs?
This sentence implies that "Structure descriptors" always link
datasets containing drugs and small molecules, i think this is supposed to
be more general: "Structure descriptors, such as SMILES strings, and InChi
identifiers may be used to establish links between datasets containing
drugs and small molecules. " should be : " Structure descriptors, such as
SMILES strings and InChi identifiers, may be used to establish links
between datasets. "?
  * Q7
not a sentence: " Use of the BioPortal for matching entities and their
URIs (including ontologies from Open Biomedical Ontology (OBO) Foundry
(OBO 2011))."
  * Q12
since this is a note on "Mapping and linking life science data using
RDF", how does the following help one map their RDF data to the web (it's
an important point but seems a little off target in this note, maybe the
emphasis should be how one can use these tools in publishing their data)?
"An important part of improving the utility of the Web is by documenting
the reliability and performance of information services. In the area of
biomedical information services,..."
  * Q14
grammar (delete 'a'?): "... and to use classes as a values in the
metadata for a graph;"
Section 4:
perhaps change "reflect the state of the art" to " reflect the current
state of the art"?
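
A minimal sketch of the Q5 point about multiple FROM clauses, in Python using only string formatting. In SPARQL, when a query lists several FROM clauses, the default graph is the merge of the graphs named, so one basic graph pattern can match across them. The graph IRIs and the ex: vocabulary below are assumptions made up for illustration; they do not come from the note.

```python
# Hypothetical graph IRIs; neither appears in the note or in this thread.
GRAPHS = [
    "http://example.org/graph/drugs",
    "http://example.org/graph/molecules",
]


def build_query(graph_iris):
    """Return a SPARQL SELECT whose default graph is the merge of graph_iris."""
    # One FROM clause per graph; the engine merges them into the default graph.
    from_clauses = "\n".join(f"FROM <{iri}>" for iri in graph_iris)
    return (
        "PREFIX ex: <http://example.org/vocab#>\n"  # hypothetical vocabulary
        "SELECT ?drug ?molecule\n"
        f"{from_clauses}\n"
        "WHERE {\n"
        "  ?drug ex:inchi ?key .\n"
        "  ?molecule ex:inchi ?key .\n"
        "  FILTER (?drug != ?molecule)\n"
        "}"
    )


query = build_query(GRAPHS)
print(query)
```

The resulting string could be sent to any SPARQL endpoint that holds both graphs; whether a given store exposes them under these names is up to that store.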

cheers,
michael

Michael Miller
Software Engineer
Institute for Systems Biology

> -Original Message-
> From: M. Scott Marshall [mailto:mscottmarsh...@gmail.com]
> Sent: Tuesday, March 13, 2012 2:31 PM
> To: David Booth; Erich Gombocz
> Cc: HCLS; biohackat...@googlegroups.com;
> linkedlifedatapracticesn...@googlegroups.com; public-lod@w3.org
> Subject: Re: Fwd: HCLS IG Note on mapping and publishing life sciences
> RDF
>
> On Tue, Mar 13, 2012 at 10:17 PM, David Booth wrote:
> > On Tue, 2012-03-13 at 21:16 +0100, M. Scott Marshall wrote:
> > [ . . . ]
> >> IG Note (Draft) HCLS IG Note on mapping and publishing life sciences
> >> RDF
> >> [1]
> >> https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit?hl=en_US
> >
> > Nice work on this!  A couple of small editorial suggestions:
>
> Thanks for the encouragement from you and Erich.
>
> About the use of a priori , a posteriori - I will mull that over. I
> was pretty happy with the way it seemed to communicate our thoughts, a
> little attached actually.. :(
>
> > 2. The intro mentions that "a query for Homo sapiens gene label "Alg2"
> > in Entrez Gene (http://www.ncbi.nlm.nih.gov/gene) returns multiple
> > results. Among them is one gene located in chromosome 5 (Entrez
> > ID:85365) and the other in chromosome 9 (Entrez ID:313231), each with
> > multiple aliases".  But the results that I see show ID:85365 as the ID
> > for the one on chromosome 9, and the other one (maybe?) has ID 10016:
> > http://www.ncbi.nlm.nih.gov/gene?term=Alg2[sym]%20homo%20sapiens
>
> Oops! Thanks for catching that. We had corrected the ID mixup in the
> article but forgot to correct it in the note.
>
> Thanks!,
> Scott



RE: Fwd: HCLS IG Note on mapping and publishing life sciences RDF

2012-03-13 Thread Michael Miller
hi david and scott,

from my understanding, the common usage is a bit more confusing.  good old
wikipedia has:

" The terms a priori ("from the earlier") and a posteriori ("from the
later") are used in philosophy (epistemology) to distinguish two types of
knowledge, justifications or arguments. A priori knowledge or justification
is independent of experience (for example "All bachelors are unmarried"); a
posteriori knowledge or justification is dependent on experience or
empirical evidence (for example "Some bachelors are very happy"). A
posteriori justification makes reference to experience; but the issue
concerns how one knows the proposition or claim in question—what justifies
or grounds one's belief in it. Galen Strawson wrote that an a priori
argument is one in which "you can see that it is true just lying on your
couch. You don't have to get up off your couch and go outside and examine
the way things are in the physical world. You don't have to do any
science."[1] There are many points of view on these two types of assertions,
and their relationship is one of the oldest problems in modern philosophy. "
[1] and goes on to describe how confused and certain everyone is about the
notions.

cheers,
michael

[1] http://en.wikipedia.org/wiki/A_priori_and_a_posteriori

Michael Miller
Software Engineer
Institute for Systems Biology

> -Original Message-
> From: David Booth [mailto:da...@dbooth.org]
> Sent: Tuesday, March 13, 2012 2:17 PM
> To: M. Scott Marshall
> Cc: HCLS; biohackat...@googlegroups.com;
> linkedlifedatapracticesn...@googlegroups.com; public-lod@w3.org
> Subject: Re: Fwd: HCLS IG Note on mapping and publishing life sciences RDF
>
> On Tue, 2012-03-13 at 21:16 +0100, M. Scott Marshall wrote:
> [ . . . ]
> > IG Note (Draft) HCLS IG Note on mapping and publishing life sciences RDF
> > [1]
> > https://docs.google.com/document/d/1XzdsjCfPylcyOoNtDfAgz15HwRdCD-0e0ixh21_U0y0/edit?hl=en_US
>
> Nice work on this!  A couple of small editorial suggestions:
>
> 1. AFAICT the phrases "a posteriori" and "a priori" are being misused to
> mean "afterward" and "beforehand".  These terms actually mean:
>
> http://www.onelook.com/?w=a+posteriori&ls=a
> a posteriori: 'involving reasoning from facts or particulars to general
> principles or from effects to causes ("A posteriori demonstration")'
>
> http://www.onelook.com/?w=a+priori&ls=a
> a priori: 'involving deductive reasoning from a general principle to a
> necessary effect; not supported by fact ("An a priori judgment")'
>
>
> 2. The intro mentions that "a query for Homo sapiens gene label "Alg2"
> in Entrez Gene (http://www.ncbi.nlm.nih.gov/gene) returns multiple
> results. Among them is one gene located in chromosome 5 (Entrez
> ID:85365) and the other in chromosome 9 (Entrez ID:313231), each with
> multiple aliases".  But the results that I see show ID:85365 as the ID
> for the one on chromosome 9, and the other one (maybe?) has ID 10016:
> http://www.ncbi.nlm.nih.gov/gene?term=Alg2[sym]%20homo%20sapiens
>
>
> Thanks!
>
>
> --
> David Booth, Ph.D.
> http://dbooth.org/
>
> Opinions expressed herein are those of the author and do not necessarily
> reflect those of his employer.
>



RE: provenance questionnaire, v2

2011-09-06 Thread Michael Miller
hi all,

i would think that an authorization trail would be very important (a quick
search on the web for 'provenance authorization' came up with many hits so
perhaps there is already one that can be incorporated).

in the museum world, for art works that disappeared during world war two,
if a museum holding such a piece doesn't have authorization from the museum
it came from as part of the provenance, it is obliged to return it.  i
would also think it would be an important facet for certain queries.

cheers,
michael

Michael Miller
Software Engineer
Institute for Systems Biology

> -Original Message-
> From: public-semweb-lifesci-requ...@w3.org
> [mailto:public-semweb-lifesci-requ...@w3.org] On Behalf Of Deus, Helena
> Sent: Tuesday, September 06, 2011 2:18 AM
> To: Egon Willighagen
> Cc: public-lod@w3.org; public-semweb-life...@w3.org
> Subject: RE: provenance questionnaire, v2
>
> Thanks Egon,
> The provenance wg has been briefly concerned with authorization, but
> nothing too concrete has been devised yet.
> I will forward your concerns to the provenance workgroup.
>
> Cheers,
> Lena
>
> -Original Message-
> From: Egon Willighagen [mailto:egon.willigha...@gmail.com]
> Sent: 06 September 2011 09:03
> To: Deus, Helena
> Cc: public-lod@w3.org; public-semweb-life...@w3.org
> Subject: Re: provenance questionnaire, v2
>
> On Thu, Sep 1, 2011 at 11:42 PM, Deus, Helena wrote:
> > For those of you who haven't answered and would like to give your 2c
> > about how provenance should be dealt with on the semantic web, here's
> > your chance!
>
> Authorization would probably not be considered provenance, but I was
> wondering if the WG has been talking about that, and if there is an
> existing ontology that would be suitable for that, compatible with the
> provenance ontology... it's clear that at least the depositors
> (provenance) have authorization, so compatibility at that level seems
> needed... Or?
>
> Egon
>
>
> --
> Dr E.L. Willighagen
> Postdoctoral Researcher
> Institutet för miljömedicin
> Karolinska Institutet (http://ki.se/imm)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers




RE: best practice relation for linking to image/machine-opaque docs? biomedical use case

2011-01-10 Thread Michael Miller
hi tim and scott,

in looking at the ImageSelector, i'm surprised there are no units
explicitly specified, either as a default or as a property.  are they
assumed to be pixels?

also, you might want to take a look at GelML
(http://psidev.info/index.php?q=node/448) for a bit more sophisticated way
to specify a position.  the specification allows four different types of
basic shapes: BoundaryChain, BoundaryPointSet, Circle, and Rectangle.
altho it's an XML Schema spec, it should be easy enough to translate to
RDF.
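
a rough sketch of what such a translation might look like for the Rectangle case, with the unit stated explicitly rather than assumed. it is plain Python that emits N-Triples lines. the namespace and property names (onImage, unit, x, y, width, height) are hypothetical, made up for this illustration; they are not taken from GelML or from the Annotation Ontology.

```python
from dataclasses import dataclass

# Hypothetical namespace and property names, made up for this sketch; they are
# not taken from GelML or from the Annotation Ontology.
EX = "http://example.org/region#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Rectangle:
    x: float
    y: float
    width: float
    height: float
    unit: str = "pixel"  # stated explicitly instead of being left implicit


def to_ntriples(subject_iri, image_iri, rect):
    """Serialize a rectangular image region as a list of N-Triples lines."""
    s = f"<{subject_iri}>"
    lines = [
        f"{s} <{RDF_TYPE}> <{EX}Rectangle> .",
        f"{s} <{EX}onImage> <{image_iri}> .",
        f'{s} <{EX}unit> "{rect.unit}" .',
    ]
    # One typed literal per coordinate or dimension.
    for name in ("x", "y", "width", "height"):
        value = getattr(rect, name)
        lines.append(f'{s} <{EX}{name}> "{value}"^^<{XSD}double> .')
    return lines


triples = to_ntriples(
    "http://example.org/region/1",      # hypothetical region IRI
    "http://example.org/mammogram/42",  # hypothetical image IRI
    Rectangle(120.0, 80.0, 40.0, 25.0),
)
print("\n".join(triples))
```

the other shapes (BoundaryChain, BoundaryPointSet, Circle) would need their own properties, for example an ordered list of points, which this sketch does not attempt.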

cheers,
michael


> -Original Message-
> From: public-semweb-lifesci-requ...@w3.org
> [mailto:public-semweb-lifesci-requ...@w3.org] On Behalf Of Tim Clark
> Sent: Monday, January 10, 2011 11:35 AM
> To: M. Scott Marshall
> Cc: HCLS IG; public-lod@w3.org; Daniel Rubin; John F. Madden; Vasiliy
> Faronov; Toby Inkster; Peter DeVries; Tim Berners-Lee; Paolo Ciccarese;
> Anita de Waard; Maryann Martone
> Subject: Re: best practice relation for linking to image/machine-opaque
> docs? biomedical use case
>
> Hi Scott,
>
> For referring to a portion of an image, let me point you to work in my
> group done in collaboration with HCLS Scientific Discourse Task, UCSD,
> Elsevier, and one of the major pharmas.  Paolo Ciccarese is the main
> author, and this work is based on the earlier W3C project Annotea.
>
> AO, Annotation ontology, here:
> http://code.google.com/p/annotation-ontology/, presented at Bio
> Ontologies 2010, and full-length paper in press at BMC Bioinformatics.
>
> Bio Ontologies 2010 slides here:
> http://www.slideshare.net/paolociccarese/ao-annotation-ontology-for-science-on-the-web
>
> AO uses a special subclass of Selector to specify the part of the
> document (image) being referred to.
>
> see here for Selectors:
> http://code.google.com/p/annotation-ontology/wiki/Selectors
>
> and here for an example of image annotation:
> http://code.google.com/p/annotation-ontology/wiki/AnnotationTypes
>
> Best
>
> Tim
>
> On Jan 10, 2011, at 11:30 AM, M. Scott Marshall wrote:
>
> > [Scott dusts off old use case and pulls from the shelf. Adjusts
> > subject of thread. Was: best practice for referring to PDF]
> >
> > In Health Care and Life Science domains, image data is a common form
> > of data under discussion so a best practice for referring to an image
> > or to an (extractable) feature *within* an image would cover a
> > fundamental need in biomedicine to point to 'raw' data as evidence
> > (as well as giving meaning to the raw data!).
> >
> > A clinical example from breast cancer:
> > There is a scan that produces an image that contains features
> > referred to by the radiologist as 'microcalcifications', which can be
> > indicative of the presence of a tumor.
> >
> > I can think of a few scenarios that would refer to the image data
> > (mammogram). There are probably more:
> > 1) The radiology report (in RDF) asserts the presence of
> > microcalcifications and refers to the entire image as evidence.
> > 2) The radiology report (in RDF) asserts the presence of
> > microcalcifications and refers to the entire image as evidence, along
> > with an image processing/feature extraction program that will
> > highlight the phenomenon in the image.
> > 3) The radiology report (in RDF) asserts the presence of
> > microcalcifications and refers to a specific region in the image as
> > evidence using some function of a 2D coordinate system such as
> > polyline.
> >
> > The question: How can we refer to the microcalcifications as an
> > indication of a certain type of tumor in each case 1, 2, and 3 in
> > RDF?
> >
> > I am especially interested in the 'structural' aspects: How do we
> > refer to the image document as containingEvidence ? How can we refer
> > to a *region* of the image in the document? How can we refer to the
> > software that will extract the relevant features with statistical
> > confidence, etc.?
> >
> > Any ideas or pointers to existing practices would be appreciated. I'm
> > aware of some related work in multimedia to refer to temporal regions
> > but I am specifically interested in spatial regions.
> >
> > Note that an analogous question of practice exists for textual
> > documents such as literature in PubMed that can be text-mined for
> > (evidence of) assertions.
> >
> > * Note: 2D is a simplification that should come in handy in
> > implementations and is often deemed necessary, such as for thumbnails.
> >
> > -Scott
> >
> > --
> > M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls
> > Leiden University Medical Center / University of Amsterdam
> > http://staff.science.uva.nl/~marshall
> >
> > On Mon, Jan 10, 2011 at 4:01 PM, Tim Berners-Lee wrote:
> >> It is well to look at and make best practices for the things
> >> we have if we don't
> >>
> >> It was the FOAF folks who, initially, instead of using linked data,
> >> used an Inverse Functional Property to uniquely identify
> >> someone and then rdfs:seeAlso to find the data about them.
> >> So any FOA