Re: BioRDF Telcon

mdmiller Wed, 02 Dec 2009 08:13:39 -0800

hi jim and lena,

great progress!  this will be a nice tool.


a couple of comments.

1) i think ProtocolApplication is best seen as an individual instance of the Protocol class. quite often there are arguments about whether ontologies should have individuals or simply be classes. to me, that doesn't apply here, where real-world objects are being connected to ontologies. the BioSource is realized as the 'Source Name' column in MAGE-TAB, and those entries represent real people in studies, mice or rats in non-clinical studies, etc.; the characteristic values, like age, represent real individual instances of age. in the same way, the values in the 'Protocol REF' column of MAGE-TAB are real wet-lab or analysis individual instances of protocols, called protocol applications in MAGE-OM.

failing to make this distinction has, to me, obscured how much value ontologies can have in the real world. too often i see ontologies treated as ends in themselves, which certainly has its own value, but not for the use cases i deal with, which involve real biological data.

2) for this use case, the information that lies between the 'Source Name' (with its characteristics) and the 'Derived Array Data Matrix File' or 'Derived Array Data File' has limited usefulness. error correction and normalization can make some difference, but if the provider of the MAGE-TAB is trusted, all of that is pretty routine these days. the source names and characteristics plus the derived data files, combined with the experimental factors and experiment design info, are probably 95% to 99.9% of the worthwhile information in the MAGE-TAB. only if one notices a difference in the final gene set between two experiments that look the same might it be worthwhile going into more detail.

and, as has been noted, the MAGE-TAB information needs to be supplemented with information on the final gene set, its expression values, and the higher-level analysis that was used, which is usually buried in the paper.

3) i'm not sure whether there was a desire to capture the raw data in the RDF. for Affymetrix, that would be one million to six million probes in the CEL file; even the processed data in the CHP file would have 20,000 to 60,000 probe sets. i'm not sure RDF is the best way to represent that.
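Point 3 can be made concrete with a rough count. The sketch below assumes, hypothetically, that each probe or probe set becomes one RDF node described by three triples (for example a type, an intensity value, and a link back to its file); the real figure depends on modeling choices not settled in this thread.

```python
# Back-of-envelope estimate of RDF size for Affymetrix array data.
# ASSUMPTION (hypothetical): each probe or probe set is one RDF node
# described by TRIPLES_PER_ITEM triples.
TRIPLES_PER_ITEM = 3

# Probe counts quoted in the message above.
CEL_PROBES = (1_000_000, 6_000_000)   # raw probes per CEL file
CHP_PROBE_SETS = (20_000, 60_000)     # processed probe sets per CHP file


def estimate_triples(n_items: int, triples_per_item: int = TRIPLES_PER_ITEM) -> int:
    """Number of triples needed to describe n_items nodes."""
    return n_items * triples_per_item


def triple_range(bounds: tuple[int, int]) -> tuple[int, int]:
    """Apply estimate_triples to the low and high ends of a count range."""
    low, high = bounds
    return estimate_triples(low), estimate_triples(high)


if __name__ == "__main__":
    for label, bounds in (("CEL", CEL_PROBES), ("CHP", CHP_PROBE_SETS)):
        low, high = triple_range(bounds)
        print(f"{label}: {low:,} to {high:,} triples per file")
```

Under that assumption a single CEL file would need roughly 3 to 18 million triples before any sample metadata, and every additional array in an experiment adds that again.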


cheers,
michael

Michael Miller
mdmille...@comcast.net

----- Original Message -----
From: "Jim McCusker" <james.mccus...@yale.edu>
To: "Helena Deus" <helenad...@gmail.com>
Cc: "Kei Cheung" <kei.che...@yale.edu>; "mdmiller" <mdmille...@comcast.net>; "HCLS" <public-semweb-lifesci@w3.org>
Sent: Monday, November 30, 2009 8:19 AM
Subject: Re: BioRDF Telcon


I'm following a similar strategy, but have been following the MGED
ontology where possible. I've finished aligning the IDF portion and
have started on the SDRF. The MGED ontology is missing a property and
class for what is often termed ProtocolApplication, which usually
serves as an edge between the derived-from and derived nodes, while
linking to the protocol used for the derivation. I am planning on
creating this link in a MAGE extensions ontology, but would like to
vet the structure here:

ProtocolApplication is a class.

New properties:

has_derivation_source
has_derivative

And then ProtocolApplication would have the restrictions:

has_protocol some Protocol

I don't put domains, etc. on the derivation properties, so that people
can also use them to describe derivations directly if they so choose.
There is no superclass for all nodes that can be derived or derived
from, so I'm not bothering with restrictions for those, although I
could add a union restriction for them.
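A minimal sketch of the proposed structure, written as Python that emits Turtle so it runs without an RDF library. The `mge:` namespace URI is a hypothetical placeholder, `mge:Protocol` and `mge:has_protocol` stand in for the MGED ontology terms that would be reused, and identifiers such as `P-EXTRACT-1` and `source1` are made up for illustration.

```python
# Sketch of the proposed MAGE-extension structure, serialized as Turtle.
# ASSUMPTION: hypothetical namespace; in a real extension ontology,
# Protocol and has_protocol would be reused from the MGED ontology.
MGE = "http://example.org/mage-ext#"

PREFIXES = f"""@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix mge:  <{MGE}> .
"""

# ProtocolApplication with the restriction "has_protocol some Protocol".
# The two derivation properties deliberately carry no rdfs:domain/range,
# so they can also link derived nodes directly.
SCHEMA = """
mge:Protocol a owl:Class .
mge:has_protocol a owl:ObjectProperty .
mge:has_derivation_source a owl:ObjectProperty .
mge:has_derivative a owl:ObjectProperty .
mge:ProtocolApplication a owl:Class ;
    rdfs:subClassOf [ a owl:Restriction ;
                      owl:onProperty mge:has_protocol ;
                      owl:someValuesFrom mge:Protocol ] .
"""


def protocol_application(app_id: str, protocol: str, source: str, derivative: str) -> str:
    """Turtle for one ProtocolApplication individual: a single execution of
    `protocol` that turned `source` into `derivative`."""
    return (
        f"mge:{app_id} a mge:ProtocolApplication ;\n"
        f"    mge:has_protocol mge:{protocol} ;\n"
        f"    mge:has_derivation_source mge:{source} ;\n"
        f"    mge:has_derivative mge:{derivative} .\n"
    )


if __name__ == "__main__":
    print(PREFIXES + SCHEMA)
    # Hypothetical Source -> Extract step from one SDRF row.
    print("mge:P-EXTRACT-1 a mge:Protocol .")
    print(protocol_application("app1", "P-EXTRACT-1", "source1", "extract1"))
```

Here each application is an individual of its own class and points at a separate Protocol individual; the alternative of typing applications directly as individuals of Protocol would drop the separate class.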

If this structure is acceptable to people, I can publish the ontology
for general use pretty quickly, and let us work from the same data
structure. I would appreciate any feedback.

Jim

On Monday, November 30, 2009, Helena Deus <helenad...@gmail.com> wrote:

@Kei,

When you said data structure, did you mean the RDF structure?
For now, all I have is the Java object returned by the parser. I've been using Limpopo, which creates an object that I can then convert to RDF using Jena. The challenge, though, has been coming up with the predicates to formalize the relationships between the various elements. I'm using the XML structures for IDF/SDRF etc. at http://magetab-om.sourceforge.net to automatically generate the structure that will contain the data. My plan is then to create RDF triples that use the attributes described in those documents and populate them with the data from the MAGE-TAB Java object created by Limpopo.
Right now all I have is a very raw RDF/XML document describing the relationships in the IDF structure: http://magetab2rdf.googlecode.com/svn/trunk/magetabpredicates.rdf
The triples for that had to be encoded manually using Jena by reading the model.
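The predicate problem Lena describes can be illustrated with a naive mapping from IDF row tags to predicates. This is a stand-in written with the Python standard library, not Limpopo or Jena; the namespace, the `hasProtocol` linking predicate, and the camel-casing rule are all hypothetical choices.

```python
import csv
import io

# ASSUMPTION: hypothetical namespace for IDF-derived predicates and nodes.
NS = "http://example.org/magetab#"


def tag_to_predicate(tag: str) -> str:
    """Map an IDF row tag to a predicate IRI ('Protocol Name' -> ...#protocolName)."""
    words = tag.split()
    local = words[0].lower() + "".join(w.capitalize() for w in words[1:])
    return f"<{NS}{local}>"


def literal(value: str) -> str:
    """Quote a value as an N-Triples string literal (escapes backslashes and quotes)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def idf_to_ntriples(idf_text: str, investigation: str) -> list[str]:
    """Convert tab-delimited IDF rows into sorted, de-duplicated N-Triples lines.

    Rows whose tag starts with 'Protocol' are read column-wise: the n-th
    value of every such row describes the same protocol node. Other rows
    attach their values directly to the investigation node."""
    subject = f"<{NS}{investigation}>"
    has_protocol = f"<{NS}hasProtocol>"  # hypothetical linking predicate
    triples = set()
    for row in csv.reader(io.StringIO(idf_text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        tag = row[0].strip()
        pred = tag_to_predicate(tag)
        for i, value in enumerate(row[1:], start=1):
            if not value.strip():
                continue  # empty cell; column positions stay aligned
            if tag.startswith("Protocol"):
                node = f"<{NS}{investigation}/protocol/{i}>"
                triples.add(f"{subject} {has_protocol} {node} .")
                triples.add(f"{node} {pred} {literal(value.strip())} .")
            else:
                triples.add(f"{subject} {pred} {literal(value.strip())} .")
    return sorted(triples)
```

The column-wise grouping of the Protocol rows is exactly the kind of relationship that a tag-to-predicate mapping alone cannot express, which is why hand-chosen predicates are still needed.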
@Satya and Jun,
I would very much like to be involved in that effort. Do you already have a URL that I can look at?
Thanks,
Lena
On Tue, Nov 24, 2009 at 2:19 PM, Kei Cheung <kei.che...@yale.edu> wrote:
Hi Lena et al,
When you said data structure, did you mean the RDF structure? If so, is there a pointer to the structure that we can look at?
As discussed during yesterday's call, Jun and Satya will help create a wiki page listing some of the requirements for provenance/workflow in the context of gene lists. Perhaps we should also use it to help coordinate some of the future activities (people also brought up Taverna during the call yesterday). Please coordinate with Satya and Jun.
Cheers,

-Kei

Helena Deus wrote:

Hi all,
I apologize for missing the call yesterday! It seems you had a pretty interesting discussion! :-) If I understand Michael's statement, parsing the MAGE-TAB/MAGE-ML into RDF would yield only the raw and processed data files, but not the mechanism used to process them or the resulting gene list. That's also what I concluded after looking at the data structure created by Tony Burdett's Limpopo parser. However, having the raw data as linked data is already a great start! Kei, should I be looking into Taverna in order to reprocess the raw files with a traceable analysis workflow?
Thanks!
Lena
On Tue, Nov 24, 2009 at 9:59 AM, mdmiller <mdmille...@comcast.net> wrote:
 hi all,

 (from the minutes)

 "Yolanda/Kei/Scott: semantic annotation/description of workflow
 would enable the retrieval of data relevant to that workflow (i.e.
 data that could be used to populate that workflow for a different
 experimental scenario)"

 what is typically in a MAGE-TAB/MAGE-ML document are the protocols
 for how the source was processed into the extract, and then how the
 hybridization, feature extraction, error correction and normalization
 were performed. these are interesting, and different protocols can
 cause differences at this level, but it is pretty much a known art
 and usually not of much interest or variability.

 what is usually missing from those documents, along with the final
 gene list, is how that gene list was obtained, i.e. what higher-level
 analysis was used; unfortunately, that is generally only in the paper.

 cheers,
 michael
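The split described above, between processing steps that a MAGE-TAB document records and the gene-list analysis it usually omits, can be sketched as a chain of derivation steps. Step names are illustrative rather than MAGE-TAB column headers, and the `Step` type and `trace` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    protocol: str  # kind of protocol applied
    source: str    # node the step starts from
    output: str    # node the step produces


# Steps a MAGE-TAB/MAGE-ML document typically describes (illustrative names).
RECORDED = [
    Step("extraction", "source", "extract"),
    Step("labeling", "extract", "labeled extract"),
    Step("hybridization", "labeled extract", "hybridization"),
    Step("feature extraction", "hybridization", "raw data"),
    Step("error correction and normalization", "raw data", "derived data"),
]


def trace(steps: list[Step], start: str, goal: str) -> Optional[list[str]]:
    """Follow output links from `start` and return the protocols applied on
    the way to `goal`, or None if the chain breaks (or loops) first."""
    by_source = {s.source: s for s in steps}
    node, path, seen = start, [], set()
    while node != goal:
        step = by_source.get(node)
        if step is None or node in seen:
            return None
        seen.add(node)
        path.append(step.protocol)
        node = step.output
    return path
```

Tracing from `"source"` to `"derived data"` succeeds, while tracing to `"gene list"` returns None until a step such as `Step("higher-level analysis", "derived data", "gene list")`, taken from the paper, is added.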
 ----- Original Message -----
 From: "Kei Cheung" <kei.che...@yale.edu>
 To: "HCLS" <public-semweb-lifesci@w3.org>
 Sent: Monday, November 23, 2009 1:27 PM
 Subject: Re: BioRDF Telcon



 Today's BioRDF minutes are available at the following:


http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009/11-23_Conference_Call

 Thanks to Rob for scribing.

 Cheers,

 -Kei

 Kei Cheung wrote:

 This is a reminder that the next BioRDF telcon call will
 be held at 11 am EST (5 pm CET) on Monday, November 23
 (see details below).

 Cheers,

 -Kei

 == Conference Details ==
 * Date of Call: Monday November 23, 2009
 * Time of Call: 11:00 am Eastern Time
 * Dial-In #: +1.617.761.6200 (Cambridge, MA)
 * Dial-In #: +33.4.89.06.34.99 (Nice, France)
 * Dial-In #: +44.117.370.6152 (Bristol, UK)
 * Participant Access Code: 4257 ("HCLS")

 * IRC Channel: irc.w3.org port 6665, channel #


--
Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccus...@yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
mcc...@cs.rpi.edu
http://tw.rpi.edu
