Re: BioRDF Telcon

Kei Cheung Sun, 13 Dec 2009 18:34:05 -0800

Hi Michael,

Thanks for pointing to MSigDB. I included this in the related links section of the microarray use case description (http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/QueryFederation2). Also, please see my response below.


Cheers,

-Kei

mdmiller wrote:

hi all,

here is the link to Molecular Signatures Database (MSigDB):

[1]: http://www.broadinstitute.org/gsea/msigdb/

cheers,
michael

----- Original Message ----- From: "mdmiller" <mdmille...@comcast.net>
To: "Kei Cheung" <kei.che...@yale.edu>
Cc: "Jim McCusker" <james.mccus...@yale.edu>; "Helena Deus" <helenad...@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>
Sent: Thursday, December 10, 2009 6:49 AM
Subject: Re: BioRDF Telcon
hi kei,
To me, ontologies can be used to facilitate integrated semantic queries across experiments/datasets.
yes, and this is starting to become a reality. this effort, along with other HCLS initiatives, is helping to pave the way.
While some of the protocols are standardized, the data protocols for obtaining things like gene lists vary a lot. One of my questions is whether such data analysis protocols can somehow be entered into MAGE-TAB.
yes it can be, along with the gene list, but in practice this is not done by the submitter. after the Derived Array Data representing the normalized data, like CHP files, there can be one or more Protocol REF columns describing the analysis to obtain the gene list, followed by a Derived Array Data Matrix File that is the gene list with its signature.
perhaps MIAME needs to be extended to state this. it's something i'll be bringing up with the MGED board. it's just now that this has become something of value to be machine readable. besides GeneSigDB, there is another effort, MSigDB [1], that is also curating gene lists. so the community is beginning to see the value of this.

Yes, for these gene lists to be of value to researchers, rich annotation is key. The challenge here is that it's quite tedious to enter the custom data analysis protocols in a structured way by hand.

At least for now, I don't think we need to convert the huge primary data files (e.g., CEL file) into RDF. For the time being, we are more focused on the processed gene lists that may be associated with more biological meaning.
perhaps it's worthwhile considering using an ontology 'raw data' class for raw data that contains a reference to the data file. one could then use appropriate analysis tools to produce normalized data which could then also be referenced by a 'normalized data' class.

It seems to make sense.
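
A minimal sketch of the 'raw data' / 'normalized data' idea above, in plain Python with no RDF library. The namespace, the class names RawData and NormalizedData, the predicates has_file and derived_from, and the file URLs are all hypothetical placeholders, not terms from the MGED ontology. The point is only that the RDF holds a reference to each file rather than its contents.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; a real one would come from the chosen ontology.
EX = "http://example.org/microarray#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_data_file(node_id: str, data_class: str, file_url: str,
                       derived_from: Optional[str] = None) -> List[Triple]:
    """Describe a data file by reference only: its class and its location."""
    subject = EX + node_id
    triples = [
        Triple(subject, RDF_TYPE, EX + data_class),
        Triple(subject, EX + "has_file", file_url),
    ]
    if derived_from is not None:
        # Link the normalized data back to the raw data it came from.
        triples.append(Triple(subject, EX + "derived_from", EX + derived_from))
    return triples


# The CEL file stays where it is; only a pointer to it is recorded.
raw = describe_data_file("sample1_raw", "RawData",
                         "http://example.org/files/sample1.CEL")
norm = describe_data_file("sample1_norm", "NormalizedData",
                          "http://example.org/files/sample1.CHP",
                          derived_from="sample1_raw")

for triple in raw + norm:
    print(triple.subject, triple.predicate, triple.obj)
```

With this shape, a tool that needs probe-level values can dereference the file URL, while queries over the RDF only see a handful of triples per file.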

cheers,
michael

----- Original Message ----- From: "Kei Cheung" <kei.che...@yale.edu>
To: "mdmiller" <mdmille...@comcast.net>
Cc: "Jim McCusker" <james.mccus...@yale.edu>; "Helena Deus" <helenad...@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>
Sent: Monday, December 07, 2009 7:32 AM
Subject: Re: BioRDF Telcon
mdmiller wrote:
hi jim and lena,

great progress!  this will be a nice tool.

a couple of comments.
1) i think ProtocolApplication is best seen as an individual instance of the Protocol class. quite often there are arguments whether ontologies should have individuals or be simply classes. to me, that doesn't apply here where real world objects are being connected to ontologies. the BioSource is realized as the 'Source Name' column in MAGE-TAB and those entries represent real people in studies, mice or rats in non-clinical studies, etc., and the characteristics values like age represent real individual instances of age. in the same way, the values in the Protocol REF column of MAGE-TAB are real wet-lab or analysis individual instances of protocols, called protocol applications in MAGE-OM.
It sounds like we need to look at how to map column names and entries to classes, instances, and relationships appropriately.
failure to make this distinction, to me, has obscured how much value ontologies can have in the real world. too often i see ontologies seen in and of themselves, which has its own value certainly, but not for the use cases i have dealing with real biological data.
To me, ontologies can be used to facilitate integrated semantic queries across experiments/datasets.
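
One way to picture the column-to-class, entry-to-instance mapping discussed in point 1 is the sketch below. It is only an illustration: the SDRF fragment is invented, and the class names in COLUMN_TO_CLASS, the generated individual identifiers, and the has_<name> predicates are assumptions rather than terms from MAGE-OM or the MGED ontology. Column names map to classes, and each cell value becomes an individual of that class.

```python
import csv
import io

# Invented two-line SDRF fragment (tab-delimited), for illustration only.
SDRF_TEXT = (
    "Source Name\tCharacteristics[age]\tProtocol REF\t"
    "Derived Array Data Matrix File\n"
    "patient_1\t54\tP-normalize\tgenelist_1.txt\n"
)

# Assumed mapping from column names to ontology classes.
COLUMN_TO_CLASS = {
    "Source Name": "BioSource",
    "Protocol REF": "ProtocolApplication",
    "Derived Array Data Matrix File": "DerivedArrayDataMatrixFile",
}

CHARACTERISTICS_PREFIX = "Characteristics["


def row_to_statements(header, row, row_index):
    """Turn one SDRF row into (subject, predicate, object) statements."""
    statements = []
    source = None
    for col_index, (column, value) in enumerate(zip(header, row)):
        if column in COLUMN_TO_CLASS:
            # Each cell is an individual; the column decides its class.
            individual = f"row{row_index}_col{col_index}"
            statements.append((individual, "rdf:type", COLUMN_TO_CLASS[column]))
            statements.append((individual, "rdfs:label", value))
            if column == "Source Name":
                source = individual
        elif column.startswith(CHARACTERISTICS_PREFIX) and source is not None:
            # Characteristics[age] becomes a property of the source individual.
            name = column[len(CHARACTERISTICS_PREFIX):-1]
            statements.append((source, f"has_{name}", value))
    return statements


reader = csv.reader(io.StringIO(SDRF_TEXT), delimiter="\t")
header = next(reader)
statements = [
    statement
    for index, row in enumerate(reader)
    for statement in row_to_statements(header, row, index)
]

for statement in statements:
    print(statement)
```

Identifiers are built from row and column positions because SDRF allows repeated column names, such as several Protocol REF columns in one row.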
2) for this use case, the information between the 'Source Name' and its characteristics and the 'Derived Array Data Matrix File' or 'Derived Array Data File' has limited usefulness. error correction and normalization can make some difference, but if the provider of the MAGE-TAB is trusted, all that is pretty routine these days. the above combined with experimental factors and experiment design info is probably 95% to 99.9% of the worthwhile information from the MAGE-TAB. if one notices a difference in the final gene set between two experiments that look the same, only then might it be worthwhile going into more detail.
and, as has been noted, the MAGE-TAB information needs to be supplemented with the information on the final gene set, its expression values, and the higher-level analysis that was used, which is usually buried in the paper.
While some of the protocols are standardized, the data protocols for obtaining things like gene lists vary a lot. One of my questions is whether such data analysis protocols can somehow be entered into MAGE-TAB.
3) i'm not sure if there was a desire to capture the raw data in the RDF. that will be, for affymetrix, a million to six million probes in the CEL file; even the processed data in the CHP file would have 20,000 to 60,000 probe sets. i'm not sure if that is the best way to represent that.
At least for now, I don't think we need to convert the huge primary data files (e.g., CEL file) into RDF. For the time being, we are more focused on the processed gene lists that may be associated with more biological meaning.
Cheers,

-Kei
cheers,
michael

Michael Miller
mdmille...@comcast.net
----- Original Message ----- From: "Jim McCusker" <james.mccus...@yale.edu>
To: "Helena Deus" <helenad...@gmail.com>
Cc: "Kei Cheung" <kei.che...@yale.edu>; "mdmiller" <mdmille...@comcast.net>; "HCLS" <public-semweb-lifesci@w3.org>
Sent: Monday, November 30, 2009 8:19 AM
Subject: Re: BioRDF Telcon


I'm following a similar strategy, but have been following the MGED
ontology where possible. I've finished aligning the IDF portion, and
have started on SDRF. MGED ontology is missing a property and class
for what is often termed as ProtocolApplication, which usually serves
as an edge between derived from and derived nodes, while linking to
the protocol used for the derivation. I am planning on creating this
link in a MAGE extensions ontology, but would like to vet the
structure here:

ProtocolApplication is a class.

New properties:

has_derivation_source
has_derivative

And then ProtocolApplication would have the restrictions:

has_protocol some Protocol

I don't put domains, etc. on the derived properties to allow use in
directly describing derivations if people so choose. There is no
superclass for all nodes that can be derived or derived from, so I'm
not bothering with restrictions for those, although I could add a
union restriction to it.

If this structure is acceptable to people, I can publish the ontology
for general use pretty quickly, and let us work from the same data
structure. I would appreciate any feedback.
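
To make the proposed structure concrete, here is a small sketch in plain Python. It assumes, as one possible reading, that has_protocol, has_derivation_source, and has_derivative are all asserted on the ProtocolApplication node. The identifiers, protocol names, and file names are invented, and derivation_chain is a hypothetical helper that further assumes each node has at most one producing ProtocolApplication and that there are no cycles.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ProtocolApplication:
    """An edge between a source node and a derived node, tied to a protocol."""
    identifier: str
    protocol: str
    derivation_source: str
    derivative: str

    def to_triples(self) -> List[Tuple[str, str, str]]:
        return [
            (self.identifier, "rdf:type", "ProtocolApplication"),
            (self.identifier, "has_protocol", self.protocol),
            (self.identifier, "has_derivation_source", self.derivation_source),
            (self.identifier, "has_derivative", self.derivative),
        ]


def derivation_chain(applications, node):
    """Walk back from a node to its origin, returning steps oldest first."""
    by_derivative = {app.derivative: app for app in applications}
    chain = []
    while node in by_derivative:
        app = by_derivative[node]
        chain.append((app.derivation_source, app.protocol, node))
        node = app.derivation_source
    return list(reversed(chain))


# Invented example: raw file -> normalized file -> gene list.
apps = [
    ProtocolApplication("pa1", "normalization", "sample1.CEL", "sample1.CHP"),
    ProtocolApplication("pa2", "gene_list_analysis", "sample1.CHP",
                        "genelist_1.txt"),
]

chain = derivation_chain(apps, "genelist_1.txt")
for source, protocol, derived in chain:
    print(f"{source} --[{protocol}]--> {derived}")
```

Because the derivation properties carry no domain or range here, the same two property names could also be asserted directly between data nodes when no protocol information is available.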

Jim
On Monday, November 30, 2009, Helena Deus <helenad...@gmail.com> wrote:
@Kei,



When you said data structure, did you mean the RDF structure?
For now, all I have is the Java object returned by the parser. I've been using Limpopo, which creates an object that I can then convert to RDF using Jena. The challenge, though, has been coming up with the predicates to formalize the relationships between the various elements. I'm using the XML structures for IDF/SDRF etc. at http://magetab-om.sourceforge.net to automatically generate the structure that will contain the data. My plan is to then create the RDF triples that use the attributes described in those documents and populate them with the data from the MAGE-TAB Java object created by Limpopo.
Right now all I have is a very raw RDF/XML document describing the relationships in the IDF structure:
http://magetab2rdf.googlecode.com/svn/trunk/magetabpredicates.rdf
The triples for that had to be encoded manually using Jena by reading the model.
@Satya and Jun
I would very much like to be involved in that effort. Do you already have a URL that I can look at?
Thanks
Lena
On Tue, Nov 24, 2009 at 2:19 PM, Kei Cheung <kei.che...@yale.edu> wrote:
Hi Lena et al,
When you said data structure, did you mean the RDF structure? If so, is there a pointer to the structure that we can look at?
As discussed during yesterday's call, Jun and Satya will help create a wiki page for listing some of the requirements for provenance/workflow in the context of gene lists. Perhaps we should also use it to help coordinate some of the future activities (people also brought up Taverna during the call yesterday). Please coordinate with Satya and Jun.
Cheers,

-Kei

Helena Deus wrote:

Hi all,
I apologize for missing the call yesterday! It seems you had a pretty interesting discussion! :-)
If I understand Michael's statement, parsing the MAGE-TAB/MAGE-ML into RDF would result in obtaining only the raw and processed data files but not the mechanism used to process them nor the resulting gene list. That's also what I concluded after looking at the data structure created by Tony Burdett's Limpopo parser. However, having the raw data as linked data is already a great start! Kei, should I be looking into Taverna in order to reprocess the raw files with a traceable analysis workflow?
Thanks!
Lena
On Tue, Nov 24, 2009 at 9:59 AM, mdmiller <mdmille...@comcast.net> wrote:
 hi all,

 (from the minutes)

 "Yolanda/Kei/Scott: semantic annotation/description of workflow
 would enable the retrieval of data relevant to that workflow (i.e.
 data that could be used to populate that workflow for a different
 experimental scenario)"

 what is typically in a MAGE-TAB/MAGE-ML document are the protocols
 for how the source was processed into the extract then how the
 hybridization, feature extraction, error and normalization were
 performed. these are interesting and different protocols can
 cause differences at this level but it is pretty much a known art
 and usually not of too much interest or variability.

 what is usually missing from those documents, along with the final
 gene list, is how that gene list was obtained, what higher level
 analysis was used, that is generally only in the paper unfortunately.
 cheers,
 michael
 ----- Original Message ----- From: "Kei Cheung" <kei.che...@yale.edu>
 To: "HCLS" <public-semweb-lifesci@w3.org>
 Sent: Monday, November 23, 2009 1:27 PM
 Subject: Re: BioRDF Telcon



 Today's BioRDF minutes are available at the following:
http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009/11-23_Conference_Call
 Thanks to Rob for scribing.

 Cheers,

 -Kei

 Kei Cheung wrote:

 This is a reminder that the next BioRDF telcon call will
 be held at 11 am EDT (5 pm CET) on Monday, November 23
 (see details below).

 Cheers,

 -Kei

 == Conference Details ==
 * Date of Call: Monday November 23, 2009
 * Time of Call: 11:00 am Eastern Time
 * Dial-In #: +1.617.761.6200 (Cambridge, MA)
 * Dial-In #: +33.4.89.06.34.99 (Nice, France)
 * Dial-In #: +44.117.370.6152 (Bristol, UK)
 * Participant Access Code: 4257 ("HCLS")

 * IRC Channel: irc.w3.org port 6665
 channel #
