Hi Lena,

Thanks for finding the IDF and SDRF files for the experiment (E-GEOD-4757). The SDRF file appears to contain richer metadata that could support more expressive semantic queries across experiments (e.g., finding experiments that involve the same cell types in the same or related brain regions of the same species).

I noticed that the SDRF file lists 20 samples (10 normal and 10 AD with neurofibrillary tangles). The abstract of the paper (http://www.ncbi.nlm.nih.gov/pubmed/16242812?dopt=Abstract) says the following:

" ... we compared gene expression profiles of NFT-bearing entorhinal cortex neurons from 19 AD patients, adjacent non-NFT-bearing entorhinal cortex neurons from the same patients, and non-NFT-bearing entorhinal cortex neurons from 14 non-demented, histopathologically normal controls (ND). "

If I understand it correctly, there should be a total of 33 samples (19 AD and 14 normal). This may be more of a curation question for the ArrayExpress team. Maybe I missed something.
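One way to double-check the count is to tally distinct Source Name values per disease-state value in the SDRF. Below is a minimal Python sketch. It assumes a tab-delimited SDRF whose header includes a "Source Name" column. The grouping column name ("Characteristics [DiseaseState]") is a hypothetical placeholder, so check the actual header in the E-GEOD-4757 SDRF first.

```python
import csv
import io
from collections import defaultdict


def count_samples(sdrf_text: str, group_column: str) -> dict[str, int]:
    """Count distinct Source Name values for each value of group_column.

    Assumes a tab-delimited SDRF whose header row contains "Source Name"
    and group_column. Sources are de-duplicated because an SDRF usually
    repeats a source on every row derived from it.
    """
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    source_idx = header.index("Source Name")
    group_idx = header.index(group_column)

    sources_by_group: dict[str, set[str]] = defaultdict(set)
    for row in reader:
        if len(row) <= max(source_idx, group_idx):
            continue  # skip blank or truncated lines
        sources_by_group[row[group_idx]].add(row[source_idx])
    return {group: len(srcs) for group, srcs in sources_by_group.items()}


if __name__ == "__main__":
    # Made-up rows for illustration; "Characteristics [DiseaseState]" is a
    # hypothetical column name, not necessarily the one used in E-GEOD-4757.
    toy = (
        "Source Name\tCharacteristics [DiseaseState]\tHybridization Name\n"
        "s1\tnormal\th1\n"
        "s1\tnormal\th2\n"
        "s2\tAD\th3\n"
        "s3\tAD\th4\n"
    )
    print(count_samples(toy, "Characteristics [DiseaseState]"))
    # {'normal': 1, 'AD': 2}
```

Counting distinct sources rather than rows matters here, because one source can appear on several rows (one per hybridization or data file).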

Cheers,

-Kei

Helena Deus wrote:
Hi Kei,

Fortunately, ArrayExpress provides both the IDF and SDRF for that accession number at http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=E-GEOD-4757

I have a small RDF document of that IDF at http://magetab2rdf.googlecode.com/svn/trunk/E-GEOD-4757.idf.rdf

Thanks
Lena

On Tue, Dec 1, 2009 at 9:20 PM, Kei Cheung <kei.che...@yale.edu> wrote:

    Hi Lena,


    Helena Deus wrote:

        @Kei,

           When you said data structure, did you mean the RDF structure?


        For now, all I have is the Java object returned by the parser.
        I've been using Limpopo, which creates an object that I can
        then convert to RDF using Jena. The challenge, though, has been
        coming up with the predicates to formalize the relationships
        between the various elements. I'm using the XML structures for
        IDF/SDRF etc. at http://magetab-om.sourceforge.net to
        automatically generate the structure that will contain the
        data. My plan is to then create the RDF triples that use the
        attributes described in those documents and populate them with
        the data from the MAGE-TAB Java object created by Limpopo.
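As a rough illustration of that strategy, the following Python sketch maps each IDF row (a field name followed by tab-separated values) to N-Triples statements. It uses only the standard library as a stand-in for Limpopo and Jena, so it shows the general shape of the mapping, not Lena's actual code. Several parts are hypothetical choices: the namespace http://example.org/magetab#, the camelCase predicate naming, and the experiment URI. It also does not strip quotes from quoted IDF values.

```python
import re

# Hypothetical namespace standing in for the predicates still being designed.
MAGETAB_NS = "http://example.org/magetab#"


def predicate_for(field: str) -> str:
    """Turn an IDF field name such as 'Investigation Title' into a predicate URI."""
    words = re.findall(r"[A-Za-z0-9]+", field)
    if not words:
        raise ValueError(f"cannot derive a predicate from {field!r}")
    local = words[0].lower() + "".join(w[0].upper() + w[1:] for w in words[1:])
    return f"<{MAGETAB_NS}{local}>"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    value = value.replace("\\", "\\\\").replace('"', '\\"')
    return value.replace("\n", "\\n").replace("\r", "\\r")


def idf_to_ntriples(idf_text: str, subject_uri: str) -> list[str]:
    """Emit one N-Triples statement per non-empty value on each IDF row."""
    triples = []
    for line in idf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        field, *values = line.split("\t")
        predicate = predicate_for(field)
        for value in values:
            value = value.strip()
            if value:
                literal = escape_literal(value)
                triples.append(f'<{subject_uri}> {predicate} "{literal}" .')
    return triples


if __name__ == "__main__":
    # Made-up IDF fragment; the subject URI is hypothetical.
    idf = "Investigation Title\tTranscription profiling of human neurons\n"
    for triple in idf_to_ntriples(idf, "http://example.org/experiment/E-GEOD-4757"):
        print(triple)
```

Because each value becomes an independent literal, related columns lose their grouping. For example, the nth Person Last Name is no longer tied to the nth Person First Name. Expressing relationships like that is where purpose-built predicates, rather than a mechanical mapping, would be needed.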



    Thanks for the pointer and for explaining your strategy. We might not
    need to convert everything from MAGE-TAB for our purposes.



        Right now all I have is a very raw RDF/XML document describing
        the relationships in the IDF structure:
        http://magetab2rdf.googlecode.com/svn/trunk/magetabpredicates.rdf
        I had to encode those triples manually using Jena, by reading
        the model.


    I think IDF is a good start. As a real example for our use case,
    I wonder if any MAGE-TAB file is available for experiment
    E-GEOD-4757 (transcription profiling of human neurons with and
    without neurofibrillary tangles from Alzheimer's patients). Helen
    may know.

    Cheers,

    -Kei


        @Satya and Jun

        I would very much like to be involved in that effort. Do you
        already have a URL that I can look at?

        Thanks
        Lena

        On Tue, Nov 24, 2009 at 2:19 PM, Kei Cheung
        <kei.che...@yale.edu> wrote:

           Hi Lena et al,

           When you said data structure, did you mean the RDF
           structure? If so, is there a pointer to the structure that
           we can look at?

           As discussed during yesterday's call, Jun and Satya will help
           create a wiki page listing some of the requirements for
           provenance/workflow in the context of gene lists. Perhaps we
           should also use it to help coordinate some of our future
           activities (people also brought up Taverna during the call
           yesterday). Please coordinate with Satya and Jun.

           Cheers,

           -Kei

           Helena Deus wrote:

               Hi all,

               I apologize for missing the call yesterday! It seems you had
               a pretty interesting discussion! :-)
               If I understand Michael's statement correctly, parsing the
               MAGE-TAB/MAGE-ML into RDF would yield only the raw and
               processed data files, but neither the method used to process
               them nor the resulting gene list. That's also what I
               concluded after looking at the data structure created by
               Tony Burdett's Limpopo parser. However, having the raw data
               as linked data is already a great start! Kei, should I be
               looking into Taverna in order to reprocess the raw files
               with a traceable analysis workflow?

               Thanks!
               Lena



               On Tue, Nov 24, 2009 at 9:59 AM, mdmiller
               <mdmille...@comcast.net> wrote:

                  hi all,

                  (from the minutes)

                  "Yolanda/Kei/Scott: semantic annotation/description of
                  workflow would enable the retrieval of data relevant to
                  that workflow (i.e. data that could be used to populate
                  that workflow for a different experimental scenario)"

                  what is typically in a MAGE-TAB/MAGE-ML document are the
                  protocols for how the source was processed into the
                  extract, then how the hybridization, feature extraction,
                  error and normalization were performed. these are
                  interesting, and different protocols can cause differences
                  at this level, but it is pretty much a known art and
                  usually not of much interest or variability.

                  what is usually missing from those documents, along with
                  the final gene list, is how that gene list was obtained,
                  i.e. what higher-level analysis was used; unfortunately,
                  that is generally only described in the paper.
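To make that concrete: in an SDRF, the processing protocols appear as Protocol REF columns between node columns such as Source Name, Extract Name and Hybridization Name. The Python sketch below walks one SDRF row and returns (from node, protocols, to node) steps. The node-column set is a partial list of MAGE-TAB column names and should be checked against the specification. What it illustrates is that such a chain typically ends at the data files, with no step leading to a gene list.

```python
# Partial list of MAGE-TAB SDRF node columns; check the spec for the full set.
NODE_COLUMNS = {
    "Source Name", "Sample Name", "Extract Name", "Labeled Extract Name",
    "Hybridization Name", "Assay Name", "Scan Name", "Normalization Name",
    "Array Data File", "Derived Array Data File",
}


def processing_steps(header: list[str], row: list[str]) -> list[tuple[str, list[str], str]]:
    """Return (from_node, protocols, to_node) steps for one SDRF row.

    Protocol REF values seen between two consecutive non-empty node columns
    are attributed to that step; all other columns (Characteristics,
    Comment, Term Source REF, ...) are ignored.
    """
    steps = []
    prev_node = None
    pending: list[str] = []
    for column, value in zip(header, row):
        if column == "Protocol REF" and value:
            pending.append(value)
        elif column in NODE_COLUMNS and value:
            node = f"{column}: {value}"
            if prev_node is not None:
                steps.append((prev_node, pending, node))
            prev_node = node
            pending = []  # start a fresh list; the old one stays in steps
    return steps


if __name__ == "__main__":
    # Made-up header and row; the protocol IDs are placeholders.
    header = ["Source Name", "Protocol REF", "Extract Name", "Protocol REF",
              "Hybridization Name", "Derived Array Data File"]
    row = ["s1", "P-1", "e1", "P-2", "h1", "processed.txt"]
    for step in processing_steps(header, row):
        print(step)
```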

                  cheers,
                  michael
                  ----- Original Message -----
                  From: "Kei Cheung" <kei.che...@yale.edu>
                  To: "HCLS" <public-semweb-lifesci@w3.org>
                  Sent: Monday, November 23, 2009 1:27 PM
                  Subject: Re: BioRDF Telcon



                      Today's BioRDF minutes are available at the following
                      URL:
                      http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009/11-23_Conference_Call

                      Thanks to Rob for scribing.

                      Cheers,

                      -Kei

                      Kei Cheung wrote:

                          This is a reminder that the next BioRDF telcon will
                          be held at 11 am EST (5 pm CET) on Monday, November 23
                          (see details below).

                          Cheers,

                          -Kei

                          == Conference Details ==
                          * Date of Call: Monday November 23, 2009
                          * Time of Call: 11:00 am Eastern Time
                          * Dial-In #: +1.617.761.6200 (Cambridge, MA)
                          * Dial-In #: +33.4.89.06.34.99 (Nice, France)
                          * Dial-In #: +44.117.370.6152 (Bristol, UK)
                          * Participant Access Code: 4257 ("HCLS")
                          * IRC Channel: irc.w3.org port 6665, channel #HCLS
                            (see W3C IRC page for details, or see Web IRC).
                            Quick Start: use
                            http://www.mibbit.com/chat/?server=irc.w3.org:6665&channel=%23hcls
                            for IRC access.
                          * Duration: ~1 hour
                          * Frequency: bi-weekly
                          * Convener: Kei Cheung
                          * Scribe: to-be-determined

                          == Agenda ==
                          * Roll call & introduction (Kei)
                          * RDF representation of microarray experiment and data (All)
                          * Provenance and workflow (All)