Hi Michael et al,

The following tools, for example, are available for microarray gene annotation.

SOURCE -- http://nar.oxfordjournals.org/cgi/content/full/31/1/219
KARMA -- http://nar.oxfordjournals.org/cgi/content/full/32/suppl_2/W441
RESOURCERER -- http://pga.tigr.org/tigr-scripts/magic/r1.pl
DRAGON -- http://pevsnerlab.kennedykrieger.org/dragon.htm

These tools take a gene list of interest and return annotation collected from multiple sources (e.g., the Gene Ontology, UniProt, and KEGG). It might be useful if these tools could be made semantic-web-aware.

Cheers,

-Kei

Miller, Michael D (Rosetta) wrote:

Hi Bill and Alan,

You misunderstand my use case. My researcher doesn't much care yet that the world knows about his/her microarray experiment--in fact, he/she may very well be searching for interesting information about the gene set to see whether it is worth going further, or whether the experiment was just retreading old ground, or whatever.

There's this new tool, the semantic web, so the researcher is going to submit this set of genes and hopefully get useful information on them as a set. Now this researcher probably assumes that as long as the naming source of the genes is indicated, no further work is required. That naming source may very well be GenBank, which, of course, isn't likely to be set up for easy access by pure semantic web tools for many years, if ever, as many people on this list would like. But it had better be supported by the semantic web, because for all its faults, and all the faults of the current sequence databases, if the semantic web can't garner information from them, I don't see much hope for adoption by the common researcher.

So perhaps the researcher gets back that the genes are part of a particular pathway, that a few papers in PubMed mention them, and that some microarray experiments in public repositories had them significantly up- or down-regulated or affected by some drug. If the conclusions of this experiment appear to be worthy, then hopefully the experiment will get annotated (with these semantic web results as well), be made part of a submittal, and be deposited in a public database, now accessible to others searching the semantic web.

"In translating the instance data into OWL, it should then be possible to perform the sort of higher level sorting and re-analysis Alan describes."

Although this wasn't the use case I was talking about in this thread, it is obviously a very interesting use case also. I believe I talked about something similar in an earlier e-mail. One is unlikely to find out much, in general, about the individuals in the experiment (outside of genes), because they will be truly unique instances of things like samples, hybridizations, feature extractions, and data. But if the researcher annotates these individuals from rich ontologies--or from not-so-great sources that tools are developed to compensate for--then I agree entirely with you and Alan that much useful reasoning can be done on the semantic web.

"The tendency when presenting these results in research articles - and often when sharing the data - is to provide the analyzed/reduced view of the data"

Actually, I just heard Leroy Hood (of ISB and other fame) and Eric Schadt (of Rosetta Inpharmatics, our parent company) give excellent talks at the MGED9 meeting, where their research is opening up and bringing in information and data from a vast number of resources and tying it together into big pictures, all without the semantic web. I'm sure they would love to have the kind of power envisioned by the W3C for the semantic web, but they won't touch it until it is easy--they are busy doing their core jobs.

So I really think that we need to:

1) make sure the semantic web allows people to poke at it, i.e., ask whether there is anything interesting about a particular object, without having to say why they are interested;

2) provide tools so that people can annotate their objects well, so that when the objects are submitted they can be incorporated into the web (moving forward, this is one aim of the MAGEstk for gene expression experiments);

3) ensure that existing imperfect resources have semantic web tools that can overcome those imperfections and deliver the usefulness people are currently getting from them;

4) most importantly, get a useful semantic web out there now--there's plenty of information available--and then make it better as time goes along.

The resources that are already set up for easy integration into the semantic web will come along for free.

cheers,
Michael
    -----Original Message-----
    *From:* William Bug [mailto:[EMAIL PROTECTED]]
    *Sent:* Friday, September 08, 2006 8:39 PM
    *To:* Alan Ruttenberg
    *Cc:* Miller, Michael D (Rosetta); Marco Brandizi;
    [EMAIL PROTECTED]; public-semweb-lifesci@w3.org
    *Subject:* Re: Playing with sets in OWL...

    I think Alan is making a very important general point here.

    MAGE-ML/MAGE-OM is perfectly tuned to the needs of:
    a) transferring entire microarray data sets across systems
    b) persisting microarray data sets (at least in certain scenarios)
    c) providing a systematic, normative interface for writing code to
    access specific elements and data collections one typically finds
    in the description of a microarray data set
    This is the sort of functionality data models are particularly
    well suited to supporting.
    MAGE-OM/MAGE-ML is also the result of a huge amount of
    deliberation from dozens of experts in the informatics fields
    involved in generating, storing, and manipulating microarray data.

    When it comes to manipulating the information associated with a
    microarray experiment - or a collection of experiments - in a
    semantically explicit manner, however, RDF is really the preferred
    formalism: it provides the required explicit semantics while still
    providing the expressiveness needed to characterize the inherent
    variety, complexity, and granularity of this information.  When it
    comes to filling out the assertions to the point of being able to
    reason on them - even simple reasoning such as consistency checks
    - some dialect of OWL will be the formalism of choice, I believe.

    I think Alan gives a very clear example of how to use OWL in the
    particular situation described by Marco.

    I have just a few questions in follow-up:
    1) The MAGE-ML XML Schema provides for a great deal of flexibility
    via the use of optional fields.  Still, any given use in a
    specific lab for a specific collection of microarray experiments
    is likely to develop its own conventions for which fields to use
    and which not to use - and for how to populate the more "open"
    elements.  With this in mind, it seems it should be possible under
    those circumstances to create an XSLT to translate the individuals
    contained in a MAGE-ML instance according to the elemental OWL
    classes Alan described -
    Expression_technology, Expression_technology_map, Spot_mapping,
    Expression_profile_experiment, Spot_intensity, and
    Gene_expression_computation.  These can probably be
    reconstituted from the MAGE-ML elements BioAssay, BioAssayData,
    HigherLevelAnalysis, Measurement, and QuantitationType.  In
    translating the instance data into OWL, it should then be possible
    to perform the sort of higher level sorting and re-analysis Alan
    describes.  The translation should probably take the "open world"
    assumption into account, so that the resulting OWL statements
    provide the intended semantic completeness, even if that isn't
    represented in the MAGE-ML instances themselves.
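
    To make the translation target concrete, here is a minimal sketch
    of the sort of individual such an XSLT might emit, written in the
    OWL abstract syntax Alan uses below.  The class names are from
    Alan's list above; the property names (levels, spotID, intensity)
    are adapted from his property listing, and the identifiers and
    values are purely hypothetical:

    Individual(exp42 type(Expression_profile_experiment)
       value(levels Individual(type(Spot_intensity)
          value(spotID "AFFX-BioB-5_at")
          value(intensity "1532.7"))))

    Each MAGE-ML BioAssay would then map to one such
    Expression_profile_experiment individual, with its Measurement and
    QuantitationType content mapped to Spot_intensity individuals.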

    2) I think the use of OWL Alan describes here is going to be
    critical to performing broad field, large scale re-analysis of
    complex data sets such as microarray experiments and various types
    of neuro-images containing segmented geometric objects (in many
    ways equivalent to the segmentation performed on microarray images
    to determine the location and intensity of spots).  The tendency
    when presenting these results in research articles - and often
    when sharing the data - is to provide the analyzed/reduced view of
    the data.  In the context of these complex experiments, many forms
    of re-analysis will not be possible without access to the
    originally collected data.  Think of how critical BLAST-based
    meta-analysis was for GenBank through the 1990s (and still is).
    There are several underlying assertions making it possible to
    perform such analysis.  Primary among them is the acceptance that
    each form of sequencing technology provides a reliable way of
    determining the probability of finding a particular nucleotide at
    a particular location.  Many sequences are submitted with the
    simple assertion that at position N in sequence X there is a 100%
    probability (or 95% confidence, to be more specific) of finding
    nucleotide A|T|G|C.  To some extent, the statistical analysis
    performed by BLAST (and other position-sensitive,
    cross-correlative statistical algorithms) relied on these "ground
    facts".  For the most part, it was safe to assume this level of
    reduced data could be safely pooled with other such sequence
    determinations regardless of the specific sequencing device,
    underlying biochemical protocols, and specific lots of reagents
    used.  These same assumptions cannot generally be safely made
    for microarray experiments, segmented MRI images - and many other
    types of images such as IHC or in situ based images.  As an
    example, just look to the debates in the last year or two
    regarding the sometimes problematic nature of replicating "gene
    expression" level results with different arrays covering the
    "same" genes.  If we are to support the same sort of meta analysis
    as was common with BLAST across GenBank sequences, then we will
    have to often supply access to the low level data elements.  This
    in fact was a major impetus behind providing the MAGE-OM (and
    FuGE-OM).  As I state at the top of this email with points 'a',
    'b', & 'c', MAGE-OM/MAGE-ML is extremely useful for several
    critical tasks related to the handling of this detailed data.
    When it comes to supporting the semantically-grounded analytical
    requirements of such complex, broad field, meta-analysis, however,
    I think OWL (and sometimes RDF alone) is going to prove a critical
    enabling technology.

    3) Re: anonymous classes/individuals of the type Alan describes:
    these are essentially "blank nodes" in the RDF sense - "unnamed"
    nodes based on a collection of necessary restrictions, if I
    understand things correctly.  Please pardon the naive question,
    but aren't there some caveats in terms of processing very large
    RDF and/or OWL graphs containing "blank" or "anonymous" nodes?
    For many OWL ontologies, this might not be a concern, but if one
    were to be tempted to express a large variety of such sets based
    on different groupings of the sequence probes on a collection of
    arrays - groupings relevant to specific types of analysis - I
    could see how these anonymous entities - especially the anonymous
    sets of individuals - could really proliferate.

    Many thanks for providing this very helpful exemplar, Alan.

    Cheers,
    Bill


    On Sep 8, 2006, at 9:50 PM, Alan Ruttenberg wrote:


    Yes. However, I don't think I would change anything I wrote.
    Because OWL works in the open world, we can say that all these
    things exist, but only supply the details that we need. But
    having the framework which explains the meaning of what is
    supplied is one of the points of using ontologies. In this case,
    if all we know is that there was some computation that led to
    this gene set, we could use some arbitrary name for it
    (remembering that if we decide to represent it later / merge it
    with the experimental run, we can use owl:sameAs to merge our
    name with the actual name).
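
    In the abstract syntax used below, that later merge would be a
    single assertion - the second name here is a stand-in for the
    actual one:

    SameIndividual(c1 actualComputationRun1)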

    So, with reference to this ontology (generated by Marco, or
    imported from some standard), he could simply state:

    Individual(c1 type(Computation)
       value(geneComputedAsExpressed g1)
       value(geneComputedAsExpressed g2)
       value(geneComputedAsExpressed g3)
     )

    If he wanted to state that the source was an array experiment
    (but he didn't know the details), he could add to c1

       value(fromExperiment Individual(
    type(ExpressionProfileExperiment)))

    which uses an anonymous individual (blank node) of the
    appropriate type. Now you know that the data originally came from
    an expression profile experiment, though you haven't needed to
    add any information beyond that.

    The pattern Marco mentions that is closest to this is

    set1 isA GeneSet
    set1 hasMember g1, g2, g3


    in that we are using the property values on an instance to
    represent the set. But the point I wanted to make was that a gene
    set isn't some arbitrary set. It is a choice, made for a
    reason/purpose, and the ontology should explicitly represent
    those reasons/purposes.

    If there are defined kinds of follow-up, then he could define
    an instance to represent that process too.

    Finally, I wanted to make the technical point that he
    doesn't need to use constructs of the form:

    set1 derivesFromUnionOf set2, set3


    OWL provides the ability to say these things, even when the "set"
    is the property values of an instance, for example, given

    Individual(c1 type(Computation)
       value(geneComputedAsExpressed g1)
     )

     Individual(c2 type(Computation)
       value(geneComputedAsExpressed g2)
       value(geneComputedAsExpressed g3)
    )

    supposing that he wanted to represent a follow-up list to be
    verified by RT-PCR, represented by the class RTPCRFollowup.
    Let's say he wanted to call the property geneToFollowUp, with
    inverse geneFollowedUpIn.
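
    That inverse pair would be declared along the lines of:

    ObjectProperty(geneToFollowUp inverseOf(geneFollowedUpIn))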

    Individual(RTPCRFollowup1  type(RTPCRFollowup))

    EquivalentClasses(
      unionOf(
        restriction(GeneExpressedAccordingTo hasValue(c1))
        restriction(GeneExpressedAccordingTo hasValue(c2)))
      restriction(geneFollowedUpIn hasValue(RTPCRFollowup1)))

    Now a reasoner, e.g. Pellet, will conclude that the values of the
    property geneToFollowUp of instance RTPCRFollowup1 are exactly g1,
    g2, g3.

    Of course that's not the only way to do it, but it does show that
    OWL reasoning can make it economical to represent and work with
    sets without having to go off and recapitulate set theory.

    -Alan

    On Sep 8, 2006, at 7:41 PM, Miller, Michael D (Rosetta) wrote:


    Hi Alan,

    What you are describing is covered by MAGE-OM/MAGE-ML, a UML
    model built to capture the real-world aspects of running a
    microarray experiment.

    Typically, at the end of this process, a set of genes is
    identified as being interesting for some reason, and one wants to
    know more about this set of genes beyond the microarray
    experiment that has been performed.

    I might be wrong, but I think that is where Marco is starting:
    at the end of the experiment, for follow-up.

    cheers,
    Michael

    -----Original Message-----
    From: [EMAIL PROTECTED]
    [mailto:[EMAIL PROTECTED]] On Behalf Of
    Alan Ruttenberg
    Sent: Friday, September 08, 2006 3:07 PM
    To: Marco Brandizi
    Cc: [EMAIL PROTECTED];
    public-semweb-lifesci@w3.org
    Subject: Re: Playing with sets in OWL...



    Hi Marco,

    There are a number of ways to work with sets, but I don't think
    I'd approach this problem from that point of view.
    Rather, I would start by thinking about what my domain instances
    are, what their properties are, and what kinds of questions I want
    to be able to ask based on the representation. I'll sketch this
    out a bit, though the fact that I name an object or property
    doesn't mean that you have to supply it (remember OWL is
    open-world) - still, listing these in the ontology makes your
    intentions clearer and the ontology easier for others to work
    with.

    The heading in each of these is a class, of which you would make
    one or more instances to represent your results.
    The indented names are properties on instances of that class.

    An expression technology:
        Vendor:
        Product: e.g. array name
        Name of spots on the array
        Mappings: (maps of spot to gene - you might use e.g.
        Affymetrix's, or you might compute your own)

    ExpressionTechnologyMap
       SpotMapping: (each value a spot mapping)

    Spot mapping:
       SpotID:
       GeneID:

    An expression profile experiment (call yours exp0)
        When done:
        Who did it:
        What technology was used: (an expression technology)
        Sample: (a sample)
        Treatment: ...
        Levels: A bunch of pairs of spot name, intensity

    Spot intensity
       SpotID:
       Intensity:

    A computation of which spots/genes are "expressed" (call yours c1)
        Name of the method : e.g. mas5 above threshold
        Parameter of the method: e.g. the threshold
        Experiment: exp0
        Spot Expressed: spots that were over threshold
        Gene Computed As Expressed: genes that were over threshold

    And maybe:

    Conclusion
        What was concluded:
        By who:
        Based on: c1

    All of what you enter for your experiment are instances (so
    there are no issues of OWL Full).
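
    For instance, the computation might come out roughly as follows -
    the property names here are just one possible rendering of the
    listing above, and the values are made up:

    Individual(c1 type(Computation)
       value(methodName "mas5 above threshold")
       value(methodParameter "200")
       value(experiment exp0)
       value(geneComputedAsExpressed g1)
       value(geneComputedAsExpressed g2))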

    Now, the gene set you wanted can be expressed as a class:

    Let's define an inverse property of "GeneComputedAsExpressed";
    call it "GeneExpressedAccordingTo".

    Class(Set1 partial restriction(GeneExpressedAccordingTo
    hasValue(c1)))

    Instances of Set1 will be those genes. You may or may not want to
    actually define this class. However, I don't think that you need
    to add any properties to it. Everything you would want to say
    probably wants to be said on one of the instances - the
    experiment, the computation, the conclusion, etc.

    Let me know if this helps/hurts - glad to discuss this some more

    -Alan





    On Sep 8, 2006, at 11:58 AM, Marco Brandizi wrote:


    Hi all,

    sorry for the possible triviality of my questions, or the
    messed-up mind I am possibly showing...

    I am trying to model the grouping of individuals into sets. In my
    application domain, gene expression, people put together, let's
    say, genes, associating a meaning to the sets.

    For instance:

    Set1 := { gene1, gene2, gene3 }

    is the set of genes that are expressed in experiment0

    (genei and exp0 are OWL individuals)


    I understand that this may be formalized in OWL by:

    - declaring Set1 as owl:subClassOf Gene
    - using oneOf to declare the membership of g1,2,3
    (or simpler: (g1 type Set1), (g2 type Set1), etc. )
    - using hasValue with expressed and exp0

    (right?)
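
    In OWL abstract syntax I guess that would come out roughly as
    follows (taking "expressed" to be an object property linking a
    gene to the experiment):

    Class(Set1 complete intersectionOf(
       Gene
       oneOf(gene1 gene2 gene3)
       restriction(expressed hasValue(exp0))))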

    Now, I am trying to build an application which is like a semantic
    wiki.

    Hence users have quite direct contact with the underlying
    ontology, and they can write, with a simplified syntax, statements
    about a subject they are describing (subject-centric approach).

    Committing to the very formal formalism of OWL looks a bit too
    much... formal... ;-) and hard to handle with a semantic wiki-like
    application.

    Another problem is that the set could have properties of its own,
    for instance:

    Set1 hasAuthor John

    meaning that John is defining it. But hasAuthor is typically used
    for individuals, and I wouldn't like to fall into OWL Full by
    making an OWL reasoner interpret Set1 both as an individual and a
    class.

    Aren't there more informal (although less precise) methods to
    model sets, or lists of individuals?

    An approach could be modeling some sort of set theory over
    individuals:

    set1 isA GeneSet
    set1 hasMember g1, g2, g3
    ...

    set1 derivesFromUnionOf set2, set3

    ...

    But I am not sure it would be a good approach, or whether someone
    else has already tried it.

    Any suggestion?


    Thanks in advance for a reply.

    Cheers.

--
    =======================================================================
    Marco Brandizi <[EMAIL PROTECTED]>
    http://gca.btbs.unimib.it/brandizi


    Bill Bug
    Senior Research Analyst/Ontological Engineer

    Laboratory for Bioimaging  & Anatomical Informatics
    www.neuroterrain.org
    Department of Neurobiology & Anatomy
    Drexel University College of Medicine
    2900 Queen Lane
    Philadelphia, PA    19129
    215 991 8430 (ph)
    610 457 0443 (mobile)
    215 843 9367 (fax)


    Please Note: I now have a new email - [EMAIL PROTECTED]



