Hi Michael,

Our use case is a pilot project for exploring how to use the semantic web to represent some of the information about microarray experiments, including co-expressed/differentially expressed genes and the context in which such genes are identified (as described in papers). While keeping things simple and well defined (based on a limited number of examples), we hope to demonstrate how such information/knowledge on the semantic web can help researchers locate microarray datasets more easily. For example, users may be interested in collecting (from different databases) raw datasets belonging to different microarray experiments that use a particular microarray platform (e.g., Affymetrix) to study Alzheimer's Disease (AD) in particular neural cell types and brain regions. Such a collection may help researchers (biostatisticians) perform meta-analysis to identify biomarkers for a given stage of AD, for example. Also, please see my responses below.

Michael Liebman wrote:

I usually monitor this group and don't contribute, but seeing the recent
exchanges about gene expression, I feel a need to put things into a better
perspective than the one currently being shared.

I'm glad you contribute.

My experience comes from many years overseeing bioinformatics (and gene
expression, proteomics and clinical data) at Wyeth, Roche, UPenn and with a
DOD center at Windber. There appear to be several issues that are not being
realistically addressed in the current discussion.

Thanks for the introduction.

1. There is significant experimental variability across individual studies,
published or not, because of variation in tissue/cell
handling/storage/preparation, experimental variability in the experiment
and significant variability in the data analysis, i.e. experimental
reproducibility inter-lab is poor and even intra-lab can be a major
challenge.

I agree with you on the challenge of variability inherent in microarray and other high-throughput technologies.

2. The measurement that is usually referred to as "up or down gene
expression/regulation" refers to the comparison between 2 experiments
(sample under 2 different conditions) but typically does not adequately
correct for individual experimental variability other than "simple"
scaling. We have shown that this is inadequate.

Yes, that’s why more sophisticated normalization methods have been developed to address some of the variability issues.

3. Leaving the interpretation to the author is significantly limited as it
tends to reflect the bias of the author to "observe/confirm" what they are
looking for in many of these studies, i.e. a biostatistician will tell you
that these experiments are extremely under-powered to reveal the true
statistically significant results they would like to achieve.

We're not agreeing or disagreeing with the authors. We just want to capture the information as described in the paper. We'll let others judge the validity of the results presented in the paper.

4. Human nature looks to favor the "big differences" as being most
significant. Unfortunately nature doesn't work this way: many of the
largest differences are not functionally relevant but reflect the fact that
biological control of these specific genes may not be critical to function,
and so large variability can be observed and should not be interpreted, all
of the time, as being most significant. In fact, we have developed
analytical methods to look at large libraries of gene expression studies
and evaluate the overall stability/variability of individual genes (and
probes) to establish a significance in difference between states based on
how much variation should be expected vs how much is observed, especially
in genes that show extremely small levels of expression overall and which
would not be considered by typical approaches to data analysis.

It sounds like your group is developing new methodologies to tackle the problem of determining the significance of gene expression.
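
To make the "expected vs observed variation" idea in point 4 concrete, here is a minimal sketch. It is only one simple reading of the idea and is not the method Michael's group developed, which is not described in this thread. The function name, the reference values, and the choice of dividing by a baseline standard deviation are all assumptions made for illustration.

```python
import statistics


def variability_scaled_difference(reference_values, state_a, state_b):
    """Scale the observed difference between two states by how much the
    gene normally varies across a reference library of studies.

    All inputs are log2 expression values. This scaling is an
    illustrative assumption, not a published method.
    """
    baseline_sd = statistics.stdev(reference_values)
    observed_diff = statistics.mean(state_a) - statistics.mean(state_b)
    return observed_diff / baseline_sd


# Made-up numbers: a tightly controlled gene with a small shift...
stable_score = variability_scaled_difference(
    reference_values=[5.0, 5.1, 4.9, 5.0, 5.0],
    state_a=[5.5, 5.5],
    state_b=[5.0, 5.0],
)

# ...and a loosely controlled gene with a much larger raw shift.
variable_score = variability_scaled_difference(
    reference_values=[8.0, 11.0, 6.0, 10.0, 5.0],
    state_a=[10.0, 10.0],
    state_b=[8.0, 8.0],
)

print(stable_score, variable_score)
```

With these made-up numbers the small shift in the tightly controlled gene scores higher than the large shift in the loosely controlled one, which is the point being made about "big differences".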

Sorry to interrupt the exchange but I believe that it is critical, when
considering the development of systems to represent, store, exchange, model
data, that an understanding of the specifics and uniqueness of the
underlying data and analytical approaches must be considered beyond simple
statistics.

No problem. Thanks for the input.

Best,

-Kei

Michael

Michael N. Liebman, PhD
President/Managing Director
Strategic Medicine, Inc
231 Deepdale Drive
Kennett Square, PA 19348

(814) 659 5450 mobile

m.lieb...@strategicmedicine.com
www.strategicmedicine.com
-----Original Message-----
From: public-semweb-lifesci-requ...@w3.org
[mailto:public-semweb-lifesci-requ...@w3.org] On Behalf Of mdmiller
Sent: Wednesday, May 26, 2010 1:47 PM
To: Kei Cheung
Cc: HCLS
Subject: Re: BioRDF Telcon

hi kei,

Just want to clarify that what I meant was that it might be beyond the scope of our use case to accurately, comprehensively, and precisely define what gene expression really means given the degree of complexity involved.

exactly, i believe we can trust the authors of the gene expression papers and the journals themselves for this

cheers,
michael

----- Original Message ----- From: "Kei Cheung" <kei.che...@yale.edu>
To: "mdmiller" <mdmille...@comcast.net>
Cc: "HCLS" <public-semweb-lifesci@w3.org>
Sent: Wednesday, May 26, 2010 7:23 AM
Subject: Re: BioRDF Telcon


Hi Michael,

mdmiller wrote:
hi kei,

What do we mean by differentially expressed genes? One definition is that differentially expressed genes are genes with significantly different expression in two samples/conditions/experimental factors/dimensions (e.g., treated vs. untreated, disease vs. normal, time point 1 vs. time point 2) of microarray experiments.
yes, this was my meaning.

this is to differentiate between a gene that is always expressed under normal conditions because it is part of an essential pathway that is always running, that gene is only interesting if its expression level changes--similarly for a normally unexpressed gene.
Thanks for confirming. A consensus definition (even if it's broad) is important to our gene list representation. There are a variety of methods (e.g., statistical tests) that can be used to identify a list of differentially expressed genes in two different groups. That's Scott's point about the importance of capturing, as part of the gene list context, what methods have been used for detecting differentially expressed genes. I hope the use case can help convince the community of the need for a common vocabulary for describing such methods.
How to measure or infer gene expression (e.g., from mRNA) is a whole complex question that may be beyond the scope of our use case.
yes, which i think was scott's point in his reply. in fact, for the BioRDF use case, initially at least, it is probably sufficient that the authors of the paper state that a gene is part of the significant gene list.
Just want to clarify that what I meant was that it might be beyond the scope of our use case to accurately, comprehensively, and precisely define what gene expression really means given the degree of complexity involved.

Cheers,

-Kei
cheers,
michael


----- Original Message ----- From: "Kei Cheung" <kei.che...@yale.edu>
To: "mdmiller" <mdmille...@comcast.net>
Cc: "M. Scott Marshall" <marsh...@science.uva.nl>; "HCLS" <public-semweb-lifesci@w3.org>
Sent: Tuesday, May 25, 2010 8:52 PM
Subject: Re: BioRDF Telcon


Hi Michael et al,

What do we mean by differentially expressed genes? One definition is that differentially expressed genes are genes with significantly different expression in two samples/conditions/experimental factors/dimensions (e.g., treated vs. untreated, disease vs. normal, time point 1 vs. time point 2) of microarray experiments.
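
As a rough illustration of that definition, the sketch below flags genes whose mean log2 expression differs between two conditions, using a log2 fold change and Welch's t statistic. The gene names, the values, and both thresholds are made up, and real analyses would normally use dedicated tools and multiple-testing correction. It is meant only to show the shape of the computation.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    if se == 0:
        return 0.0  # no spread in either sample: treat as no evidence
    return (statistics.mean(a) - statistics.mean(b)) / se


def differentially_expressed(expr, min_abs_log2fc=1.0, min_abs_t=4.0):
    """expr maps gene -> (condition A values, condition B values),
    all on a log2 scale. Thresholds are illustrative assumptions."""
    hits = {}
    for gene, (a, b) in expr.items():
        log2fc = statistics.mean(a) - statistics.mean(b)
        if abs(log2fc) >= min_abs_log2fc and abs(welch_t(a, b)) >= min_abs_t:
            hits[gene] = "up" if log2fc > 0 else "down"
    return hits


# Hypothetical log2 intensities, e.g. disease (A) vs. normal (B).
example = {
    "GENE_A": ([10.1, 10.3, 9.9, 10.2], [8.0, 8.2, 7.9, 8.1]),
    "GENE_B": ([7.0, 7.2, 6.9, 7.1], [7.1, 6.9, 7.0, 7.2]),
    "GENE_C": ([5.0, 5.1, 4.9, 5.0], [6.6, 6.4, 6.5, 6.7]),
}
result = differentially_expressed(example)
print(result)
```

Here GENE_B is the "always expressed, never changing" case: it has a clear signal in both conditions but is not reported because its level does not change.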

How to measure or infer gene expression (e.g., from mRNA) is a whole complex question that may be beyond the scope of our use case.

Cheers,

-Kei

mdmiller wrote:

hi scott,

i think you, jim and lena are doing a great job moving the technical aspect of this work forward. i'm looking forward to seeing the end results.

cheers,
michael

----- Original Message ----- From: "M. Scott Marshall" <marsh...@science.uva.nl>
To: "mdmiller" <mdmille...@comcast.net>
Cc: "Kei Cheung" <kei.che...@yale.edu>; "HCLS" <public-semweb-lifesci@w3.org>
Sent: Tuesday, May 25, 2010 10:21 AM
Subject: Re: BioRDF Telcon


Hi Michael,

Thanks for the clarification. I also explained those concepts during
the BioRDF teleconference but it is difficult for the scribe to
capture such details accurately from a phone conversation. Just
knowing that a gene has changed (either up or down) already gives us
something to work with. Since we started with the microarray use case,
we have aimed to focus on the list of differentially expressed genes
as our entry point into related molecular information, phenotypes,
pathways, diseases, etc.

In addition to the gene list and experimental factors, there is some
data provenance information that characterizes the origins of the gene
list, such as the type of significance analysis or technique that was
performed (ANOVA, LIMMA, ..) and p-value cutoff for the list discussed
in the associated article(s), software packages used (specific R
package from Bioconductor, GeneSpring, NextBio, ..). It would be handy
if there was a common vocabulary for this type of information (URIs
for statistical techniques and software packages). I think that some
related resources have been described by myGrid/myExperiment. However,
lacking a complete vocabulary, it is still possible to make use of the
gene list without such a fine grained description of its provenance.
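
A minimal sketch of what such a provenance record could look like as RDF triples, serialized as N-Triples with plain Python. The `http://example.org/biordf/` namespace and every property name in it (`statisticalMethod`, `pValueCutoff`, `software`, `hasGene`) are hypothetical placeholders, not terms from an existing vocabulary, and the p-value is written as a plain untyped literal for simplicity.

```python
EX = "http://example.org/biordf/"  # hypothetical namespace, for illustration only


def uri(local):
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{EX}{local}>"


def literal(value):
    """Format a value as a plain N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def gene_list_triples(list_id, genes, method, p_cutoff, software):
    """Describe a gene list plus the provenance items mentioned above:
    statistical technique, p-value cutoff and software package."""
    subject = uri(f"genelist/{list_id}")
    triples = [
        (subject, uri("statisticalMethod"), uri(f"method/{method}")),
        (subject, uri("pValueCutoff"), literal(p_cutoff)),
        (subject, uri("software"), literal(software)),
    ]
    triples += [(subject, uri("hasGene"), uri(f"gene/{g}")) for g in genes]
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = gene_list_triples(
    "demo1", ["GENE_A", "GENE_C"], "LIMMA", 0.05, "Bioconductor limma"
)
doc = to_ntriples(triples)
print(doc)
```

Dropping the three provenance triples still leaves a usable gene list, which matches the point that the list can be used without a fine-grained description of its provenance.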

Cheers,
Scott

On Tue, May 25, 2010 at 9:35 AM, mdmiller <mdmille...@comcast.net> wrote:

hi all,

sorry i ended up not being able to make the call.

"P value
The probability (ranging from zero to one) that the results observed in a
study could have occurred by chance if the null hypothesis was true. A P
value of ≤ 0.05 is often used as a threshold to indicate statistical
significance." (1)

the exact meaning of p-value depends on what is being measured.

also, sometimes it isn't so important that a gene is up or down regulated
but whether its expression changes from up or down regulated over the
experimental factors, e.g. if you increase the dose of the drug do the
target genes go from non-expressed to up regulated.

cheers,
michael

1) http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=antiepi&part=appendixes.app2

----- Original Message ----- From: "Kei Cheung" <kei.che...@yale.edu>
To: "HCLS" <public-semweb-lifesci@w3.org>
Sent: Monday, May 24, 2010 11:40 AM
Subject: Re: BioRDF Telcon


Today's minutes are available at:

http://esw.w3.org/HCLSIG_BioRDF_Subgroup/Meetings/2010/05-24_Conference_Call

Thanks to Matthias for scribing.

Cheers,

-Kei

mdmiller wrote:

hi kei,

look forward to joining the call,
michael

----- Original Message ----- From: "Kei Cheung" <kei.che...@yale.edu>
To: "mdmiller" <mdmille...@comcast.net>; "HCLS"
<public-semweb-lifesci@w3.org>
Sent: Saturday, May 22, 2010 12:10 PM
Subject: Re: BioRDF Telcon


Hi Michael,

Yes, May 24 was what I meant. It was a typo.

Thanks,

-Kei

mdmiller wrote:

hi kei,

do you mean monday (may 24)?

cheers,
michael

----- Original Message ----- From: "Kei Cheung" <kei.che...@yale.edu>
To: "JunZhao" <jun.z...@zoo.ox.ac.uk>
Cc: <public-semweb-lifesci@w3.org>
Sent: Friday, May 21, 2010 2:28 PM
Subject: Re: BioRDF Telcon


Since there were only Jun and Scott who attended the last BioRDF call
(I was not able to attend due to some emergency meetings), we decided to
have the next BioRDF call on the coming Monday (May 21) at 11 am (EDT). The
agenda will be the same (see below).

Cheers,

-Kei

JunZhao wrote:

This is a reminder that the next BioRDF telcon call will be held at 11
am EDT (4 pm CET) on Monday, May 17 (see details below).

Cheers,

-Jun


== Conference Details ==
* Date of Call: Monday, May 17, 2010
* Time of Call: 11:00 am Eastern Time (4 pm CET)
* Dial-In #: +1.617.761.6200 (Cambridge, MA)
* Dial-In #: +33.4.89.06.34.99 (Nice, France)
* Dial-In #: +44.117.370.6152 (Bristol, UK)
* Participant Access Code: 4257 ("HCLS")
* IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for
details, or see Web IRC), Quick Start: Use
http://www.mibbit.com/chat/?server=irc.w3.org:6665&channel=%23hcls for
IRC access.
* Duration: ~1 hour
* Frequency: bi-weekly
* Convener: Jun
* Scribe: to-be-determined

==Agenda==
* Introduction
* Gene list RDF representation
* iPhone demo