Responses inline.
1. We have text mined much of the Affymetrix GEO data, curated it and
imported it into ArrayExpress - the sample annotation is now much
better than in the native GEO data. We are also running QC across all
the data files, so we know which ones should be excluded from future
analyses.
I think it's the right thing to do both to enrich data annotation and
to enhance data quality. This will help data integration a lot.
Currently, we are exploring query federation in the neuroscience
context. It'd be great if we could use the neuroscience use case(s) to
help drive your ontology development for text mining and data
visualization. In addition to the NIH neuroscience microarray
consortium, it may be possible to collaborate with the Neuroscience
Information Framework (NIF) to see if we can utilize some of its
resources (e.g., the neuron ontology).
Re-use of the neuron ontology is possible, but it depends on whether
there is data available to annotate in either ArrayExpress or GEO. If
you can get me a list of experiment accessions or PubMed IDs, I can
check whether this is feasible.
3. We have summary-level data (genes x conditions) from ~30,000
hybridizations in our gene expression atlas, with p-values indicating
relative under/over-expression. We are planning to export these as
triples as soon as we publish the atlas - these may be of interest.
www.ebi.ac.uk/gxa - there's an API at present, but it will be
improved in the next month or so.
This fits well with what we're currently exploring in terms of gene
list representation and linking genes and samples to existing
ontologies. It'd be great if we could download or fetch RDF triples
from the EBI atlas.
We have a student starting work on this in a month; if you can produce
concrete use cases for how you want to access these data, we can do
something.
4. If neuroscience data is of specific interest, we could do a themed
atlas release where we add datasets for a given community or project
and make these available. These can be identified by ArrayExpress or
GEO accession or PubMed ID, and we can re-annotate the genes against
UniProt/Ensembl, add GO terms, etc., and curate the sample attributes
and experimental variables. These pipelines are already in place as
part of our production workflow.
I think it's a great idea to do a themed atlas (e.g., a neuro-atlas). I
just played with gxa a little bit. It's nice! For example, I could
find genes that are over-expressed in the hippocampus brain region
across different experiments. However, when I tried to do the same
thing for neurons, I could select only a few neuron types. It'd be
nice to have more neuron types, for instance.
This is probably because we don't have the data. Here's a list of human
experiments with the term neuron; if any of these are useful, I can
prioritise their curation and inclusion in an atlas release:
http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=neuron&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
and with the term brain:
http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=brain&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
I'd be very happy to collaborate and for this group to use our data;
we spend a lot of time adding semantic value to it, so please let me
know if this is of interest.
We are also looking into the possibility of establishing a
collaboration with the Scientific Discourse task force based on the
microarray use case. We're planning to have a microarray-related
presentation and discussion on Aug. 31 (Monday, 11 am EDT/5 pm CET).
Details will be announced later. It'd be great if you could join the
BioRDF call to participate in the discussion.
Cheers,
-Kei
best regards
Helen
Kei Cheung wrote:
The minutes for yesterday's BioRDF call are available at:
http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009-07-20_Conference_Call
Thanks to Lena for scribing and Eric for retrieving the transcript
from the IRC log.
Cheers,
-Kei
Kei Cheung wrote:
This is a reminder that the next BioRDF teleconference will be held at
11 am EDT (5 pm CET) on Monday, July 20 (see details below).
I created the following wiki page for discussing the microarray use
case:
http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/QueryFederation2
Cheers,
-Kei
== Conference Details ==
* Date of Call: Monday July 20, 2009
* Time of Call: 11:00 am Eastern Time
* Dial-In #: +1.617.761.6200 (Cambridge, MA)
* Dial-In #: +33.4.89.06.34.99 (Nice, France)
* Dial-In #: +44.117.370.6152 (Bristol, UK)
* Participant Access Code: 4257 ("HCLS")
* IRC Channel: irc.w3.org port 6665 channel #hcls (see
[http://www.w3.org/Project/IRC/ W3C IRC page] for details, or see
[http://cgi.w3.org/member-bin/irc/irc.cgi Web IRC])
* Duration: ~1 hour
* Frequency: bi-weekly
* Convener: Kei Cheung
== Agenda ==
* Roll call and introduction (Kei)
* TCM data quick update (Jun, Kei)
* Query federation use case expansion (microarray) (All)
--
Helen Parkinson, PhD
ArrayExpress Production Coordinator,
Microarray Informatics Team,
EBI
EBI 01223 494672
Skype: helen.parkinson.ebi