Hello folks,
It appears that I forgot to put the URL in that last email about the
pathguide: http://www.pathguide.org
Well, since I've already managed to embarrass myself publicly, I
figure I might as well introduce myself formally.
My name is Jeremy Zucker, and I am a bioinformatics specialist at the
Dana-Farber Cancer Institute and a research fellow at Harvard Medical
School in George Church's lab.
I have been working mainly with data integration [1] issues that arise
from automating the metabolic reconstruction of pathway/genome
databases [2] for the purpose of generating flux balance models [3].
I also work with Joanne Luciano and others on a pathway exchange
format in OWL/RDF called BioPAX.
The semantic web interests me for several reasons. For one, I
believe it will be a solid substrate for distributed curation, which
is a necessary part of the ongoing effort to improve the quality of
the biological data we use.
As with Wikipedia, we need a way to exploit the wisdom of crowds to
discover, cross-validate, and annotate the biological data that we
are currently using.
Second, the semantic web should make it easier to do distributed
"pathway data mashups", such as overlaying expression data onto
metabolic, signal transduction, and gene regulation pathways, to
understand how the cell controls its own production, how
certain disease states form, how to alter metabolic pathways to
remove toxins from the environment, and how to optimize the
metabolic fluxes to produce useful biomolecules.
Third, with semantic web technologies such as description logics and
rules, it should be possible to infer when two data sets are really
talking about the same biological object, even if they use different
identifiers to describe it.
To that end, I have been working with Alan Ruttenberg and others at
York University, UCSD and SRI to develop an OWL/Description-logic
based method to automate the integration of two E. coli databases.
The first database has an extremely well-developed ontology [2]. The
other has a highly curated data set specifically tuned for flux
balance analysis [3]. By merging them, it should be possible to
automatically generate metabolic flux models for any sequenced
organism [4].
There, now that I have introduced myself and my interests, let's try
to estimate the number of javabeans in the Life sciences jar!
Sincerely,
Jeremy
[1] http://www.freebiology.org/wiki/Debugging_the_bug
[2] http://biocyc.org
[3] http://gcrg.ucsd.edu
[4] http://prelude.bu.edu/publications/Segre_etal_OMICS_2003.pdf
On Aug 2, 2006, at 1:17 AM, Jeremy Zucker wrote:
Hi folks,
One resource that is likely to be of use in the pathway space is
the pathguide:
It has detailed statistics about the size of each database and
other metadata for about 222 biological pathway databases.
This is the target space for conversion to BioPAX.
Sincerely,
Jeremy
On Jul 31, 2006, at 6:35 PM, Skinner, Karen (NIH/NIDA) [E] wrote:
These may be helpful resources:
The Nucleic Acids Research Public Links Directory
See:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=16845014&query_hl=6&itool=pubmed_docsum
And the Nucleic Acids 2006 Molecular Biology Database Collection
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?itool=abstractplus&db=pubmed&cmd=Retrieve&dopt=abstractplus&list_uids=16381871
Karen Skinner, Ph.D.
Deputy Director for Science and Technology Development
Division of Basic Neuroscience and Behavior Research
National Institute on Drug Abuse
Room 4243
6001 Executive Boulevard
Bethesda, Maryland 20892-9651
301-435-0886 or 301-443-1887
[EMAIL PROTECTED]
-----Original Message-----
From: Eric Neumann [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 31, 2006 10:07 AM
To: public-semweb-lifesci hcls
Subject: Size estimates of current LS space
As per today's telecon, does anyone with genomics knowledge (that
includes you too, Carole) have estimates for the following numbers:
1. How many bio-molecular and organism-anatomical-functional entities
and records (broad sense) are currently accessible through the web
(excluding LIMS entities, such as samples, for now)?
2. Does this number grow substantially when it is allowed to include
every variant of protein, gene, etc. per species (i.e., not
instances of real molecules or organisms)?
I think these would be quite useful for other W3C members to be aware
of, since some proposed mechanisms would require their global
indexing...
Eric