Hello folks,

It appears that I forgot to put the URL in that last email about the pathguide: http://www.pathguide.org

Well, since I've already managed to embarrass myself publicly, I figure I might as well introduce myself formally.

My name is Jeremy Zucker, and I am a bioinformatics specialist at the Dana-Farber Cancer Institute and a research fellow at Harvard Medical School in George Church's lab. I have been working mainly on the data integration issues [1] that arise from automating the metabolic reconstruction of pathway/genome databases [2] for the purpose of generating flux balance models [3]. I also work with Joanne Luciano and others on BioPAX, a pathway exchange format in OWL/RDF.


The semantic web interests me for several reasons. First, I believe it will be a solid substrate for distributed curation, which is a necessary part of the ongoing effort to improve the quality of the biological data we use. As Wikipedia does, we need a way to exploit the wisdom of crowds to discover, cross-validate, and annotate the biological data that we are currently using.

Second, the semantic web should make it easier to do distributed "pathway data mashups", such as overlaying expression data onto metabolic, signal transduction, and gene regulation pathways, to understand how the cell controls the production of itself, how certain disease states form, how to alter metabolic pathways to remove toxins from the environment, and how to optimize the metabolic fluxes to produce useful biomolecules.

Third, with semantic web technologies such as description logics and rules, it should be possible to infer when two data sets are really talking about the same biological object, even if they use different identifiers to refer to it. To that end, I have been working with Alan Ruttenberg and others at York University, UCSD and SRI to develop an OWL/description-logic-based method to automate the integration of two E. coli databases.

The first database has an extremely well-developed ontology [2]. The other has a highly curated data set specifically tuned for flux balance analysis [3]. By merging them, it should be possible to automatically generate metabolic flux models for any sequenced organism.[4]

There, now that I have introduced myself and my interests, let's try to estimate the number of javabeans in the life sciences jar!

Sincerely,

Jeremy

[1] http://www.freebiology.org/wiki/Debugging_the_bug
[2] http://biocyc.org
[3] http://gcrg.ucsd.edu
[4] http://prelude.bu.edu/publications/Segre_etal_OMICS_2003.pdf


On Aug 2, 2006, at 1:17 AM, Jeremy Zucker wrote:


Hi folks,

One resource that is likely to be of use in the pathway space is the pathguide: It has detailed statistics about the size of each database and other metadata for about 222 biological pathway databases.
This is the target space for conversion to BioPAX.

Sincerely,

Jeremy



On Jul 31, 2006, at 6:35 PM, Skinner, Karen ((NIH/NIDA)) [E] wrote:


These may be helpful resources:

The Nucleic Acids Research Public Links Directory
See:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=16845014&query_hl=6&itool=pubmed_docsum


And the Nucleic Acids Research 2006 Molecular Biology Database Collection
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?itool=abstractplus&db=pubmed&cmd=Retrieve&dopt=abstractplus&list_uids=16381871

Karen Skinner, Ph.D.
Deputy Director for Science and Technology Development
Division of Basic Neuroscience and Behavior Research
National Institute on Drug Abuse
Room 4243
6001 Executive Boulevard
Bethesda, Maryland 20892-9651
301-435-0886 or 301-443-1887
[EMAIL PROTECTED]


-----Original Message-----
From: Eric Neumann [mailto:[EMAIL PROTECTED]
Sent: Monday, July 31, 2006 10:07 AM
To: public-semweb-lifesci hcls
Subject: Size estimates of current LS space



As per today's Telcon, does any person with genomics knowledge (that
includes you too Carole) have estimates for the following numbers:

1. How many bio-molecular and organism-anatomical-functional entities
and records (broad sense) are currently accessible through the web
(excluding LIMS entities, such as samples, for now)?

2. Does this number grow substantially when it is allowed to include
every variant of protein, gene, etc. per species (i.e., not instances of
real molecules or organisms)?


I think these would be quite useful for other W3C members to be aware
of, since some proposed mechanisms would require their global
indexing...

Eric






