Re: Size estimates of current LS space (and Introductions)

Jeremy Zucker Wed, 02 Aug 2006 02:30:54 -0700


Hello folks,

It appears that I forgot to put the URL in that last email about the Pathguide: http://www.pathguide.org

Well, since I've already managed to embarrass myself publicly, I figure I might as well introduce myself formally.

My name is Jeremy Zucker, and I am a bioinformatics specialist at the Dana-Farber Cancer Institute and a research fellow at Harvard Medical School in George Church's lab. I have been working mainly with data integration [1] issues that arise from automating the metabolic reconstruction of pathway/genome databases [2] for the purpose of generating flux balance models [3]. I also work with Joanne Luciano and others on a pathway exchange format in OWL/RDF called BioPAX.

The semantic web interests me for several reasons. First, I believe it will be a solid substrate for distributed curation, which is a necessary part of the ongoing effort to improve the quality of the biological data we use. Like Wikipedia, we need a way to exploit the wisdom of crowds to discover, cross-validate, and annotate the biological data that we are currently using.

Second, the semantic web should make it easier to do distributed "pathway data mashups", such as overlaying expression data onto metabolic, signal transduction, and gene regulation pathways. Such mashups can help us understand how the cell controls its own production, how certain disease states form, how to alter metabolic pathways to remove toxins from the environment, and how to optimize metabolic fluxes to produce useful biomolecules.

Third, with semantic web technologies such as description logics and rules, it should be possible to infer when two data sets are really talking about the same biological object, even if they use different identifiers to describe it. To that end, I have been working with Alan Ruttenberg and others at York University, UCSD and SRI to develop an OWL/description-logic-based method to automate the integration of two E. coli databases.

The first database has an extremely well-developed ontology [2]. The other has a highly curated data set specifically tuned for flux balance analysis [3]. By merging them, it should be possible to automatically generate metabolic flux models for any sequenced organism [4].

There, now that I have introduced myself and my interests, let's try to estimate the number of javabeans in the life sciences jar!


Sincerely,

Jeremy

[1] http://www.freebiology.org/wiki/Debugging_the_bug
[2] http://biocyc.org
[3] http://gcrg.ucsd.edu
[4] http://prelude.bu.edu/publications/Segre_etal_OMICS_2003.pdf


On Aug 2, 2006, at 1:17 AM, Jeremy Zucker wrote:


Hi folks,

One resource that is likely to be of use in the pathway space is the Pathguide: it has detailed statistics about the size of each database and other metadata for about 222 biological pathway databases.

This is the target space for conversion to BioPAX.

Sincerely,

Jeremy



On Jul 31, 2006, at 6:35 PM, Skinner, Karen ((NIH/NIDA)) [E] wrote:


These may be helpful resources:

The Nucleic Acids Research Public Links Directory
See:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=16845014&query_hl=6&itool=pubmed_docsum


And the Nucleic Acids 2006 Molecular Biology Database Collection

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?itool=abstractplus&db=pubmed&cmd=Retrieve&dopt=abstractplus&list_uids=16381871

Karen Skinner, Ph.D.
Deputy Director for Science and Technology Development
Division of Basic Neuroscience and Behavior Research
National Institute on Drug Abuse
Room 4243
6001 Executive Boulevard
Bethesda, Maryland 20892-9651
301-435-0886 or 301-443-1887
[EMAIL PROTECTED]


-----Original Message-----
From: Eric Neumann [mailto:[EMAIL PROTECTED]
Sent: Monday, July 31, 2006 10:07 AM
To: public-semweb-lifesci hcls
Subject: Size estimates of current LS space



Following today's telecon, does anyone with genomics knowledge (that
includes you too, Carole) have estimates for the following numbers:

1. How many bio-molecular and organism-anatomical-functional entities
and records (broad sense) are currently accessible through the web
(excluding LIMS entities, such as samples, for now)?

2. Does this number grow substantially when it is allowed to include
every variant of protein, gene, etc. per species (i.e., not instances of
real molecules or organisms)?


I think these would be quite useful for other W3C members to be aware
of, since some proposed mechanisms would require their global
indexing...

Eric
