2011/9/16 Michel Petitjean petitjean.chi...@gmail.com:
Chemical information: a field of fuzzy contours?
-
Before turning to chemistry, I would like to recall some facts that I
noticed on the FIS forum:
although many people consider that a unifying definition of
information science is possible (though yet to be constructed),
a number of other people consider that there are many concepts of
information which are not necessarily
facets of a unique concept, so that it would be better to speak
about information scienceS,
and not about information science.
I read at http://en.wikipedia.org/wiki/Information_science:
"Information science is an interdisciplinary science primarily
concerned with the
analysis, collection, classification, manipulation, storage, retrieval
and dissemination of information."
and a few lines above:
"Information Science consists of having the knowledge and
understanding on how to collect, classify, manipulate, store, retrieve
and disseminate any type of information."
Clearly, collecting, storing, and retrieving information suggest
that we are dealing with databases.
The question of where information resides is neglected, although
answering it is enlightening:
no doubt much information is stored in data banks.
Information Science(s) is strongly connected with Data
Mining (DM) and Knowledge Discovery in Databases (KDD).
Is the situation clearer in chemistry?
Undoubtedly there is a field of chemical information.
The ACS (American Chemical Society) has a Division of Chemical
Information (CINF),
named as such in 1975, but which in fact goes back to 1943
(http://www.acscinf.org/).
CINF is active and organizes various meetings, details of which can be
found on the web.
See also http://www.libsci.sc.edu/bob/chemnet/chchron.htm, an
informative website.
The ACS publishes the Journal of Chemical Information and Modeling,
so renamed in 2005
after having been named Journal of Chemical Information and Computer
Sciences from 1975 to 2004,
itself the continuation of the Journal of Chemical
Documentation, published from 1961 to 1974.
In fact, it is the same journal (one volume per year), which turned to
chemical information the same year that CINF received its current name.
Interestingly, also in 1975, the main cheminformatics lab in France
(in fact the only one in France at that time) was renamed.
The old name was LCOP (Laboratoire de Chimie Organique Physique),
and the new name was ITODYS, still in use,
which until 2001 stood for: Institut de TOpologie et de DYnamique des
Systemes. This name, which can be understood in English thanks
to the close similarity between the French and the English words, was
partly due to the existence of a distance on molecular graphs
(this distance is the smallest number of chemical bonds separating two
atoms), and, as is well known, a distance induces a topology:
the name clearly acknowledged the cheminformatics aspects of the research
performed in the lab.
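To make the graph distance mentioned above concrete, here is a minimal
sketch in Python. It is only an illustration under my own assumptions:
the adjacency-list encoding, the function name bond_distance and the
example skeleton (the carbon atoms of 2-methylbutane, numbered
arbitrarily) are hypothetical and are not taken from any ITODYS software.

```python
from collections import deque

def bond_distance(adjacency, start, goal):
    """Smallest number of bonds separating two atoms (breadth-first search)."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the two atoms belong to disconnected fragments

# Hypothetical example: carbon skeleton of 2-methylbutane.
# Bonds: 0-1, 1-2, 2-3, 1-4 (atom numbering is arbitrary).
ISOPENTANE = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}

print(bond_distance(ISOPENTANE, 0, 3))  # 3
print(bond_distance(ISOPENTANE, 0, 4))  # 2
```

On a connected molecular graph this function is symmetric and satisfies
the triangle inequality, which is what makes it a distance in the
mathematical sense.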
Chemical Information Science, which is sometimes named Chemical Informatics
(http://www.indiana.edu/~cheminfo/acs800/soced_wash.html)
can be reasonably considered to be a part of the Cheminformatics field.
The latter is defined on Wikipedia
(http://en.wikipedia.org/wiki/Cheminformatics):
"Chemoinformatics is the mixing of those information resources to
transform data into information and
information into knowledge for the intended purpose of making better
decisions faster in the area of
drug lead identification and optimization."
This definition, dating from 1998, clearly acknowledges the extraction
of information from data,
but it is restrictive since it discards all the pioneering work on the
computerization of chemical databases,
including the coding of structural formulas and the retrieval of
structural motifs, which are undeniably
the historical core of the cheminformatics field.
Now let me write a few more lines about the history of cheminformatics
in France, which is a bit amusing but sheds light on the debate over the
definition of the field of chemical information.
The French pioneer was Jacques-Emile Dubois (1920-2005), founder of
the LCOP and of the ITODYS,
who published his first cheminformatics paper in 1966. One of his main
ideas was to use the concept
of concentric layers in the molecular graphs: the nodes are the atoms
and the edges are the bonds,
the neighbours of a node constitute the first concentric layer around this
node,
the next neighbours constitute the second layer, and so on.
This concept was known to mathematicians such as Cayley and Polya.
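The layer construction just described can be sketched in a few lines of
Python. This is a generic breadth-first illustration under my own
assumptions, not necessarily the encoding Dubois used: the function
name concentric_layers, the adjacency-list format and the example
skeleton (carbon atoms of 2-methylbutane, arbitrary numbering) are
hypothetical.

```python
def concentric_layers(adjacency, focus, depth):
    """Return the first `depth` concentric layers of atoms around `focus`."""
    seen = {focus}
    frontier = [focus]
    layers = []
    for _ in range(depth):
        next_layer = []
        for atom in frontier:
            for neighbour in adjacency[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_layer.append(neighbour)
        layers.append(sorted(next_layer))
        frontier = next_layer
    return layers

# Hypothetical example: carbon skeleton of 2-methylbutane.
# Bonds: 0-1, 1-2, 2-3, 1-4 (atom numbering is arbitrary).
SKELETON = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}

print(concentric_layers(SKELETON, 1, 2))  # [[0, 2, 4], [3]]
print(concentric_layers(SKELETON, 0, 2))  # [[1], [2, 4]]
```

The k-th layer collects the atoms lying exactly k bonds away from the
focus atom, so changing the focus changes the description of the same
molecule.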
Here, the challenge was to explain to experimental chemists that in a
number of applications, such as QSAR
(Quantitative Structure-Activity Relationship), the use of sets of two
concentric layers around focus atoms
may be more efficient than the commonly taught approaches based on
skeletons and substituents.
Dubois also thought that this concept could help to