Quoting Rajarshi Guha <[email protected]>:
>
> On Jan 26, 2010, at 11:30 AM, Vincent Le Guilloux wrote:
>
>> Dear cdk users,
>>
>> It seems that it's impossible to get other results than NaN values for
>> the following descriptors:
>>
>> Wgamma1.unity = NaN
>> Wgamma2.unity = NaN
>> Wgamma3.unity = NaN
>> WG.unity = NaN
>
>
> It's a known (but unfortunately undocumented) issue. I've seen these
> NaN's from time to time but haven't gotten round to investigating it.
> If I recall correctly the problem is in the determination of asymmetric
> and symmetric atoms (gamma descriptors)
>
Yes indeed :). I had a quick look at the source code, and saw that the
problem arises in this loop:
    // look for symmetric & asymmetric atoms for the gamma descriptor
    for (int i = 0; i < 3; i++) {
        double ns = 0.0;
        double na = 0.0;
        for (int j = 0; j < ac.getAtomCount(); j++) {
            boolean foundmatch = false;
            for (int k = 0; k < ac.getAtomCount(); k++) {
                if (k == j) continue;
                if (scores[j][i] == -1 * scores[k][i]) {
                    ns++;
                    foundmatch = true;
                    break;
                }
            }
            if (!foundmatch) na++;
        }
        double n = (double) ac.getAtomCount();
        gamma[i] = -1.0 * ((ns / n) * Math.log(ns / n) / Math.log(2.0) +
                           (na / n) * Math.log(1.0 / n) / Math.log(2.0));
        gamma[i] = 1.0 / (1.0 + gamma[i]);
    }
The problem is that the number of symmetric atoms, ns, is always 0. As a
consequence, ns / n = 0 and Math.log(ns / n) = -Infinity, so the product
(ns / n) * Math.log(ns / n) is 0 * -Infinity, which is NaN.
A default value is clearly needed when ns is 0, which would fix this
issue.
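To make the failure and one possible guard concrete, here is a minimal sketch. It is not the CDK code and not a proposed patch: the class EntropyTerm and the methods xlog2x and gamma are hypothetical names, and the choice of 0 as the default relies on the usual convention that x * log(x) tends to 0 as x tends to 0, which is an assumption about what the descriptor should return.

```java
// Sketch only: EntropyTerm, xlog2x and gamma are hypothetical names, not CDK API.
final class EntropyTerm {
    private static final double LOG2 = Math.log(2.0);

    /** Returns x * log2(x), using the limit value 0 when x is exactly 0. */
    static double xlog2x(double x) {
        if (x == 0.0) {
            return 0.0; // assumption: take the limit of x * log2(x) as x -> 0+
        }
        return x * Math.log(x) / LOG2;
    }

    /** Same formula as in the loop above, with the ns == 0 case guarded. */
    static double gamma(double ns, double na, double n) {
        double h = -1.0 * (xlog2x(ns / n) + (na / n) * Math.log(1.0 / n) / LOG2);
        return 1.0 / (1.0 + h);
    }

    public static void main(String[] args) {
        double ns = 0.0;
        double n = 12.0;
        // Unguarded term: 0.0 * -Infinity is NaN in IEEE 754 arithmetic.
        System.out.println((ns / n) * Math.log(ns / n)); // prints NaN
        // Guarded version: the result is finite.
        System.out.println(gamma(0.0, 12.0, 12.0));
    }
}
```

This only removes the NaN. It does not address why ns ends up being 0 in the first place.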
However, I think the algorithm itself is broken, since ns should not always
be 0 as is currently the case. I don't really know whether the algorithm is
theoretically sound for detecting symmetric atoms, but in any case,
comparing two double values that come out of a PCA computation for exact
equality (here: scores[j][i] == -1 * scores[k][i]) isn't a good idea because
of floating-point imprecision. But it's just a guess... Taking benzene as an
example, here are the scores that get compared with each other for the 6
carbon atoms:
-0.0010549268112963923
1.074278584188431E-4
-4.564022550040472E-4
4.717739512629139E-4
7.722618841847252E-4
-0.0011905526719529509
Note that hydrogens are also compared in the algorithm.
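As an illustration of the floating-point concern, the sketch below replaces the exact test with a tolerance test, |s_j + s_k| <= eps. The class SymmetryCount, the method countSymmetric and the eps values are hypothetical and chosen only for the demonstration. Whether a tolerance is the right fix for the descriptor, and what value it should take, is not something I have checked.

```java
// Sketch only: SymmetryCount and countSymmetric are hypothetical names, not CDK API.
final class SymmetryCount {
    /** Counts values that have a partner of opposite sign within eps. */
    static int countSymmetric(double[] scores, double eps) {
        int ns = 0;
        for (int j = 0; j < scores.length; j++) {
            for (int k = 0; k < scores.length; k++) {
                if (k == j) continue;
                // exact test s_j == -s_k becomes |s_j + s_k| <= eps
                if (Math.abs(scores[j] + scores[k]) <= eps) {
                    ns++;
                    break;
                }
            }
        }
        return ns;
    }

    public static void main(String[] args) {
        // Made-up values: 0.1 + 0.2 is not exactly 0.3 in double arithmetic.
        double[] toy = {0.1 + 0.2, -0.3};
        System.out.println(toy[0] == -1 * toy[1]);     // prints false
        System.out.println(countSymmetric(toy, 0.0));  // prints 0
        System.out.println(countSymmetric(toy, 1e-9)); // prints 2
    }
}
```

With eps = 0.0 this behaves like the exact comparison and finds no pair among the six benzene scores listed above. Those scores are all close to zero, so an absolute tolerance that is too generous would pair up everything; any eps would probably have to be chosen relative to the scale of the scores.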
Also, one last remark and then I'll stop bothering you: why not just
issue a warning or something like that, instead of throwing an exception,
when 2D coordinates are detected? I ask because if I build benzene in
Marvin and calculate 3D coordinates (still in Marvin), the coordinates
will still look 2D, since benzene is planar... and I will not be able to
calculate any of the 3D descriptors from the CDK.
Anyway, thanks for your answer :)
vincent
_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user