Martin, all,

On Fri, Aug 23, 2013 at 6:23 AM, Nina Jeliazkova
<[email protected]> wrote:
> On 22 August 2013 23:55, Martin Guetlein <[email protected]>
>> > if we're talking about mixtures (which is not really the case for a dot
>> > connected representation), there could be various ways to generate a
>> > mixture descriptor
>
> It depends. Calculating properties of mixtures is a science of its own.
> Properties of salts could be different to those of parent compounds. For
> isomers getting mean is just one of the options.

I think this is a critical point... descriptors have a meaning, many
of which are defined for single chemical entities...

To me, a classic chemometrics example is something that happened at a
big life sciences company in the Netherlands. They were monitoring the
weight of cookies (only one of their products) they baked... and by
monitoring the average weight, they could see if the process was still
running fine. Average weights were fine, yet the system went crazy.
What happened is that the last step in the process was cutting the
square cookies into two triangles, cutting diagonally over the
opposite corners. Now, that cutting went bad, with one cookie too
large, one too small... yet, the average size they were monitoring
stayed the same...
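
To put numbers on the cookie story, here is a minimal sketch. The weights are made up for illustration (a 20 g square cookie cut into two triangles) and are not from the actual process; the only point is that the mean cannot see the bad cut, while a spread measure can.

```python
from statistics import mean, pstdev

# Hypothetical triangle weights in grams (illustrative values only).
# Each 20 g square cookie is cut into two triangles.
good_cut = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]  # even diagonal cut
bad_cut = [13.0, 7.0, 13.0, 7.0, 13.0, 7.0]      # off-centre cut

for label, weights in (("good cut", good_cut), ("bad cut", bad_cut)):
    # The mean is identical in both cases; only the spread reveals the problem.
    print(f"{label}: mean = {mean(weights):.1f} g, spread = {pstdev(weights):.1f} g")
```

Averaging a descriptor over the components of a salt or mixture can hide differences in much the same way.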

So, I wonder: what does the weight of a salt mean? Can you meaningfully
compare a salt weight with a structure without a salt? Or with a much
larger counter ion (not uncommon for drugs)? Now, consider that most
descriptors really are developed for a single chemical graph, e.g.
graph complexity... are they still useful for salts? For QSAR it is
critical that one column actually has the same meaning, *and* that you
can calculate *similarities* based on them... if you violate
that assumption, you are rewriting what QSAR is about and based on.

Of course, the CDK does not make this explicit, but it is
suggested by the use of IMolecule, rather than IAtomContainer. It is,
however, not enforced...

So, my answer would be: you must split out the component you are
really interested in, and only count descriptor values for that
structure.
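
As a rough sketch of what "splitting out the component" could look like for a dot-connected SMILES string: the helper names below are hypothetical, the atom count is a crude regex over the string rather than a real parser, and picking the largest fragment is only one possible rule (a counter ion can be larger than the compound of interest, so the choice may need to be made by hand). With the CDK itself, the graph-based route would be to partition the container into connected components first and calculate descriptors on the chosen one.

```python
import re

# Crude heavy-atom matcher: a bracket atom such as [Na+] counts as one atom,
# then two-letter halogens, then one-letter organic-subset symbols
# (aromatic lowercase included). Not a full SMILES parser.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")

def heavy_atom_count(fragment):
    """Approximate number of atoms written in a SMILES fragment."""
    return len(ATOM.findall(fragment))

def largest_component(smiles):
    """Return the dot-separated fragment with the most atoms (an assumed rule)."""
    return max(smiles.split("."), key=heavy_atom_count)

# Sodium acetate: keep the acetate, drop the sodium counter ion.
print(largest_component("CC(=O)[O-].[Na+]"))
```

Descriptor values would then be calculated for the returned fragment only, so that one column keeps the same meaning across all structures.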

Egon




-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
