Martin, all,

On Fri, Aug 23, 2013 at 6:23 AM, Nina Jeliazkova <[email protected]> wrote:
> On 22 August 2013 23:55, Martin Guetlein <[email protected]>
>> > if we're talking about mixtures (which is not really the case for a dot
>> > connected representation), there could be various ways to generate a
>> > mixture descriptor
>
> It depends. Calculating properties of mixtures is a science of its own.
> Properties of salts could be different to those of parent compounds. For
> isomers getting mean is just one of the options.

I think this is a critical point... descriptors have a meaning, and
many of them are defined for single chemical entities...

To me, a classic chemometrics example is something that happened at a
big life sciences company in the Netherlands. They were monitoring the
weight of the cookies they baked (just one of their products)... and by
monitoring the average weight, they could see whether the process was
still running fine. Average weights were fine, yet the system went
crazy. What happened is that the last step in the process was cutting
the square cookies into two triangles, cutting diagonally between
opposite corners. Now, that cutting went bad, with one half too large
and one too small... yet the average size they were monitoring stayed
the same...

So, I wonder, what does the weight of a salt mean? Can you meaningfully
compare the weight of a salt with that of a structure without a counter
ion? Or with one that has a much larger counter ion (not uncommon for
drugs)?

Now, consider that most descriptors really are developed for a single
chemical graph, e.g. graph complexity... are they still useful for
salts?

For QSAR it is critical that one column actually has the same meaning
in every row, *and* that you can calculate *similarities* based on
them... if you violate that assumption, you are rewriting what QSAR is
about and based on.

Of course, the CDK does not state this explicitly, but it is suggested
by the use of IMolecule rather than IAtomContainer. It is, however, not
enforced...

So, my answer would be: you must split out the component you are really
interested in, and only calculate descriptor values for that structure.

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286

_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user
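
A minimal sketch of the cookie anecdote above, in plain Java with no CDK
calls. The weights are made up for illustration (the story gives no
numbers), and the class and method names are hypothetical. It only shows
that an average can stay put while the individual pieces drift apart,
which is the same risk you run when averaging a descriptor over the
components of a salt or mixture.

```java
import java.util.Arrays;

class CookieHalves {

    /** Arithmetic mean of the given weights. */
    static double mean(double[] weights) {
        return Arrays.stream(weights).average().orElse(Double.NaN);
    }

    /** Largest minus smallest weight: a crude measure of spread. */
    static double range(double[] weights) {
        double max = Arrays.stream(weights).max().orElse(Double.NaN);
        double min = Arrays.stream(weights).min().orElse(Double.NaN);
        return max - min;
    }

    public static void main(String[] args) {
        // Hypothetical weights in grams, not taken from the original story.
        double[] goodCut = {5.0, 5.0, 5.0, 5.0}; // two cookies, cut evenly
        double[] badCut  = {7.0, 3.0, 7.0, 3.0}; // same cookies, cut off-centre

        System.out.println("good cut: mean=" + mean(goodCut) + " range=" + range(goodCut));
        System.out.println("bad cut:  mean=" + mean(badCut) + " range=" + range(badCut));
    }
}
```

Both arrays have the same mean, so a monitor that only tracks the
average cannot tell them apart; the range (or any other measure of
spread) can.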

