Hi Martin,
> My application is building QSAR models. I am not a chemist, but my
> feeling is that the clean but complicated solution would be to have
> 'set-valued features' (a set of values instead of a single value) for
> multi-compound molecules.
The trouble is that one has to detect the disconnected components and split
them into separate molecules in the first place. The existing code to do
this (ConnectivityChecker) is sub-optimal and generally a burden: finding
connected components is linear time in general, but the CDK version is
quadratic.
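For illustration, here is a minimal sketch of the usual linear-time approach: breadth-first labelling of connected components. It is plain Java over a hypothetical int[][] adjacency list (not the CDK IAtomContainer API) and says nothing about how ConnectivityChecker works internally. It only shows why O(V + E) is achievable: each atom is enqueued once and each bond is inspected twice.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class ComponentLabeller {

    /**
     * Labels each vertex (atom) with a component index using breadth-first
     * search. Every vertex is enqueued once and every edge (bond) is looked
     * at twice, so the running time is O(V + E).
     *
     * @param adj hypothetical adjacency list: adj[i] = neighbours of vertex i
     * @return component label per vertex, numbered 0, 1, 2, ...
     */
    public static int[] label(int[][] adj) {
        int n = adj.length;
        int[] comp = new int[n];
        Arrays.fill(comp, -1);                 // -1 = not yet visited
        Deque<Integer> queue = new ArrayDeque<>();
        int next = 0;
        for (int start = 0; start < n; start++) {
            if (comp[start] != -1) continue;   // already assigned
            comp[start] = next;
            queue.add(start);
            while (!queue.isEmpty()) {
                int v = queue.poll();
                for (int w : adj[v]) {
                    if (comp[w] == -1) {       // first time we reach w
                        comp[w] = next;
                        queue.add(w);
                    }
                }
            }
            next++;                            // start a new component
        }
        return comp;
    }

    public static void main(String[] args) {
        // Sodium acetate as two components: vertices 0-3 are the acetate
        // C, C, O, O (bonds 0-1, 1-2, 1-3); vertex 4 is Na with no bonds.
        int[][] adj = {
            {1},        // 0: methyl C
            {0, 2, 3},  // 1: carboxyl C
            {1},        // 2: O
            {1},        // 3: O
            {}          // 4: Na
        };
        System.out.println(Arrays.toString(label(adj))); // prints [0, 0, 0, 0, 1]
    }
}
```

Atoms sharing a label form one component, so splitting into separate molecules is then a single pass over the labels.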
I'm not sure whether this should be built-in functionality, but a simple
adapter would let you choose whether to combine the values into an
average or return them as a set:
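import java.util.HashSet;
import java.util.Set;

import org.openscience.cdk.graph.ConnectivityChecker;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.interfaces.IAtomContainerSet;
import org.openscience.cdk.qsar.DescriptorValue;
import org.openscience.cdk.qsar.IMolecularDescriptor;

// Evaluates a wrapped descriptor on each disconnected component. It does not
// implement IMolecularDescriptor because calculate() returns a set of values.
class DisconnectedDescriptorAdapter {
    private final IMolecularDescriptor delegate;

    DisconnectedDescriptorAdapter(IMolecularDescriptor delegate) {
        this.delegate = delegate;
    }

    public Set<DescriptorValue> calculate(IAtomContainer container) {
        // split the input into its connected components
        IAtomContainerSet components =
            ConnectivityChecker.partitionIntoMolecules(container);
        Set<DescriptorValue> values = new HashSet<DescriptorValue>();
        for (IAtomContainer component : components.atomContainers()) {
            values.add(delegate.calculate(component)); // one value per component
        }
        return values;
    }
}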
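The adapter above returns the per-component values as a set. The averaging alternative could look roughly like the following sketch, which works on plain doubles; ComponentCombiner and Mode are hypothetical names, not CDK API. SUM corresponds to the adding-up behaviour Martin reports for many descriptors, and MEAN to the average he suggests for molecular weight or logP.

```java
import java.util.Arrays;
import java.util.List;

public class ComponentCombiner {

    /** Hypothetical strategies for merging one value per component. */
    public enum Mode { SUM, MEAN }

    /**
     * Combines one numeric descriptor value per disconnected component.
     * SUM adds the component values; MEAN averages them.
     */
    public static double combine(List<Double> perComponent, Mode mode) {
        if (perComponent.isEmpty())
            throw new IllegalArgumentException("no components");
        double sum = 0;
        for (double v : perComponent) sum += v;   // unboxes each Double
        return mode == Mode.SUM ? sum : sum / perComponent.size();
    }

    public static void main(String[] args) {
        // Illustrative numbers only, one value per component
        List<Double> values = Arrays.asList(2.0, 4.0);
        System.out.println(combine(values, Mode.SUM));  // prints 6.0
        System.out.println(combine(values, Mode.MEAN)); // prints 3.0
    }
}
```

To feed it from the adapter, each DescriptorValue would first need unwrapping to a double, for example via DoubleResult.doubleValue() for descriptors whose result is a DoubleResult (an assumption; array-valued descriptors would need per-element handling).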
J
On 22 Aug 2013, at 17:14, Martin Guetlein <[email protected]>
wrote:
> Hi,
>
> How do CDK descriptors handle molecules with multiple compounds in it?
>
> I experimented a bit, and found out that it depends on the descriptor:
> * most descriptors apparently just add up the values of the single
> compounds (like xlogp, which makes no sense, does it?)
> * some fail for multi-compound molecules
> * some compute something else
>
> My application is building QSAR models. I am not a chemist, but my
> feeling is that the clean but complicated solution would be to have
> 'set-valued features' (a set of values instead of a single value) for
> multi-compound molecules. But that's pretty complicated, and most of my
> molecules have only one compound. Still, I think that the average value
> of the single compounds should be preferred for descriptors like
> molecular weight or logp.
>
> Kind regards,
> Martin
>
> P.S.: Sorry if I missed existing discussions/documentation on this
> issue; I had some trouble naming (and therefore googling) this
> issue.
>
> --
> Dipl-Inf. Martin Gütlein
> Phone:
> +49 (0)761 203 8442 (office)
> +49 (0)177 623 9499 (mobile)
> Email:
> [email protected]
>
> ------------------------------------------------------------------------------
> Introducing Performance Central, a new site from SourceForge and
> AppDynamics. Performance Central is your source for news, insights,
> analysis and resources for efficient Application Performance Management.
> Visit us today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48897511&iu=/4140/ostg.clktrk
> _______________________________________________
> Cdk-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/cdk-user