Hi John,
On Thu, Aug 22, 2013 at 6:39 PM, John May <[email protected]> wrote:
> Hi Martin,
>
> My application is building QSAR models. I am not a chemist, but my
> feeling is that the clean but complicated solution would be to have
> 'set-valued features' (a set of values instead of a single value) for
> multi-compound molecules.
>
>
> The trouble is one has to detect them and split into multiple molecules
> in the first place. The existing code (ConnectivityChecker) to do this is
> sub-optimal and generally a burden. The general algorithm is actually
> linear but the CDK version is quadratic time.
I do not think that time is much of an issue.
For model building and validation I have to compute the features only
once, and I don't think that the model will be applied to large sets of
multi-compound molecules later on.
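
For reference, the linear-time partitioning mentioned above can be sketched as a depth-first labelling of connected components. This is only a minimal sketch on a plain adjacency list, not the CDK implementation; the class name ComponentLabeller and its label method are hypothetical, and atoms are assumed to be indexed 0..atomCount-1 with bonds given as index pairs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of CDK: labels the connected components of
// an undirected graph in O(atoms + bonds) time.
class ComponentLabeller {

    /**
     * @param atomCount number of atoms (vertices), indexed 0..atomCount-1
     * @param bonds     each entry is a pair {begin, end} of atom indices
     * @return the component label of every atom; labels start at 0
     */
    static int[] label(int atomCount, int[][] bonds) {
        // Build the adjacency lists once: O(atoms + bonds).
        List<List<Integer>> adjacent = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            adjacent.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            adjacent.get(bond[0]).add(bond[1]);
            adjacent.get(bond[1]).add(bond[0]);
        }

        int[] labels = new int[atomCount];
        Arrays.fill(labels, -1); // -1 marks "not visited yet"
        int next = 0;
        Deque<Integer> stack = new ArrayDeque<>();

        // Every atom is pushed at most once and every bond is looked at
        // twice, so the traversal is linear as well.
        for (int start = 0; start < atomCount; start++) {
            if (labels[start] != -1) {
                continue; // already assigned to an earlier component
            }
            labels[start] = next;
            stack.push(start);
            while (!stack.isEmpty()) {
                int atom = stack.pop();
                for (int neighbour : adjacent.get(atom)) {
                    if (labels[neighbour] == -1) {
                        labels[neighbour] = next;
                        stack.push(neighbour);
                    }
                }
            }
            next++;
        }
        return labels;
    }
}
```

Atoms that share a label belong to the same component, so grouping atoms by label gives the individual molecules to feed into a descriptor.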
>
> I'm not sure if this should be built-in functionality but a simple
> adapter would allow you to choose whether to combine the values
> to an average or return them as a set.
Hmm, but I would have to decide that myself for each descriptor in the CDK?
That would be quite some effort, and I do not think that I would be
able to do that myself.
IMHO it should be built-in functionality for each descriptor, but I
don't know if that is easy to decide for each descriptor (e.g. for
molecular weight it may for many applications be correct to sum up the
weights of the single compounds, but for a QSAR model, the mean
molecular weight sounds more appropriate to me).
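
To make the sum-versus-mean choice concrete, here is a minimal sketch of a per-descriptor aggregation policy applied to the values computed for each component. The enum ComponentAggregation and its constants are hypothetical names for illustration and are not part of the CDK; which policy suits which descriptor is exactly the open question.

```java
import java.util.List;

// Hypothetical aggregation policies, not part of CDK: each one combines the
// per-component descriptor values into a single number.
enum ComponentAggregation {

    // Adds up the component values (what most descriptors appear to do).
    SUM {
        @Override
        double apply(List<Double> values) {
            double total = 0.0;
            for (double v : values) {
                total += v;
            }
            return total;
        }
    },

    // Averages the component values (possibly better suited to QSAR).
    MEAN {
        @Override
        double apply(List<Double> values) {
            if (values.isEmpty()) {
                return Double.NaN; // no components, so no meaningful mean
            }
            return SUM.apply(values) / values.size();
        }
    };

    abstract double apply(List<Double> values);
}
```

For two components with the made-up values 2.0 and 4.0, SUM gives 6.0 and MEAN gives 3.0; a lookup table from descriptor to policy would then be one way to record the decision per descriptor.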
Kind regards,
Martin
>
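> import java.util.HashSet;
> import java.util.Set;
> import org.openscience.cdk.graph.ConnectivityChecker;
> import org.openscience.cdk.interfaces.IAtomContainer;
> import org.openscience.cdk.interfaces.IAtomContainerSet;
> import org.openscience.cdk.qsar.DescriptorValue;
> import org.openscience.cdk.qsar.IMolecularDescriptor;
>
> // Sketch only: calculate() returns a Set, so the class cannot implement
> // IMolecularDescriptor as-is (its calculate() returns one DescriptorValue).
> class DisconnectedDescriptorAdapter {
>
>     private final IMolecularDescriptor delegate;
>
>     DisconnectedDescriptorAdapter(IMolecularDescriptor delegate) {
>         this.delegate = delegate;
>     }
>
>     public Set<DescriptorValue> calculate(IAtomContainer container) {
>         // Split the input into its connected components.
>         IAtomContainerSet ms =
>             ConnectivityChecker.partitionIntoMolecules(container);
>         Set<DescriptorValue> values = new HashSet<DescriptorValue>();
>
>         // Compute the wrapped descriptor once per component.
>         for (IAtomContainer m : ms.atomContainers()) {
>             values.add(delegate.calculate(m));
>         }
>
>         return values;
>     }
> }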
>
>
>
>
> J
>
> On 22 Aug 2013, at 17:14, Martin Guetlein <[email protected]>
> wrote:
>
> Hi,
>
> How do CDK descriptors handle molecules with multiple compounds in them?
>
> I experimented a bit, and found out that it depends on the descriptor:
> * most descriptors apparently just add up the values of the single
> compounds (like xlogp; that makes no sense, does it?)
> * some fail for multi-compound molecules
> * some compute something else
>
> My application is building QSAR models. I am not a chemist, but my
> feeling is that the clean but complicated solution would be to have
> 'set-valued features' (a set of values instead of a single value) for
> multi-compound molecules. But that's pretty complicated and most of my
> molecules have only one compound. But I think that the average value
> of the single compounds should be preferred for descriptors like
> molecular weight or logp.
>
> Kind regards,
> Martin
>
> P.S.: Sorry if I missed existing discussions/documentation on this
> issue; I had trouble naming (and therefore googling) this
> issue.
>
> --
> Dipl-Inf. Martin Gütlein
> Phone:
> +49 (0)761 203 8442 (office)
> +49 (0)177 623 9499 (mobile)
> Email:
> [email protected]
>
> _______________________________________________
> Cdk-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
>