Hi Martin,

> My application is building QSAR models. I am not a chemist, but my
> feeling is that the clean but complicated solution would be to have
> 'set-valued features' (a set of values instead of a single value) for
> multi-compound molecules.


The trouble is that one has to detect the disconnected components and
split them into multiple molecules in the first place. The existing code
(ConnectivityChecker) for this is sub-optimal and generally a burden. The
general algorithm is linear time, but the CDK version is quadratic.
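
To illustrate the linear-time claim, here is a minimal sketch of component
labelling by depth-first search over adjacency lists. It is not the CDK
implementation: the class name ComponentLabeller and the plain int-array
representation of atoms and bonds are assumptions made only for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of the CDK: labels the connected components
// of a molecular graph in O(atoms + bonds) time.
final class ComponentLabeller {

    // bonds[k] = {i, j} holds the indices of the two atoms joined by bond k.
    // Returns an array where result[i] is the component number of atom i.
    // Components are numbered from 0 in order of their lowest atom index.
    static int[] label(int atomCount, int[][] bonds) {
        // Build the adjacency lists once: O(atoms + bonds).
        List<List<Integer>> adjacent = new ArrayList<>(atomCount);
        for (int i = 0; i < atomCount; i++) {
            adjacent.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            adjacent.get(bond[0]).add(bond[1]);
            adjacent.get(bond[1]).add(bond[0]);
        }

        int[] component = new int[atomCount];
        Arrays.fill(component, -1); // -1 marks an unvisited atom
        int next = 0;

        for (int start = 0; start < atomCount; start++) {
            if (component[start] != -1) {
                continue; // already assigned by an earlier search
            }
            // Iterative depth-first search from each unvisited atom.
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            component[start] = next;
            while (!stack.isEmpty()) {
                int atom = stack.pop();
                for (int neighbour : adjacent.get(atom)) {
                    if (component[neighbour] == -1) {
                        component[neighbour] = next;
                        stack.push(neighbour);
                    }
                }
            }
            next++;
        }
        return component;
    }
}
```

Every atom is pushed at most once and every bond is examined twice, which is
where the linear bound comes from.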

I'm not sure whether this should be built-in functionality, but a simple
adapter would allow you to choose whether to combine the values
into an average or return them as a set.

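import java.util.HashSet;
import java.util.Set;

import org.openscience.cdk.graph.ConnectivityChecker;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.interfaces.IAtomContainerSet;
import org.openscience.cdk.qsar.DescriptorValue;
import org.openscience.cdk.qsar.IMolecularDescriptor;

// Sketch only: calculate() returns a set of values, which does not match
// IMolecularDescriptor.calculate, so the class is not declared to implement it.
class DisconnectedDescriptorAdapter {

    private final IMolecularDescriptor delegate;

    DisconnectedDescriptorAdapter(IMolecularDescriptor delegate) {
        this.delegate = delegate;
    }

    // Runs the wrapped descriptor once per connected component.
    public Set<DescriptorValue> calculate(IAtomContainer container) {
        IAtomContainerSet ms = ConnectivityChecker.partitionIntoMolecules(container);
        Set<DescriptorValue> values = new HashSet<DescriptorValue>();

        for (IAtomContainer m : ms.atomContainers()) {
            values.add(delegate.calculate(m));
        }

        return values;
    }
}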
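
The choice between an average and a set could look roughly like the
following. This is only a sketch on plain doubles: ComponentValues and its
two methods are hypothetical names, not CDK API, and it assumes the
per-component descriptor values have already been extracted as numbers.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not CDK API: two ways to combine per-component values.
final class ComponentValues {

    // Arithmetic mean of the per-component values; NaN if there are none.
    static double average(Collection<Double> values) {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Distinct per-component values in encounter order.
    // Note that equal values from different components collapse into one.
    static Set<Double> asSet(Collection<Double> values) {
        return new LinkedHashSet<>(values);
    }
}
```

One thing to keep in mind with the set variant: two identical components
(a salt with two equal counter-ions, say) yield a single value, so a list
may be the better container if multiplicity matters.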



J

On 22 Aug 2013, at 17:14, Martin Guetlein <[email protected]> 
wrote:

> Hi,
> 
> How do CDK descriptors handle molecules with multiple compounds in them?
> 
> I experimented a bit, and found out that it depends on the descriptor:
> * most descriptors apparently just add up the values of the single
> compounds (like xlogp, which makes no sense, does it?)
> * some fail for multi-compound molecules
> * some compute something else
> 
> My application is building QSAR models. I am not a chemist, but my
> feeling is that the clean but complicated solution would be to have
> 'set-valued features' (a set of values instead of a single value) for
> multi-compound molecules. But that's pretty complicated, and most of my
> molecules have only one compound. But I think that the average value
> of the single compounds should be preferred for descriptors like
> molecular weight or logp.
> 
> Kind regards,
> Martin
> 
> P.S.: Sorry if I missed existing discussions/documentation on this
> issue; I had some trouble naming (and therefore googling) this
> issue.
> 
> -- 
> Dipl-Inf. Martin Gütlein
> Phone:
> +49 (0)761 203 8442 (office)
> +49 (0)177 623 9499 (mobile)
> Email:
> [email protected]
> 
> ------------------------------------------------------------------------------
> Introducing Performance Central, a new site from SourceForge and 
> AppDynamics. Performance Central is your source for news, insights, 
> analysis and resources for efficient Application Performance Management. 
> Visit us today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48897511&iu=/4140/ostg.clktrk
> _______________________________________________
> Cdk-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/cdk-user
