Hi Martin, All,

On 22 August 2013 23:55, Martin Guetlein <[email protected]> wrote:

> On Thu, Aug 22, 2013 at 6:28 PM, Rajarshi Guha <[email protected]> wrote:
> > Do you mean dot-connected compounds? In that sense, most (if not all) QSAR
> > descriptors should be evaluated for individual components separately.
> > After that, what they do depends on the application: if we're talking
> > about salt forms, probably drop the salt components. Alternatively, if
> > we're talking about mixtures (which is not really the case for a
> > dot-connected representation), there could be various ways to generate a
> > mixture descriptor.
>
> Hi Rajarshi,
>
> Yep, in the SMILES representation they're dot-connected compounds.
> In our data these compounds are mostly salts/ions, some mixtures and
> some isomers.
> So, in more detail, how would you compute the descriptor values for
> the mixtures?
>

It depends. Calculating properties of mixtures is a science of its own.
Properties of salts can differ from those of the parent compounds. For
isomers, taking the mean is just one of the options.
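To make those options concrete, here is a minimal plain-Java sketch (no CDK types) that assumes the per-component values of one descriptor have already been computed, and aggregates them three ways: the sum (what Martin observed for XLogP), the mean (his suggestion for isomers), or a sorted multiset of values (his "set-valued feature"). The class and method names are hypothetical, and the numbers are illustrative, not real logP values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CDK: aggregates one descriptor over the
// components of a dot-disconnected structure.
class ComponentAggregation {

    // Sum of per-component values (what an additive descriptor effectively reports).
    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Arithmetic mean of per-component values; undefined for an empty input.
    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no components");
        }
        return sum(values) / values.length;
    }

    // "Set-valued" feature: keep every component's value, sorted so the result
    // does not depend on the order in which the components were written.
    static List<Double> asSet(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        List<Double> out = new ArrayList<>();
        for (double v : copy) {
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative numbers only.
        double[] perComponent = {2.0, -1.0};
        System.out.println("sum  = " + sum(perComponent));   // sum  = 1.0
        System.out.println("mean = " + mean(perComponent));  // mean = 0.5
        System.out.println("set  = " + asSet(perComponent)); // set  = [-1.0, 2.0]
    }
}
```

Which aggregate is appropriate is a chemistry decision per descriptor; the code only shows how far apart the choices can be for the same input.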

On the other hand, it depends on the descriptors as well. A single
fingerprint value (or structure alert) for the entire (dot-connected)
compound is perfectly fine in most cases. Simple examples where one value
is valid are the molecular mass or atom/bond counts. I'm sure the chemists
on this list will come up with more examples :)
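The reason a single value works for mass or atom count is that these descriptors are additive: the value for the whole dot-disconnected record equals the sum over its components. A small hypothetical check in plain Java (not CDK) follows; structures are simplified to lists of element symbols, and the atomic weights are rounded conventional values, both assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration (plain Java, not CDK): additive descriptors give
// the same value on the whole disconnected structure as the sum over components.
class AdditiveDescriptors {

    // Conventional standard atomic weights, rounded, for the elements used here.
    static final Map<String, Double> MASS = Map.of(
            "C", 12.011, "H", 1.008, "O", 15.999, "Na", 22.990);

    static int atomCount(List<String> atoms) {
        return atoms.size();
    }

    static double mass(List<String> atoms) {
        double m = 0.0;
        for (String a : atoms) {
            m += MASS.get(a);
        }
        return m;
    }

    public static void main(String[] args) {
        // Sodium acetate as two components: acetate (explicit H) and Na.
        List<String> acetate = List.of("C", "C", "O", "O", "H", "H", "H");
        List<String> sodium = List.of("Na");
        List<String> whole = new ArrayList<>(acetate);
        whole.addAll(sodium);

        boolean countsAgree = atomCount(whole) == atomCount(acetate) + atomCount(sodium);
        // Compare masses with a tolerance, since floating-point sums can depend on order.
        boolean massesAgree = Math.abs(mass(whole) - (mass(acetate) + mass(sodium))) < 1e-9;
        System.out.println(countsAgree + " " + massesAgree); // true true
    }
}
```

A non-additive descriptor such as logP has no such identity, which is why summing it over components, as Martin observed, is questionable.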

My preferred solution would actually be to introduce interfaces specifying
whether a descriptor accepts a disconnected structure or not. If it does,
it might return a single value (or a set of values; this would need a
second interface specifying the return type). If not, an exception is
thrown, and it is the application's responsibility to take the proper
action (e.g. most commercial software does some kind of standardization,
splitting salts, etc.).
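Roughly, that proposal could look like the plain-Java sketch below. Every type in it (`Structure`, `AcceptsDisconnected`, `DisconnectedStructureException`, `Descriptor`, `Application`) is hypothetical and not part of the CDK API; the "keep the largest component" fallback is only one possible standardization policy an application might choose, and the second interface for set-valued results is left out for brevity.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical types illustrating the proposal; none of them exist in CDK.

// Minimal stand-in for a structure: each component is a list of element symbols.
final class Structure {
    final List<List<String>> components;

    Structure(List<List<String>> components) {
        this.components = components;
    }

    boolean isConnected() {
        return components.size() == 1;
    }
}

// Marker interface: the descriptor gives a meaningful value for disconnected input.
interface AcceptsDisconnected {}

// Thrown by descriptors that only handle a single connected component.
class DisconnectedStructureException extends RuntimeException {
    DisconnectedStructureException(String message) {
        super(message);
    }
}

interface Descriptor {
    double calculate(Structure s);
}

// Additive, so it is safe on the whole disconnected structure.
class AtomCountDescriptor implements Descriptor, AcceptsDisconnected {
    public double calculate(Structure s) {
        return s.components.stream().mapToInt(List::size).sum();
    }
}

// Stand-in for a non-additive descriptor (e.g. a logP model) that refuses mixtures.
class ConnectedOnlyDescriptor implements Descriptor {
    public double calculate(Structure s) {
        if (!s.isConnected()) {
            throw new DisconnectedStructureException(s.components.size() + " components");
        }
        return s.components.get(0).size(); // dummy value for the sketch
    }
}

class Application {
    // One possible policy: strip counter-ions by keeping the largest component.
    static Structure largestComponent(Structure s) {
        List<String> largest = s.components.stream()
                .max(Comparator.comparingInt((List<String> c) -> c.size()))
                .orElseThrow(() -> new IllegalArgumentException("empty structure"));
        return new Structure(List.of(largest));
    }

    // The application, not the descriptor, decides what to do on failure.
    static double compute(Descriptor d, Structure s) {
        try {
            return d.calculate(s);
        } catch (DisconnectedStructureException e) {
            return d.calculate(largestComponent(s));
        }
    }

    public static void main(String[] args) {
        // Sodium acetate as two components: acetate (explicit H) and Na.
        Structure salt = new Structure(List.of(
                List.of("C", "C", "O", "O", "H", "H", "H"), List.of("Na")));
        System.out.println(new AtomCountDescriptor() instanceof AcceptsDisconnected); // true
        System.out.println(compute(new AtomCountDescriptor(), salt));     // 8.0
        System.out.println(compute(new ConnectedOnlyDescriptor(), salt)); // 7.0
    }
}
```

Keeping the fallback in the application rather than in each descriptor matches the division of responsibility described above: descriptors stay strict, and the salt-handling policy can differ between applications.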

Best regards,
Nina



> For isomers, using the mean value should be fine. What do you think?
>
> Kind regards,
> Martin
>
>
>
> >
> >
> > On Thu, Aug 22, 2013 at 12:14 PM, Martin Guetlein
> > <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> How do CDK descriptors handle molecules with multiple compounds in them?
> >>
> >> I experimented a bit and found out that it depends on the descriptor:
> >> * most descriptors apparently just add up the values of the single
> >> compounds (like XLogP, which doesn't make sense, does it?)
> >> * some fail for multi-compound molecules
> >> * some compute something else
> >>
> >> My application is building QSAR models. I am not a chemist, but my
> >> feeling is that the clean but complicated solution would be to have
> >> 'set-valued features' (a set of values instead of a single value) for
> >> multi-compound molecules. But that's pretty complicated, and most of my
> >> molecules have only one compound. Still, I think the average value
> >> of the single compounds should be preferred for descriptors like
> >> molecular weight or logP.
> >>
> >> Kind regards,
> >> Martin
> >>
> >> P.S.: Sorry if I missed existing discussions/documentation on this
> >> issue; I had trouble naming (and therefore googling) it.
> >>
> >> --
> >> Dipl-Inf. Martin Gütlein
> >> Phone:
> >> +49 (0)761 203 8442 (office)
> >> +49 (0)177 623 9499 (mobile)
> >> Email:
> >> [email protected]
> >>
> >>
> >>
> ------------------------------------------------------------------------------
> >> Introducing Performance Central, a new site from SourceForge and
> >> AppDynamics. Performance Central is your source for news, insights,
> >> analysis and resources for efficient Application Performance Management.
> >> Visit us today!
> >>
> >>
> http://pubads.g.doubleclick.net/gampad/clk?id=48897511&iu=/4140/ostg.clktrk
> >> _______________________________________________
> >> Cdk-user mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/cdk-user
> >
> >
> >
> >
> > --
> > Rajarshi Guha | http://blog.rguha.net
> > NIH Center for Advancing Translational Science