Hello John, thanks for your answer. I ran a quick comparison between CDK
and PubChem, with a few hand-picked molecules. These are the results:
https://docs.google.com/spreadsheets/d/1yl3b05W319ZQW5K9TZf0iMYHbPoJyP5BV8QMLhf1kLE/edit?usp=sharing

I split the molecules into four subsets. The first comprises seemingly
non-problematic molecules: carboxylic acids, amines, aliphatic esters,
aliphatic ethers. In these cases CDK, PubChem and my own intuition all
agree.

The second subset comprises molecules where I think CDK is wrong and
PubChem is correct: phenols. This is due to the issue that you corrected in
the branch you linked.

The third subset comprises molecules where I think CDK is correct and
PubChem is wrong: aromatic ethers, amides, nitro compounds. In the case of
aromatic ethers, we know CDK explicitly introduces a correction to exclude
aromatic ether oxygens from the HB acceptor count. I am not a specialist,
but I understand there are sound reasons to make this exception. PubChem
doesn't seem to implement it. In the case of amides and nitro compounds I
don't quite understand what PubChem is doing, but CDK's answer seems the
correct one to me.

The last subset comprises aromatic esters (acyloxy substituents). I
honestly don't know what is correct in this case. Are oxygen atoms from
aromatic esters also an exception, just as those from aromatic ethers? That
would mean CDK is right. Otherwise, another correction is needed to make
sure CDK excludes no oxygens on aromatic rings other than those of ethers.
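
To make the phenol and aromatic ester cases concrete, here is a small
self-contained Java sketch. It does not use CDK: the class HBondRuleSketch,
its Atom type and all method names are made up for illustration, the atoms
are assumed neutral, only oxygens are counted, and each molecule is reduced
to the fragment around its oxygens (one aromatic carbon stands in for the
ring). countAsCoded mirrors the loop quoted further down in this thread;
countAsDocumented follows the documentation comment, with "ether oxygen"
read as an oxygen with no hydrogens and two single-bonded carbon
neighbours. The strict flag is my own assumption: it additionally refuses
to call an oxygen an ether oxygen when one of those carbons is an acyl
carbon.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model, NOT CDK: hypothetical names, neutral atoms, oxygen rules only.
final class HBondRuleSketch {

    static final class Atom {
        final String symbol;
        final boolean aromatic;
        final int hCount;
        final List<Atom> neighbours = new ArrayList<>();
        final List<Boolean> viaDouble = new ArrayList<>(); // parallel to neighbours

        Atom(String symbol, boolean aromatic, int hCount) {
            this.symbol = symbol;
            this.aromatic = aromatic;
            this.hCount = hCount;
        }
    }

    static void bond(Atom a, Atom b, boolean isDouble) {
        a.neighbours.add(b); a.viaDouble.add(isDouble);
        b.neighbours.add(a); b.viaDouble.add(isDouble);
    }

    // Mirrors the quoted loop: skip any oxygen next to N, or single-bonded to an aromatic C.
    static int countAsCoded(List<Atom> mol) {
        int count = 0;
        atomloop:
        for (Atom atom : mol) {
            if (!atom.symbol.equals("O")) continue;
            for (int i = 0; i < atom.neighbours.size(); i++) {
                Atom nb = atom.neighbours.get(i);
                if (nb.symbol.equals("N")
                        || (nb.symbol.equals("C") && nb.aromatic && !atom.viaDouble.get(i)))
                    continue atomloop;
            }
            count++;
        }
        return count;
    }

    static boolean isAcylCarbon(Atom c) {
        for (int i = 0; i < c.neighbours.size(); i++)
            if (c.viaDouble.get(i) && c.neighbours.get(i).symbol.equals("O")) return true;
        return false;
    }

    // Assumed reading of "ether oxygen": no H, two single-bonded carbon neighbours.
    static boolean isEtherOxygen(Atom o, boolean strict) {
        if (o.hCount != 0 || o.neighbours.size() != 2) return false;
        for (int i = 0; i < 2; i++) {
            Atom nb = o.neighbours.get(i);
            if (!nb.symbol.equals("C") || o.viaDouble.get(i)) return false;
            if (strict && isAcylCarbon(nb)) return false; // strict: ester oxygens are not ethers
        }
        return true;
    }

    // Follows the documentation comment: exclude O next to N and aromatic ether oxygens only.
    static int countAsDocumented(List<Atom> mol, boolean strict) {
        int count = 0;
        for (Atom atom : mol) {
            if (!atom.symbol.equals("O")) continue;
            boolean nextToN = false, nextToAromaticC = false;
            for (Atom nb : atom.neighbours) {
                if (nb.symbol.equals("N")) nextToN = true;
                if (nb.symbol.equals("C") && nb.aromatic) nextToAromaticC = true;
            }
            boolean aromaticEther = nextToAromaticC && isEtherOxygen(atom, strict);
            if (!nextToN && !aromaticEther) count++;
        }
        return count;
    }

    static List<Atom> phenol() {
        Atom c = new Atom("C", true, 0), o = new Atom("O", false, 1);
        bond(c, o, false);
        return Arrays.asList(c, o);
    }

    static List<Atom> anisole() {
        Atom c = new Atom("C", true, 0), o = new Atom("O", false, 0), me = new Atom("C", false, 3);
        bond(c, o, false); bond(o, me, false);
        return Arrays.asList(c, o, me);
    }

    static List<Atom> phenylAcetate() {
        Atom c = new Atom("C", true, 0), o = new Atom("O", false, 0);
        Atom acyl = new Atom("C", false, 0), oxo = new Atom("O", false, 0), me = new Atom("C", false, 3);
        bond(c, o, false); bond(o, acyl, false); bond(acyl, oxo, true); bond(acyl, me, false);
        return Arrays.asList(c, o, acyl, oxo, me);
    }
}
```

In this toy model phenol gets 0 acceptor oxygens from countAsCoded and 1
from countAsDocumented, while anisole gets 0 from both. For phenyl acetate
the coded rule and the loose documented rule both give 1 (only the carbonyl
oxygen), and the strict variant gives 2. So the open question is really
which definition of "ether oxygen" the exception is meant to use.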

On Tue, 23 Nov 2021 at 04:27, John Mayfield (
john.wilkinson...@gmail.com) wrote:

> Thanks for your email. I've always thought the CDK HBond acceptor/donor
> code is a little wonky and needs investigating. I don't have time to look
> deeply at it, but yes, my reading of this is that it doesn't check for the
> ether oxygen correctly. If someone was inclined, checking CDK's (and
> RDKit's) values against PubChem would be a quick project that may provide
> some insight into missed cases and disagreements.
>
> I've made a change here to get the correct value for phenol:
> https://github.com/cdk/cdk/compare/bug/hbondacceptor?expand=1
>
> On Fri, 15 Oct 2021 at 11:27, Guillermo Restrepo <
> guillermo.restr...@mis.mpg.de> wrote:
>
>> We are working with some descriptors taken from the Reaxys database,
>> which according to its owner are computed using your CDK library. We
>> found something unexpected and would very much appreciate it if you could
>> help us understand it.
>>
>> We noted that some phenols are reported as having 0 hydrogen bond
>> acceptors, whereas we expected them to have at least one. We checked CDK
>> source code and found this comment on HBondAcceptorCountDescriptor.java:
>>
>> The following groups are counted as hydrogen bond acceptors:
>> - any oxygen where the formal charge of the oxygen is non-positive
>>   (i.e. formal charge <= 0) except
>>     - an aromatic ether oxygen (i.e. an ether oxygen that is adjacent
>>       to at least one aromatic carbon)
>>     - an oxygen that is adjacent to a nitrogen
>> - any nitrogen where the formal charge of the nitrogen is non-positive
>>   (i.e. formal charge <= 0) except
>>     - a nitrogen that is adjacent to an oxygen
>>
>> The way we understood it, this means that phenols should have at least
>> one hydrogen bond acceptor. But further down in the same file, these
>> lines seem to specify otherwise:
>>
>> // looking for suitable oxygen atoms
>> else if (atom.getAtomicNumber() == IElement.O && atom.getFormalCharge() <= 0) {
>>     // excluding oxygens that are adjacent to a nitrogen or to an aromatic carbon
>>     List<IBond> neighbours = ac.getConnectedBondsList(atom);
>>     for (IBond bond : neighbours) {
>>         IAtom neighbor = bond.getOther(atom);
>>         if (neighbor.getAtomicNumber() == IElement.N ||
>>             (neighbor.getAtomicNumber() == IElement.C &&
>>              neighbor.isAromatic() &&
>>              bond.getOrder() != IBond.Order.DOUBLE))
>>             continue atomloop;
>>     }
>>     hBondAcceptors++;
>> }
>>
>> Is this intended, or is it a bug, or are we misunderstanding something?
>>
>>
>>
>> _______________________________________________
>> Cdk-user mailing list
>> Cdk-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>>


-- 


*Andrés Bernal*
*Área de Ciencias Básicas y Modelado*
*Associate Professor*
Ext. 1705
andresf.bern...@utadeo.edu.co
Utadeo address: Carrera 4 # 22-61



 *WARNING ABOUT CONFIDENTIAL INFORMATION*

The opinions expressed herein do not necessarily reflect the positions of
the Universidad de Bogotá Jorge Tadeo Lozano. The information contained in
this electronic mail and attachments is confidential and intended only for
the use of the individual or entity to whom it is addressed and may have
confidential data. If you are not the intended recipient, you are hereby
notified that any disclosure, copying, distribution, or any other use of
the information is strictly prohibited and has legal repercussions.
Therefore, if you have received this document by mistake, please notify the
sender immediately and destroy this document and attachments without making
any copy of any kind.

This message has been tested by antivirus software. Nonetheless, the
Universidad de Bogotá Jorge Tadeo Lozano assumes no liability for any
damages or loss of any kind that might arise from the use of, misuse of, or
the inability to use the materials contained on this electronic message. It
is the responsibility of the recipient to verify by his own means the
presence of a virus or any other harmful components, defects or errors.
