Hello John, thanks for your answer. I ran a quick comparison between CDK and PubChem, with a few hand-picked molecules. These are the results: https://docs.google.com/spreadsheets/d/1yl3b05W319ZQW5K9TZf0iMYHbPoJyP5BV8QMLhf1kLE/edit?usp=sharing
I split the molecules into four subsets.

The first comprises seemingly non-problematic molecules: carboxylic acids, amines, aliphatic esters, aliphatic ethers. In these cases CDK, PubChem and my own intuition all agree.

The second subset comprises molecules where I think CDK is wrong and PubChem is correct: phenols. This is due to the issue that you corrected in the branch you linked.

The third subset comprises molecules where I think CDK is correct and PubChem is wrong: aromatic ethers, amides, nitro compounds. In the case of aromatic ethers, we know CDK explicitly introduces a correction to exclude aromatic ether oxygens from the HB acceptor count. I am not a specialist, but I understand there are sound reasons to make this exception. PubChem doesn't seem to implement it. In the case of amides and nitro compounds I don't quite understand what is going on with PubChem, but CDK's answer seems the correct one to me.

The last subset comprises aromatic esters (acyloxy substituents). I honestly don't know what is correct in this case. Are oxygen atoms from aromatic esters also an exception, just like those from aromatic ethers? That would mean CDK is right. Otherwise, another correction is needed to make sure that the only oxygens on aromatic rings that CDK excludes are those of ethers.

On Tue, 23 Nov 2021 at 04:27, John Mayfield (john.wilkinson...@gmail.com) wrote:

> Thanks for your email. I've always thought the CDK HBond acceptor/donor
> code is a little wonky and needs investigating. I don't have time to look
> deeply at it but yes my reading of this is it doesn't check for the ether
> oxygen correctly. If someone was inclined checking CDK's (and RDKit's)
> values with PubChem would be a quick project that may provide some insight
> onto missed cases and disagreements.
>
> I've made a change here to get the correct value for phenol:
> https://github.com/cdk/cdk/compare/bug/hbondacceptor?expand=1
>
> On Fri, 15 Oct 2021 at 11:27, Guillermo Restrepo <guillermo.restr...@mis.mpg.de> wrote:
>
>> We are working with some descriptors taken from Reaxys database, which
>> according to its owner are computed using your CDK library. We found
>> something unexpected and would very much appreciate it if you could help
>> us to understand.
>>
>> We noted that some phenols are reported as having 0 hydrogen bond
>> acceptors, whereas we expected them to have at least one. We checked CDK
>> source code and found this comment on HBondAcceptorCountDescriptor.java:
>>
>> The following groups are counted as hydrogen bond acceptors:
>> - any oxygen where the formal charge of the oxygen is non-positive (i.e.
>>   formal charge <= 0) except
>>   - an aromatic ether oxygen (i.e. an ether oxygen that is adjacent
>>     to at least one aromatic carbon)
>>   - an oxygen that is adjacent to a nitrogen
>> - any nitrogen where the formal charge of the nitrogen is non-positive
>>   (i.e. formal charge <= 0) except
>>   - a nitrogen that is adjacent to an oxygen
>>
>> The way we understood it, this means that phenols should have at least
>> one hydrogen bond acceptor.
>> But further down in the same file, these
>> lines seem to specify otherwise:
>>
>> // looking for suitable oxygen atoms
>> else if (atom.getAtomicNumber() == IElement.O && atom.getFormalCharge() <= 0) {
>>     //excluding oxygens that are adjacent to a nitrogen or to an aromatic carbon
>>     List<IBond> neighbours = ac.getConnectedBondsList(atom);
>>     for (IBond bond : neighbours) {
>>         IAtom neighbor = bond.getOther(atom);
>>         if (neighbor.getAtomicNumber() == IElement.N ||
>>             (neighbor.getAtomicNumber() == IElement.C &&
>>              neighbor.isAromatic() &&
>>              bond.getOrder() != IBond.Order.DOUBLE))
>>             continue atomloop;;
>>     }
>>     hBondAcceptors++;
>> }
>>
>> Is this intended, or is it a bug, or are we misunderstanding something?
>>
>> _______________________________________________
>> Cdk-user mailing list
>> Cdk-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/cdk-user

--
*Andrés Bernal*
*Área de Ciencias Básicas y Modelado*
*Associate Professor*
Ext. 1705
andresf.bern...@utadeo.edu.co
Utadeo address: Carrera 4 # 22-61

*WARNING ABOUT CONFIDENTIAL INFORMATION*
The opinions expressed herein do not necessarily reflect the positions of the Universidad de Bogotá Jorge Tadeo Lozano. The information contained in this electronic mail and attachments is confidential and intended only for the use of the individual or entity to whom it is addressed and may have confidential data. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or any other use of the information is strictly prohibited and has legal repercussions. Therefore, if you have received this document by mistake, please notify the sender immediately and destroy this document and attachments without making any copy of any kind. This message has been tested by antivirus software. Nonetheless, the Universidad de Bogotá Jorge Tadeo Lozano assumes no liability for any damages or loss of any kind that might arise from the use of, misuse of, or the inability to use the materials contained on this electronic message. It is the responsibility of the recipient to verify by his own means the presence of a virus or any other harmful components, defects or errors.
_______________________________________________ Cdk-user mailing list Cdk-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/cdk-user
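
For readers following the thread, the gap between the documented rule and the quoted oxygen check can be reproduced without CDK. The sketch below is a toy model only: `OxygenAcceptorSketch`, `Neighbor`, `quotedCheck` and `documentedRule` are hypothetical names, not CDK classes or methods, and "ether oxygen" is assumed to mean an oxygen with no hydrogens and exactly two single-bonded carbon neighbours. It is not the fix from the linked branch.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the oxygen branch only; hypothetical names, not the CDK API. */
final class OxygenAcceptorSketch {

    /** A heavy-atom neighbour of the oxygen plus the bond joining them. */
    static final class Neighbor {
        final String element;     // element symbol, e.g. "C" or "N"
        final boolean aromatic;   // neighbour atom is flagged aromatic
        final boolean doubleBond; // bond to the oxygen is a double bond

        Neighbor(String element, boolean aromatic, boolean doubleBond) {
            this.element = element;
            this.aromatic = aromatic;
            this.doubleBond = doubleBond;
        }
    }

    /** Mirrors the quoted check: any N neighbour, or any aromatic C reached
     *  through a non-double bond, excludes the oxygen. */
    static boolean quotedCheck(int formalCharge, List<Neighbor> neighbors) {
        if (formalCharge > 0) return false;
        for (Neighbor n : neighbors) {
            if (n.element.equals("N")) return false;
            if (n.element.equals("C") && n.aromatic && !n.doubleBond) return false;
        }
        return true;
    }

    /** Mirrors the documented rule: exclude oxygens next to N and aromatic
     *  ether oxygens. "Ether" is an assumption of this sketch: no H on the
     *  oxygen and exactly two single-bonded carbon neighbours. */
    static boolean documentedRule(int formalCharge, int hydrogens, List<Neighbor> neighbors) {
        if (formalCharge > 0) return false;
        int singleBondedCarbons = 0;
        boolean nextToAromaticCarbon = false;
        for (Neighbor n : neighbors) {
            if (n.element.equals("N")) return false;
            if (n.element.equals("C") && !n.doubleBond) {
                singleBondedCarbons++;
                if (n.aromatic) nextToAromaticCarbon = true;
            }
        }
        boolean ether = hydrogens == 0 && neighbors.size() == 2 && singleBondedCarbons == 2;
        return !(ether && nextToAromaticCarbon);
    }

    public static void main(String[] args) {
        Neighbor aromaticC = new Neighbor("C", true, false);
        Neighbor aliphaticC = new Neighbor("C", false, false);

        // Phenol oxygen: one aromatic carbon neighbour, one hydrogen.
        List<Neighbor> phenol = Arrays.asList(aromaticC);
        // Anisole oxygen: aromatic carbon plus methyl carbon, no hydrogen.
        List<Neighbor> anisole = Arrays.asList(aromaticC, aliphaticC);

        System.out.println("phenol:  quoted=" + quotedCheck(0, phenol)
                + " documented=" + documentedRule(0, 1, phenol));
        System.out.println("anisole: quoted=" + quotedCheck(0, anisole)
                + " documented=" + documentedRule(0, 0, anisole));
    }
}
```

In this model the phenol oxygen is rejected by the quoted check but accepted by the documented rule, while the anisole oxygen is rejected by both. The sketch only looks at the oxygen's direct neighbours, so it cannot tell an aromatic ester oxygen from an aromatic ether oxygen; that distinction is the question left open above.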