Hi John,

> On Jun 25, 2025, at 10:15, John Mayfield <john.wilkinson...@gmail.com> wrote:
> Even if you don't add any listeners there is an overhead of dispatching the
> edit events so it is better to avoid this.

I will, to use language I learned from WWII submarine fiction, rig for
silent running.

> Molecule Standard Form
>
> We (CDK) try to impose very little automation/sanitisation by default, rather
> than Daylight's dt_mod on/off and RDKit's sanitization it is more similar to
> OEChem in that the molecule comes out of the readers as they were described
> in the input.

I can appreciate that. As I recall (it's been years since I looked at
the OEChem docs), the OEChem docs listed the recommended set of
operations for those using the low-level API. For example, my code used
to do

  OEParseSmiles(mol, content, canon, strict)
  OEAssignAromaticFlags(mol, aromaticity_model)

They later added a single function call variant:

  OEReadMolFromBytes(mol, oeformat, flavor, gzip, content)

which handles the appropriate steps. This simplified my code as I don't
need that flexibility.

> We go a little further and don't even do ring perception (is in ring:
> true/false). Most common formats (SMILES/MOLfile/InChI/CML) will set the
> hydrogen counts for you but some older formats (PDB/XYZ) will not.

Is there documentation for the needed steps? I want to make sure I
support the primary formats correctly.

As for the less common formats, when I added CDK support back in 2021 I
tried to support the XYZ format, but ended up noting "I can't figure out
how to read an XYZ file and assign the correct bond types (RebondTool
only assigns single bonds and FixBondOrdersTool doesn't add them)." I
also noted "can't get mol2 to create a SMILES so only do basic tests".

That said, I don't think mol2 or XYZ format support is all that useful.
I haven't come across anyone using either format for a long time. As I
recall, Greg Landrum's viewpoint is that people should use Open Babel to
convert to a more mainstream format.

There are also readers I don't even touch, like Mopac7Reader or
ShelXReader. :)

> A Pattern for matching a single SMARTS query against multiple target
> compounds.
> The class can be used for efficiently matching many queries
> against a single target if setPrepare(boolean) is disabled
> (prepare(IAtomContainer)) should be called manually once for each molecule.

Yes, now that I know what I'm looking at, I can see that
getBitFingerprint() for both PubchemFingerprinter and MACCSFingerprinter
calls:

  SmartsPattern.prepare(container);

If I follow the code correctly this means SMARTS-based fingerprinting
always triggers aromaticity re-perception. For example, if I use the
same molecule to generate both MACCS and Pubchem fingerprints then both
will do:

  Cycles.markRingAtomsAndBonds(target);
  Aromaticity.apply(Aromaticity.Model.Daylight, target);

even if input processing has already done this step.

It also means that if input processing uses a different model, like
Aromaticity.Model.Mdl (picking one available from that class), then I
need to pass a copy to the fingerprinter if I don't want the assignments
to possibly change.

> If you have multiple patterns to match what you want to do is something like
> this:
>
> 0. patterns <- load SMARTS/prepare patterns, set prepare false
> 1. Read Molecule (mol)
> 2. Set ring flags
> 3. Set aromaticity
> 4. for pat in patterns: pat.match(mol)
>
> Steps 2/3 can be replaced with prepare, if you have pre-calculated and store
> aromaticity (e.g. in SMILES) then you can skip step 3 as the input
> aromaticity flags will be preserved.

Because of the chemfp design, my input reader doesn't know if the
created molecules will be used for fingerprinting or for format
conversion, so I always need to do steps 2 and 3. I also don't have a
way to distinguish between the built-in CDK fingerprint types, which
always prepare, and my own fingerprint types, which expect prepared
molecules.

I think this means, at least for chemfp, that I should always prepare
the molecules as I read them, using the Daylight model, so that my own
fingerprint types can assume the inputs are always properly prepared.

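A minimal sketch of that "prepare once at read time, then match many
patterns" flow. The helper names (prepare_once_then_match, cdk_prepare,
cdk_matchers) are hypothetical and are not CDK or chemfp API. The CDK
calls inside them are the ones named in this thread plus
SmartsPattern.create() and matches(); the org.openscience.cdk.smarts
package path is an assumption for recent CDK releases. The CDK-specific
helpers import lazily, so they only need a started JVM when they are
actually called.

```python
from typing import Callable, Iterable, List, Tuple


def prepare_once_then_match(
    mol,
    prepare: Callable[[object], None],
    matchers: Iterable[Tuple[str, Callable[[object], bool]]],
) -> List[str]:
    """Run perception once on mol, then try every matcher on it.

    Returns the names of the matchers that report a hit.
    """
    prepare(mol)  # steps 2 and 3: ring flags + aromaticity, done once
    return [name for name, matches in matchers if matches(mol)]


def cdk_prepare(mol) -> None:
    """Steps 2 and 3 using the CDK calls quoted in this thread.

    Requires a JVM already started through JPype with the CDK jar
    on the classpath.
    """
    from org.openscience.cdk.graph import Cycles
    from org.openscience.cdk.aromaticity import Aromaticity

    Cycles.markRingAtomsAndBonds(mol)
    Aromaticity.apply(Aromaticity.Model.Daylight, mol)


def cdk_matchers(smarts_by_name: dict) -> list:
    """Step 0: build SMARTS patterns with automatic preparation disabled."""
    # Assumption: SmartsPattern lives in org.openscience.cdk.smarts
    from org.openscience.cdk.smarts import SmartsPattern

    matchers = []
    for name, smarts in smarts_by_name.items():
        pattern = SmartsPattern.create(smarts)
        pattern.setPrepare(False)  # the caller has already prepared mol
        matchers.append((name, pattern.matches))
    return matchers
```

With a JVM running and mol parsed as in a normal CDK session, the call
would look something like
prepare_once_then_match(mol, cdk_prepare, cdk_matchers({"amide": "C(=O)N"})).
This only helps for my own pattern-based fingerprint types; the built-in
CDK fingerprinters would still call SmartsPattern.prepare() themselves.
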
> Sorry I meant if you knew the steps to reproduce/which aromaticity model did
> you use..? The standard Daylight model used by the SMARTS matcher would find
> the external porphyrin ring aromatic hence I'm not sure how you would get
> that unless you used a different aromaticity model (e.g. tighter ring set)
> before writing to SMILES.

The problem is that I didn't use any explicit aromaticity perception.
Here's my reproducible example:

=============
import jpype  # Must install JPype to interface to the CDK jar
import jpype.imports  # configure the import hooks
import jpype.nio

jpype.startJVM(None, '-Djava.awt.headless=true')

from org.openscience import cdk
from org.openscience.cdk.smiles import (
    SmilesParser, SmilesGenerator, SmiFlavor)

# The backslash is doubled so Python does not see an invalid "\c" escape
smiles = (
    "OCCO[P+]1(OCCO)n2c3ccc2/C(c2ccccc2)=C2/C=CC(=N2)/C(c2ccccc2)"
    "=c2/cc/c(n21)=C(\\c1ccccc1)C1=NC(=C3c2ccccc2)C=C1 CHEMBL2369103")

_default_builder = cdk.DefaultChemObjectBuilder.getInstance()
smiles_parser = SmilesParser(_default_builder)

mol = smiles_parser.parseSmiles(smiles)

if 0:
    # Missing perception
    from org.openscience.cdk.graph import Cycles
    from org.openscience.cdk.aromaticity import Aromaticity
    Cycles.markRingAtomsAndBonds(mol)
    Aromaticity.apply(Aromaticity.Model.Daylight, mol)

for flavor_name, flavor in (
        ("Default", SmiFlavor.Default),
        ("Default|UseAromaticSymbols",
         SmiFlavor.Default | SmiFlavor.UseAromaticSymbols),
        ):
    smiles_generator = SmilesGenerator(flavor)
    out_smiles = str(smiles_generator.create(mol))
    print(f"-- {flavor_name}:")
    print(out_smiles)
    print()
=============

The above prints

-- Default:
OCCO[P+]1(OCCO)N2C3=CC=C2/C(/C4=CC=CC=C4)=C\5/C=CC(=N5)C(C6=CC=CC=C6)=C7C=CC(N71)=C(C8=CC=CC=C8)C9=NC(=C3C%10=CC=CC=C%10)C=C9

-- Default|UseAromaticSymbols:
OCCO[P+]1(OCCO)n2c3ccc2/C(/c4ccccc4)=C\5/C=CC(=N5)C(c6ccccc6)=c7ccc(n71)=C(c8ccccc8)C9=NC(=C3c%10ccccc%10)C=C9

With the missing perception step enabled (change the "if 0:" to
"if 1:") I get what I expected from using CDK Depict.

-- Default:
OCCO[P+]1(OCCO)N2C3=CC=C2/C(/C4=CC=CC=C4)=C\5/C=CC(=N5)C(C6=CC=CC=C6)=C7C=CC(N71)=C(C8=CC=CC=C8)C9=NC(=C3C%10=CC=CC=C%10)C=C9

-- Default|UseAromaticSymbols:
OCCO[P+]1(OCCO)n2c3ccc2c(-c4ccccc4)c5C=Cc(n5)c(-c6ccccc6)c7ccc(n71)c(-c8ccccc8)c9nc(c3-c%10ccccc%10)C=C9

> Hopefully that covers everything but let me know if you have any more
> questions/thoughts.

I think it does. Thanks!

                                Andrew
                                da...@dalkescientific.com

_______________________________________________
Cdk-user mailing list
Cdk-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-user