Hi RDKitters, I'm trying to use RDKit to generate molecular fingerprints (such as AP or ECFP) on molecules that have non-interacting pseudoatoms ('dummy atoms', denoted by Du). I attached a sample PDB file containing the dummy atoms at positions 21-24. Reading this file with Chem.rdmolfiles.MolFromPDBFile("test.pdb", sanitize=False) throws a post-condition violation because the element 'Du' isn't recognised, which makes sense. I've been searching online and haven't been able to find any workarounds. Do you have any suggestions?
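One idea I have not verified is to preprocess the PDB text so the parser never sees 'Du'. The sketch below is plain Python and makes two assumptions: the element symbol sits in the standard PDB columns 77-78, and a real element that does not otherwise occur in the ligand ("Xe" here, an arbitrary choice) is acceptable as a stand-in. The function name replace_du_element is hypothetical, not part of RDKit.

```python
def replace_du_element(pdb_text, placeholder="Xe"):
    """Swap 'Du' in the PDB element columns for a placeholder element.

    Returns the rewritten text and the serial numbers of the atoms changed,
    so the dummy atoms can still be identified after parsing.
    Assumes the element symbol is in columns 77-78 (0-based slice 76:78).
    """
    out_lines = []
    serials = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 78:
            element = line[76:78].strip()
            if element.upper() == "DU":
                # Atom serial number lives in columns 7-11.
                serials.append(int(line[6:11]))
                # Element symbols are right-justified in a 2-character field.
                line = line[:76] + placeholder.rjust(2) + line[78:]
        out_lines.append(line)
    return "\n".join(out_lines) + "\n", serials
```

The rewritten text could then be passed to Chem.MolFromPDBBlock(new_text, sanitize=False). If RDKit's built-in dummy atom type is preferred over the placeholder element, the affected atoms could be reset afterwards with atom.SetAtomicNum(0). I don't know whether either labelling gives sensible fingerprints for our use case.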
Some notes:

* I'm hoping that once RDKit is able to read in the PDB file, the mol object can be passed to the FP constructor (e.g. AllChem.GetMorganFingerprint) without it complaining.
* The term 'dummy atoms' here should not be confused with the dummy atoms RDKit uses when fragmenting molecules (where * is the SMILES notation).
* For this project all I aim to do is generate structural fingerprints for these types of ligands. This means I won't have to worry about defining chemical properties for Du.
* The context for this issue is that we're aiming to featurise the ligands for an ML protocol where the dummy atoms are one of the major descriptors of the problem.
* I thought manually inserting a 119th element in atomic_data.cpp might resolve the issue, but I've been unable to locate the file in my conda installation.
* The ODDT Python API seems to parse the Du element without any issues but is limited in its FP generator diversity.

Best,
Jenke
test.pdb
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss