Hi Jenke,

I have put together a small gist showing a slightly hacky way to round-trip a molecule containing dummy atoms through a PDB block (assuming that your molecules do not contain astatine). If your dummy atoms are labelled "DU" rather than " *", you can simply adapt the replace() expression to fit your needs.
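For reference, here is a minimal sketch of the same astatine trick applied directly to a PDB file whose dummy atoms use the element "Du". It is not the gist itself. It assumes that the dummy atoms carry an explicit element field in columns 77-78 of their ATOM/HETATM records, that the file contains no real astatine, and that RDKit's PDB reader accepts "AT" in that field. The helper names relabel_dummy_elements() and mol_from_pdb_with_dummies() are hypothetical.

```python
"""Sketch: read a PDB block whose dummy atoms use the element symbol "Du"
by temporarily relabelling them as astatine (At), then turning every
astatine back into an RDKit dummy atom (atomic number 0)."""

ASTATINE = 85  # atomic number used as a stand-in for dummy atoms


def relabel_dummy_elements(pdb_block: str, dummy_symbol: str = "DU") -> str:
    """Rewrite the element field (columns 77-78) of ATOM/HETATM records
    from dummy_symbol to "AT"; every other line is returned unchanged.

    Assumption: the file has an explicit element column. Lines shorter
    than 78 characters are left alone."""
    wanted = dummy_symbol.strip().upper()
    out_lines = []
    for line in pdb_block.splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 78:
            if line[76:78].strip().upper() == wanted:
                line = line[:76] + "AT" + line[78:]
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


def mol_from_pdb_with_dummies(pdb_block: str, dummy_symbol: str = "DU"):
    """Parse the relabelled block with RDKit and convert the astatine
    placeholders back into dummy atoms (assumes no real astatine)."""
    # Imported here so that relabel_dummy_elements() works without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromPDBBlock(
        relabel_dummy_elements(pdb_block, dummy_symbol),
        sanitize=False,
        removeHs=False,
    )
    if mol is None:
        raise ValueError("RDKit could not parse the PDB block")
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() == ASTATINE:
            atom.SetAtomicNum(0)  # RDKit dummy atom, written "*" in SMILES
    # Fingerprint code needs implicit valences and ring information, which
    # an unsanitized molecule lacks; compute them without strict checks.
    mol.UpdatePropertyCache(strict=False)
    Chem.GetSymmSSSR(mol)
    return mol
```

With RDKit installed, something like AllChem.GetMorganFingerprint(mol_from_pdb_with_dummies(open("test.pdb").read()), 2) should then give a fingerprint in which the dummy atoms (atomic number 0) are just another atom type. Relabelling at the text level also avoids having to patch atomic_data.cpp and rebuild RDKit from source.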


HTH, cheers

p.


On 10/30/19 12:06, SCHEEN Jenke wrote:
Hi RDKitters,

I'm trying to use rdkit to generate molecular fingerprints (such as AP or ECFP) for molecules that have non-interactive pseudoatoms ('dummy atoms', denoted by Du). I attached a sample PDB file containing the dummy atoms at positions 21-24. Reading this file with Chem.rdmolfiles.MolFromPDBFile("test.pdb", sanitize=False) throws a post-condition violation because the element 'Du' isn't recognised, which makes sense. I've been searching online and haven't been able to find any workarounds. Do you have any suggestions?

Some notes:

  * I'm hoping that once rdkit is able to read in the pdb file, the mol
    object can be passed to the FP constructor (e.g.
    AllChem.GetMorganFingerprint) without it complaining.
  * The use of the term dummy atoms here should not be confused with
    the dummy atoms rdkit uses when fragmenting molecules (written as
    * in SMILES).
  * For this project all I aim to do is generate structural
    fingerprints for these types of ligands. This means I won't have
    to worry about assigning chemical properties to Du.
  * The context for this issue is that we're aiming to featurise the
    ligands for an ML protocol where the dummy atoms are one of the
    major descriptors of the problem.

  * I thought manually inserting a 119th element into atomic_data.cpp
    might resolve the issue, but I've been unable to locate the file in
    my conda installation.
  * The ODDT Python API seems to parse the Du element without any
    issues, but it offers a more limited choice of FP generators.


Best,

Jenke

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss