Hello,
I'm trying to "recreate" a template ROMol compound by editing an RWMol using a
stochastic algorithm. To guide the algorithm I would like to use Morgan
fingerprints, with the goal being to reach a Tanimoto coefficient of 1 between
the current molecule's fingerprint and a reference molecule's fingerprint.
The algorithm is capable of recreating the template compound (i.e. when the
structures are drawn they are identical). However, the Tanimoto coefficient is
far from 1. The differences in their fingerprints seem to stem from the use of
different sanitization protocols, since the template ROMol is sanitized when
being built while the RWMol is not.
Below is a C++ code snippet to illustrate this:
// Template molecule
RDKit::ROMol* full_mol = RDKit::SmilesToMol("C1OCC(S1)C1=NC=NC=N1");
RDKit::SparseIntVect<unsigned>* fp1 =
RDKit::MorganFingerprints::getFingerprint(*full_mol, 2);
// Components to rebuild the molecule
RDKit::ROMol* frag1 = RDKit::SmilesToMol("C1=NC=NC=N1");
RDKit::ROMol* frag2 = RDKit::SmilesToMol("C1CSCO1");
// Rebuilt molecule
RDKit::ROMol* combined_romol = RDKit::combineMols(*frag1, *frag2);
RDKit::RWMol combined_rwmol (*combined_romol);
combined_rwmol.addBond(2, 7, RDKit::Bond::SINGLE);
// ERROR: Invariant violation
// RDKit::SparseIntVect<unsigned>* fp2 =
// RDKit::MorganFingerprints::getFingerprint(combined_rwmol, 2);
// Bare-bones sanitization allows FP creation but yields the wrong fingerprint
RDKit::MolOps::symmetrizeSSSR(combined_rwmol);
RDKit::SparseIntVect<unsigned>* fp2 =
RDKit::MorganFingerprints::getFingerprint(combined_rwmol, 2);
double tc = RDKit::TanimotoSimilarity(*fp1, *fp2);
std::cout << "Tc with minimal sanitization: " << tc << std::endl;
// Full sanitization yields the correct fingerprint
RDKit::MolOps::sanitizeMol(combined_rwmol);
fp2 = RDKit::MorganFingerprints::getFingerprint(combined_rwmol, 2);
tc = RDKit::TanimotoSimilarity(*fp1, *fp2);
std::cout << "Tc with full sanitization: " << tc << std::endl;
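For reference, this is how I understand the Tc printed above: a count-based
Tanimoto coefficient over sparse fingerprints, i.e. the sum of the per-feature
minimum counts divided by the sum of the per-feature maximum counts. The
sketch below is a standalone illustration using only the standard library; it
is not RDKit code. The SparseCounts type and the tanimoto function are
hypothetical names of my own, and returning 0 for two empty fingerprints is an
arbitrary convention chosen here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a sparse count fingerprint: feature id -> count.
using SparseCounts = std::map<std::uint32_t, int>;

// Count-based Tanimoto: sum(min) / (sum(a) + sum(b) - sum(min)).
double tanimoto(const SparseCounts& a, const SparseCounts& b) {
  long sumA = 0, sumB = 0, sumMin = 0;
  for (const auto& kv : a) {
    sumA += kv.second;
    const auto it = b.find(kv.first);
    // Only features present in both fingerprints contribute to the overlap.
    if (it != b.end()) sumMin += std::min(kv.second, it->second);
  }
  for (const auto& kv : b) sumB += kv.second;
  const long denom = sumA + sumB - sumMin;
  // Two empty fingerprints: return 0 (arbitrary convention for this sketch).
  return denom == 0 ? 0.0 : static_cast<double>(sumMin) / denom;
}
```

Under this definition the coefficient is 1 only if both fingerprints contain
exactly the same features with exactly the same counts, so a single atom whose
environment is hashed differently is enough to keep it below 1.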
I'm having trouble sanitizing the RWMol that is being manipulated since it has
pseudoatoms (atomic number = 0), which seems to be a problem for kekulization.
Currently I only apply the bare minimum, symmetrizeSSSR, so that the
fingerprint function can run. However, with only this step the fingerprints
are not equivalent. Ideally I would like to skip sanitization altogether,
since the molecule is constantly changing and I'm not interested in having a
"chemically sound" molecule until the very last moment.
Nonetheless, I would like the similarity coefficient to reach 1.
Why are fingerprints different depending on whether the molecule was sanitized
or not? Is there any way to circumvent the need for sanitization, perhaps by
using a different kind of fingerprint?
Best regards,
Alan