Dear RDKit Community,

I am looking for a way to use the MCS module in RDKit to compare the atoms and bonds of two molecules while also taking the hybridization of each atom into account.

A solution to a similar problem was suggested before, in an RDKit-discuss thread started by Liz Wylie (http://www.mail-archive.com/[email protected]/msg03676.html; see also http://sourceforge.net/p/rdkit/mailman/message/31830412/). Even though that solution is computationally correct, it does not necessarily reflect some nuances of the chemistry, and one may want to modify it in certain specific cases.

It works most of the time, for example for the cases from Liz Wylie's thread:

smis = ['CC(C)=C','CC(C)C']
smis2 = ['CC(C)=C','CC(C)=N']

If we check whether the 'CCC' substructure is present in the molecules of those two sets using Greg Landrum's solution, 'CCC' is found only in 'CC(C)C', because the atoms, the bonds and the hybridization of the atoms are all taken into account. That is all correct and cool!

But let's look at another example: searching for the N\CC\N substructure in 'C\C=C\NCCN\C=C\C', or for the 'NCN' substructure in NCN-C=C or 'C=CNCNC=C'. The substructure is not found, even though "structurally speaking" it is there.

The problem is as follows: an electronegative atom next to a C=C bond pulls electron density from that bond, so the N-C bond in NCN-C=C has a 'bit of' double-bond character, even though technically it is a single bond. The current solution to the Liz Wylie problem does not ignore this: it distinguishes between a regular N-C bond and an N-C bond next to a C=C bond (as in NCN-C=C), and because of that it does not find NCN in this structure. By contrast, NCS in NCSC=C is matched, because S is more electropositive than N or O and so the S-C bond does not have that double-bond character.

My question to the RDKit community is: how can Greg Landrum's solution to the Liz Wylie case be modified to match cases like the ones above, while still retaining the hybridization check? We do want the hybridization to match; we just want the bonding to be more important. The difficulty is that the unmatched atoms, like the N atoms above, are assigned sp2 hybridization even though technically they have only single bonds.

Thanks a lot for your help, time and consideration.
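To make the question concrete, here is a minimal sketch of the behavior described above, together with one possible direction for a fix. It assumes the earlier solution works by storing an atom type (hybridization plus atomic number) in the isotope field and running the MCS with isotope comparison; it uses `rdFMCS` rather than the older `MCS` module. The helper names `hybridization_label`, `bond_based_label` and `mcs_size` are hypothetical, and the bond-based classes (1, 2, 3, 4) are an arbitrary illustrative encoding, not an RDKit convention.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS


def hybridization_label(atom):
    # Assumed form of the earlier approach: RDKit's perceived
    # hybridization combined with the atomic number.
    return 100 * int(atom.GetHybridization()) + atom.GetAtomicNum()


def bond_based_label(atom):
    # Hypothetical alternative: classify the atom from its own bond
    # orders only, so conjugation with a neighboring C=C is ignored.
    if atom.GetIsAromatic():
        cls = 4  # aromatic
    else:
        orders = [bond.GetBondType() for bond in atom.GetBonds()]
        n_double = orders.count(Chem.BondType.DOUBLE)
        if Chem.BondType.TRIPLE in orders or n_double >= 2:
            cls = 1  # sp-like
        elif n_double == 1:
            cls = 2  # sp2-like
        else:
            cls = 3  # sp3-like: single bonds only
    return 100 * cls + atom.GetAtomicNum()


def mcs_size(smiles_a, smiles_b, label):
    # Store the label in the isotope field of every atom, then let
    # FindMCS compare atoms by isotope instead of by element.
    mols = []
    for smi in (smiles_a, smiles_b):
        mol = Chem.MolFromSmiles(smi)
        for atom in mol.GetAtoms():
            atom.SetIsotope(label(atom))
        mols.append(mol)
    result = rdFMCS.FindMCS(
        mols, atomCompare=rdFMCS.AtomCompare.CompareIsotopes
    )
    return result.numAtoms


if __name__ == "__main__":
    for label in (hybridization_label, bond_based_label):
        n_atoms = mcs_size("NCN", "C=CNCNC=C", label)
        print(label.__name__, "-> MCS atoms:", n_atoms)
```

With the bond-based labels the MCS of 'NCN' and 'C=CNCNC=C' should cover all three query atoms, while the hybridization-based labels should give a smaller MCS because the N atoms next to C=C are perceived as sp2. The bond-based labels still separate 'CCC' from 'CC(C)=C', since the central carbon there carries a double bond of its own. Whether this classification is chemically appropriate for a given data set is a separate question.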
This is my first post on the RDKit forum, and I am new to RDKit and Python in general, so I apologize if anything is not clear. I would really appreciate your help!

Best regards,
Janusz Petkowski

