Dear RDKit Community,

I am looking for a way to use the MCS module in RDKit to compare the atoms and 
bonds of two molecules while also taking the hybridization of each atom into 
account.
A solution to a similar problem was suggested before, in the RDKit-discuss 
thread started by Liz Wylie: 
http://www.mail-archive.com/[email protected]/msg03676.html 
(see also http://sourceforge.net/p/rdkit/mailman/message/31830412/ ).

Even though that solution is computationally correct, it does not necessarily 
capture some nuances of chemistry, and one may want to modify it in certain 
specific cases. It works most of the time for cases like those proposed in the 
solution to Liz Wylie's problem:

smis = ['CC(C)=C','CC(C)C']
 or

smis2 = ['CC(C)=C','CC(C)=N']
If we check whether the 'CCC' substructure is present in the molecules from 
those two data sets using Greg Landrum's solution, CCC is found only in 
'CC(C)C', because the match takes into account the atoms, the bonding and the 
hybridization of the atoms. That is all correct and cool!
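
A minimal sketch of such a hybridization-aware check, for illustration only. It 
is not necessarily the code from the earlier thread: it assumes the 
hybridization is encoded as an isotope label on a copy of each molecule, so 
that an ordinary substructure search has to match it. The helper names 
label_by_hybridization and has_hybridization_aware_match are hypothetical.

```python
from rdkit import Chem


def label_by_hybridization(mol):
    """Return a copy of mol whose isotope labels encode atom hybridization."""
    labelled = Chem.Mol(mol)  # work on a copy, leave the input untouched
    for atom in labelled.GetAtoms():
        # Assumption: the integer value of the HybridizationType enum is used
        # as an isotope label, so atoms must agree on it in order to match.
        atom.SetIsotope(int(atom.GetHybridization()))
    return labelled


def has_hybridization_aware_match(target_smiles, query_smiles):
    """Substructure test in which element, bond and hybridization must agree."""
    target = label_by_hybridization(Chem.MolFromSmiles(target_smiles))
    query = label_by_hybridization(Chem.MolFromSmiles(query_smiles))
    return target.HasSubstructMatch(query)


for smi in ['CC(C)=C', 'CC(C)C', 'CC(C)=N']:
    print(smi, has_hybridization_aware_match(smi, 'CCC'))
```

With this labelling, a query atom only matches a target atom of the same 
element and the same perceived hybridization, which is why the central sp2 
carbon in 'CC(C)=C' blocks the 'CCC' match.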

But let's look at another example:
Let's look for the N\CC\N substructure in 'C\C=C\NCCN\C=C\C', or for the 'NCN' 
substructure in 'NCN-C=C' or 'C=CNCNC=C'. It will not be found there, even 
though "structurally speaking" it is there.
The problem is as follows: an electronegative atom next to a C=C bond pulls 
electron density from that bond, so the N-C bond in NCN-C=C has a ‘bit of’ 
double-bond character, even though technically it is a single bond. The 
current solution to the Liz Wylie problem does not ignore that: it 
distinguishes between a regular N-C bond and an N-C bond next to a C=C bond 
(as in NCN-C=C), and because of that it does not find NCN in this structure. 
NCS in NCSC=C is matched, because S is less electronegative than N or O and so 
its bond does not have that double-bond character.

My question to the RDKit community is: how can Greg Landrum's solution to the 
Liz Wylie case be modified to match cases like the ones above while still 
retaining the hybridization check? (We do want the hybridization to match; we 
just want the bonding to be more important.) The difficulty is that the 
unmatched atoms, like the N atoms above, have sp2 hybridization but technically 
are bonded by single bonds on all sides.
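
The mismatch described above can be inspected directly. The following is a 
small diagnostic sketch, not a fix: it lists the hybridization RDKit perceives 
for the N atoms in the query and in the target, and confirms that a plain 
substructure search (no hybridization check) does find NCN. The helper name 
nitrogen_hybridizations is hypothetical.

```python
from rdkit import Chem


def nitrogen_hybridizations(smiles):
    """Return the perceived hybridization name of every N atom in a SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    return [str(atom.GetHybridization())
            for atom in mol.GetAtoms()
            if atom.GetSymbol() == 'N']


query_smiles = 'NCN'
target_smiles = 'C=CNCNC=C'

print(query_smiles, nitrogen_hybridizations(query_smiles))
print(target_smiles, nitrogen_hybridizations(target_smiles))

# Without any hybridization constraint, the single-bonded N-C-N unit is found.
target = Chem.MolFromSmiles(target_smiles)
query = Chem.MolFromSmiles(query_smiles)
print('plain substructure match:', target.HasSubstructMatch(query))
```

Comparing the two printed lists shows where a hybridization-aware comparison 
diverges from the plain one, even though every N-C bond involved is formally 
single.
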
Thanks a lot for your help, time and consideration. This is my first post on 
the RDKit forum, and I am new to RDKit and Python in general, so I apologize if 
anything is not clear.
I would really appreciate your help!

Best regards,

Janusz Petkowski