On 21/08/2019 05:06, Andrew Dalke wrote:
Hi all,
Someone asked me recently about finding the graph edit distance of
two small (<= 14 atom) fragments.
I figured this was something that could be brute forced. Following
SmallWorld's example at
https://cisrg.shef.ac.uk/shef2016/talks/oral13.pdf , given a fragment,
incrementally delete terminals (except the "*" connection point atom),
and ring bonds.
Unless RDKit has something, I think graph edit distance is the kind
of thing for which you have to rely on a good graph library.
Also, maybe the string edit distance between the two canonical smiles is
a good enough proxy.
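To make the string-distance idea concrete, here is a minimal sketch of a
Levenshtein distance over two SMILES strings. It assumes the inputs have
already been canonicalized (for example with Chem.CanonSmiles); the
function name levenshtein is only illustrative, and whether this number
tracks the graph edit distance well enough is untested here.

```python
def levenshtein(a, b):
    """Character-level edit distance (insert, delete, substitute) between two strings."""
    # prev[j] holds the distance between the prefix of a seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Example call on two canonical SMILES taken from this thread.
distance = levenshtein("N=CC=NC=N", "NC=CN=CN")
```

Note that this compares characters, not atoms, so two-character tokens
such as "Cl" or bracket atoms count as several edits.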
For chain bonds, and non-aromatic bonds, it's easy to delete the bond
and add the correct number of hydrogens to either side.
But, what should I do when I cut an aromatic bond?
For something like the first "co" in "c1cocn1", I want the result to
be C=CN=CO. That's because the "o" can only be "-O-" in Kekule form.
For something like "c1cnncn1", breaking on the "nn", I think I would
like to get both 'N=CC=NC=N' and 'NC=CN=CN' because the "nn" can be a
single or a double bond, depending on the Kekule representation, as
in:
>>> Chem.CanonSmiles("C-1=N-N=C-C=N-1")
'c1cnncn1'
>>> Chem.CanonSmiles("C-1=N.N=C-C=N-1")
'N=CC=NC=N'
>>> Chem.CanonSmiles("C=1-N=N-C=C-N=1")
'c1cnncn1'
>>> Chem.CanonSmiles("C=1-N-[HH].[HH]N-C=C-N=1")
'NC=CN=CN'
Problem is, I don't know how to figure out if a given aromatic bond
must be a "-" or "=", or can be both.
(Well, I could brute-force enumerate all 2**n possible aromatic bond
assignments, then canonicalize, and see if both assignments are
possible for a given bond.)
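The brute-force idea can be sketched without any toolkit, as below. This
is only an illustration under simplifying assumptions: the ring system is
given as a list of atom-index pairs, and the caller says which atoms must
carry exactly one double bond in a Kekule form (needs_double); every other
atom, such as the "o" in "c1cocn1", gets none. The function name
classify_aromatic_bonds and the atom numbering are hypothetical, and real
valence rules (charges, exocyclic double bonds) are ignored.

```python
from itertools import product


def classify_aromatic_bonds(bonds, needs_double):
    """Label each bond 'single', 'double', or 'either' across all valid Kekule forms.

    bonds: list of (atom_i, atom_j) pairs for the aromatic bonds.
    needs_double: set of atom indices that must have exactly one double bond.
    """
    atoms = {atom for bond in bonds for atom in bond}
    seen = {bond: set() for bond in bonds}
    # Try all 2**n single/double assignments.
    for orders in product((1, 2), repeat=len(bonds)):
        doubles = dict.fromkeys(atoms, 0)
        for (a, b), order in zip(bonds, orders):
            if order == 2:
                doubles[a] += 1
                doubles[b] += 1
        # Keep the assignment only if every atom has the required count.
        if all(doubles[a] == (1 if a in needs_double else 0) for a in atoms):
            for bond, order in zip(bonds, orders):
                seen[bond].add(order)
    labels = {frozenset(): "none", frozenset({1}): "single",
              frozenset({2}): "double", frozenset({1, 2}): "either"}
    return {bond: labels[frozenset(found)] for bond, found in seen.items()}


# "c1cocn1": atoms 0=c, 1=c, 2=o, 3=c, 4=n; the oxygen takes no double bond.
oxazole_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
oxazole = classify_aromatic_bonds(oxazole_bonds, {0, 1, 3, 4})

# "c1cnncn1": atoms 0=c, 1=c, 2=n, 3=n, 4=c, 5=n; every atom needs one double bond.
triazine_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
triazine = classify_aromatic_bonds(triazine_bonds, {0, 1, 2, 3, 4, 5})
```

With these inputs the "co" bond (1, 2) comes out as "single" only, while
every bond in the six-membered ring, including the "nn" bond (2, 3), comes
out as "either". This skips the canonicalization step mentioned above and
just checks each assignment directly.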
As a non-chemist, I also ask if I'm even on a chemically meaningful
track.
Andrew
da...@dalkescientific.com
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss