On 21/08/2019 05:06, Andrew Dalke wrote:
Hi all,

  Someone asked me recently about finding the graph edit distance of
two small (<= 14 atom) fragments.

I figured this was something that could be brute forced. Following
SmallWorld's example at
https://cisrg.shef.ac.uk/shef2016/talks/oral13.pdf , given a fragment,
incrementally delete terminals (except the "*" connection point atom),
and ring bonds.
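
A minimal sketch of how the candidate deletions could be listed, assuming the fragment is given as a plain list of atom symbols plus (a, b) index pairs for its bonds. The helper names (build_adjacency, is_ring_bond, deletable_moves) are made up for illustration and are not RDKit or SmallWorld functions. A terminal is taken to be any atom with exactly one neighbour other than "*", and a ring bond is any bond whose two ends stay connected once that bond is ignored.

```python
from collections import defaultdict


def build_adjacency(bonds):
    """Map each atom index to the set of its neighbours."""
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def is_ring_bond(adj, a, b):
    """True if b is still reachable from a without using the a-b bond."""
    stack, seen = [a], {a}
    while stack:
        node = stack.pop()
        for nbr in adj[node]:
            if (node, nbr) in ((a, b), (b, a)):
                continue  # skip the bond being tested
            if nbr == b:
                return True
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return False


def deletable_moves(symbols, bonds):
    """Return (terminal atoms, ring bonds) that may be deleted next."""
    adj = build_adjacency(bonds)
    terminals = sorted(
        i for i in adj if len(adj[i]) == 1 and symbols[i] != "*"
    )
    ring_bonds = [(a, b) for a, b in bonds if is_ring_bond(adj, a, b)]
    return terminals, ring_bonds


# Hypothetical fragment: "*" attached to a chain carbon, which is attached
# to a three-membered ring that carries a terminal oxygen.
symbols = ["*", "C", "C", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 2), (4, 5)]
terminals, ring_bonds = deletable_moves(symbols, bonds)
```

Here atom 0 is skipped because it is the "*" connection point, atom 5 is the only deletable terminal, and the three ring bonds are the only deletable bonds.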

Unless RDKit has something built in, I think graph edit distance is the
kind of thing for which you have to rely on a good graph library.

Also, maybe the string edit distance between the two canonical smiles is a good enough proxy.
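
A rough sketch of that proxy, using a plain-Python Levenshtein distance. The function name levenshtein is just illustrative, and the inputs are assumed to already be canonical SMILES (for example from Chem.CanonSmiles). It compares single characters, so two-letter symbols such as "Cl" count as two characters, and a small structural change can reorder a canonical SMILES, so this is only a crude stand-in for a real graph edit distance.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string s into string t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# The two open-chain products discussed in this thread.
distance = levenshtein("N=CC=NC=N", "NC=CN=CN")
```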

For chain bonds, and non-aromatic bonds, it's easy to delete the bond
and add the correct number of hydrogens to either side.

But, what should I do when I cut an aromatic bond?

For something like the first "co" in "c1cocn1", I want the result to
be C=CN=CO. That's because the "o" can only be "-O-" in Kekule form.

For something like "c1cnncn1", breaking on the "nn", I think I would
like to get both 'N=CC=NC=N' and 'NC=CN=CN' because the "nn" can be a
single or a double bond, depending on the Kekule representation, as
in:

>>> from rdkit import Chem
>>> Chem.CanonSmiles("C-1=N-N=C-C=N-1")
'c1cnncn1'
>>> Chem.CanonSmiles("C-1=N.N=C-C=N-1")
'N=CC=NC=N'

>>> Chem.CanonSmiles("C=1-N=N-C=C-N=1")
'c1cnncn1'
>>> Chem.CanonSmiles("C=1-N-[HH].[HH]N-C=C-N=1")
'NC=CN=CN'

Problem is, I don't know how to figure out if a given aromatic bond
must be a "-" or "=", or can be both.

(Well, I could brute-force enumerate all 2**n possible aromatic bond
assignments, then canonicalize, and see if both assignments are
possible for a given bond.)
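
The brute-force idea above could be sketched roughly as follows, in plain Python rather than through RDKit. It assumes a simplified model that is my own, not anything from the toolkit: each aromatic atom is flagged by hand as either needing exactly one double bond (aromatic "c", pyridine-like "n") or none (such as "o", or "n" with an attached hydrogen), and charges and exocyclic double bonds are ignored. The name classify_aromatic_bonds is hypothetical. Instead of canonicalizing, it keeps only the assignments that satisfy every atom and records which orders each bond takes.

```python
from itertools import product


def classify_aromatic_bonds(needs_double, bonds):
    """Try all 2**n single/double assignments for the aromatic bonds.

    needs_double[i] is True if atom i must carry exactly one double bond.
    Returns {bond: "-", "=", "both", or "none" (no valid assignment)}.
    """
    seen = {bond: set() for bond in bonds}
    for orders in product((1, 2), repeat=len(bonds)):
        doubles = [0] * len(needs_double)
        for (a, b), order in zip(bonds, orders):
            if order == 2:
                doubles[a] += 1
                doubles[b] += 1
        # Keep the assignment only if every atom has the count it needs.
        if all(d == (1 if need else 0)
               for d, need in zip(doubles, needs_double)):
            for bond, order in zip(bonds, orders):
                seen[bond].add(order)
    labels = {frozenset({1}): "-", frozenset({2}): "=",
              frozenset({1, 2}): "both"}
    return {bond: labels.get(frozenset(s), "none")
            for bond, s in seen.items()}


# c1cocn1, atoms numbered in SMILES order: c0 c1 o2 c3 n4
oxazole = classify_aromatic_bonds(
    [True, True, False, True, True],
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)],
)
# oxazole[(1, 2)] is "-": the first "co" bond can only be single

# c1cnncn1, atoms numbered in SMILES order: c0 c1 n2 n3 c4 n5
triazine = classify_aromatic_bonds(
    [True] * 6,
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)],
)
# triazine[(2, 3)] is "both": the "nn" bond can be single or double
```

For 14-atom fragments the 2**n loop stays small, which is why no pruning is attempted here.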

As a non-chemist, I also ask if I'm even on a chemically meaningful track.


                                Andrew
                                da...@dalkescientific.com




_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

