[Rdkit-discuss] question on complexity of cannonization

S Joshua Swamidass Thu, 15 Jun 2023 09:23:03 -0700

Hello,

I had a theoretical question I thought the RDkit team might have some
insight on.


It's well known that the graph-isomorphism problem is NP, which means that
for some examples run-time will be worse than polynomial using known
algorithms. This problem is connected to molecule canonization. By
comparing canonical SMILES strings, we very efficiently determine whether
or not two molecular graphs are isomorphic or not.

That means canonizing SMILES graphs is either:

1. potentially very costly in some cases.
   AND/OR
2. fails to produce a unique string for particular graph structures.

So here is my question. What are the cases that are very difficult to
canonize a graph? By extension, are any particular classes of real
molecules difficult or impossible to canonize? If not, is that because
there are some restrictions on molecule graphs that can guarantee we avoid
the difficult cases (e.g. limited branching factor, or that molecules are
nearly planar?)?

Thanks for considering my question.

*S. Joshua Swamidass M.D. Ph.D.*
http://swami.wustl.edu/

Associate Professor, Laboratory and Genomic Medicine
Associate Professor, Biomedical Engineering
Washington University in St. Louis

Administrator: Lori Scantlan <*[email protected] <[email protected]>*>

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

[Rdkit-discuss] question on complexity of cannonization

Reply via email to