Hello, I had a theoretical question I thought the RDkit team might have some insight on.
It's well known that the graph-isomorphism problem is NP, which means that for some examples run-time will be worse than polynomial using known algorithms. This problem is connected to molecule canonization. By comparing canonical SMILES strings, we very efficiently determine whether or not two molecular graphs are isomorphic or not. That means canonizing SMILES graphs is either: 1. potentially very costly in some cases. AND/OR 2. fails to produce a unique string for particular graph structures. So here is my question. What are the cases that are very difficult to canonize a graph? By extension, are any particular classes of real molecules difficult or impossible to canonize? If not, is that because there are some restrictions on molecule graphs that can guarantee we avoid the difficult cases (e.g. limited branching factor, or that molecules are nearly planar?)? Thanks for considering my question. *S. Joshua Swamidass M.D. Ph.D.* http://swami.wustl.edu/ Associate Professor, Laboratory and Genomic Medicine Associate Professor, Biomedical Engineering Washington University in St. Louis Administrator: Lori Scantlan <*llscant...@wustl.edu <llscant...@wustl.edu>*>
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss