On 11/28/2016 10:25 AM, Stephen O'hagan wrote:
> Has anyone come up with fool-proof way of matching structurally equivalent 
> molecules?

This is somewhat convoluted and there is no proof that it's fool-proof.

A few years ago we had good results from running the graphpowerhash()
function on the PDB ligand database. The function is documented here:
http://madgik.github.io/madis/aggregate.html#module-functions.aggregate.graph

The parameters were

- atom1, atom2 IDs (names) as node1, node2.

- Atom stereo (R, S, N), aromatic (y/n), and "leaving atom" (y/n) for
the atoms as node1_details, node2_details (packed into single string
with jpack() function: see http://madgik.github.io/madis/row.html).

Looking at it now, I don't think the nodeN_details parameter needs to
include the atom's "aromatic" flag.

- Massaged bond type and bond stereo (E, Z, N) as edge_details. Also
packed into a string as above.

The PDB chem comp model has bond type as SING or DOUB, with a separate
yes/no "aromatic" column. We changed the type to AROM for the bonds where
that column was a yes.

The basic model is a list of bonds with atom1, atom2, and type, and a
list of atoms with stereo, aromatic, and "leaving" flags -- the last one
is "Y" for atoms that "go away" when forming a bond.
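The model above can be sketched in Python roughly as follows. This is only
an illustration, not the code we ran: the class and function names are
made up for the example, and json.dumps() stands in for madis's jpack(),
whose exact output format I am not reproducing here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str      # atom ID, unique within the ligand
    stereo: str    # "R", "S", or "N"
    aromatic: str  # "Y" or "N"
    leaving: str   # "Y" for atoms that go away when forming a bond


@dataclass(frozen=True)
class Bond:
    atom1: str
    atom2: str
    bond_type: str  # "SING", "DOUB", or "AROM" after massaging
    stereo: str     # "E", "Z", or "N"


def massaged_bond_type(bond_type, aromatic_flag):
    """Fold the separate yes/no aromatic column into the bond type."""
    return "AROM" if aromatic_flag.upper() == "Y" else bond_type


def pack(*fields):
    """Hypothetical stand-in for jpack(): several values in one string."""
    return json.dumps(list(fields))


def edge_rows(atoms, bonds):
    """Build one (node1, node2, node1_details, node2_details, edge_details)
    tuple per bond, matching the parameters listed above."""
    by_name = {atom.name: atom for atom in atoms}
    rows = []
    for bond in bonds:
        a1, a2 = by_name[bond.atom1], by_name[bond.atom2]
        rows.append((
            bond.atom1,
            bond.atom2,
            pack(a1.stereo, a1.aromatic, a1.leaving),
            pack(a2.stereo, a2.aromatic, a2.leaving),
            pack(bond.bond_type, bond.stereo),
        ))
    return rows
```

Dropping the aromatic flag from the node details, as suggested above, would
just mean leaving a1.aromatic and a2.aromatic out of the two pack() calls.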

The algorithm itself, as far as I know (I am not the author), takes the
two "matrices" representing the molecule "graphs", computes the largest
eigenvalue and its eigenvector for each, and compares those. We have no
proof that it's 100% correct, but all the duplicates it found in the PDB
Ligand Expo at the time were genuine.
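To make the idea concrete, here is a minimal pure-Python sketch of that
kind of comparison. It is NOT the graphpowerhash() source and I have not
checked it against it: the bond weights and function names are assumptions
made up for the example, and atom and stereo details are ignored entirely.
It builds a bond-weighted adjacency matrix, estimates the dominant
eigenvalue and eigenvector by power iteration, and compares the results in
a way that does not depend on atom names or ordering.

```python
import math

# Hypothetical numeric weights for the massaged bond types.
BOND_WEIGHTS = {"SING": 1.0, "AROM": 1.5, "DOUB": 2.0}


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration on (A + I). The shift avoids the oscillation that
    plain power iteration shows on bipartite graphs."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [vec[i] + sum(matrix[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue of the unshifted matrix A.
    av = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    value = sum(v * a for v, a in zip(vec, av))
    return value, vec


def spectral_signature(bonds, weights=BOND_WEIGHTS):
    """bonds: iterable of (atom1, atom2, bond_type) for a connected graph.
    Returns (largest eigenvalue, sorted eigenvector components)."""
    bonds = list(bonds)
    names = sorted({name for a1, a2, _ in bonds for name in (a1, a2)})
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for a1, a2, bond_type in bonds:
        w = weights[bond_type]
        matrix[index[a1]][index[a2]] = w
        matrix[index[a2]][index[a1]] = w
    value, vec = dominant_eigenpair(matrix)
    # Sorting the components removes the dependence on atom ordering.
    return value, sorted(vec)


def same_signature(sig1, sig2, tol=1e-6):
    """Compare two signatures with a floating-point tolerance."""
    (v1, vec1), (v2, vec2) = sig1, sig2
    if len(vec1) != len(vec2) or not math.isclose(v1, v2, abs_tol=tol):
        return False
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(vec1, vec2))


# Same heavy-atom skeleton under two different atom namings.
mol_a = [("C1", "C2", "SING"), ("C2", "O1", "SING")]
mol_b = [("CB", "CA", "SING"), ("CA", "O", "SING")]
match = same_signature(spectral_signature(mol_a), spectral_signature(mol_b))
```

Keep in mind that matching signatures are a necessary condition, not a
sufficient one: non-isomorphic graphs with the same spectrum exist, which
fits with there being no proof that the approach is fool-proof.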

Enjoy,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss