You can use Chem.CanonicalRankAtoms to de-duplicate the SMARTS matches
based upon the atom symmetry like this:

from rdkit import Chem

def count_unique_substructures(smiles, smarts):
    mol = Chem.MolFromSmiles(smiles)
    # breakTies=False gives symmetry-equivalent atoms the same rank
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    pattern = Chem.MolFromSmarts(smarts)

    # Matches that cover the same set of ranks are symmetry-equivalent
    unique_sets_of_atoms = set()
    for match in mol.GetSubstructMatches(pattern):
        match_ranks = frozenset(ranks[idx] for idx in match)
        unique_sets_of_atoms.add(match_ranks)

    return len(unique_sets_of_atoms)
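
To see why taking the frozenset of ranks collapses symmetric matches, here
is a small RDKit-free sketch. The symmetry labels and the two match tuples
are hand-written assumptions for 'ClC(Cl)CCl' with atoms numbered in SMILES
order; they are not output copied from RDKit, and collapse_by_class is a
hypothetical helper name.

```python
# Hand-assigned symmetry classes for 'ClC(Cl)CCl' (illustrative assumption):
# the two chlorines on the first carbon share a class, everything else is distinct.
symmetry_class = {0: "Cl_a", 1: "C_a", 2: "Cl_a", 3: "C_b", 4: "Cl_b"}

# Assumed (Cl, C, C, Cl) matches when atoms are numbered in SMILES order.
matches = [(0, 1, 3, 4), (2, 1, 3, 4)]


def collapse_by_class(matches, symmetry_class):
    """Return the distinct sets of symmetry classes covered by the matches."""
    return {frozenset(symmetry_class[idx] for idx in match) for match in matches}


unique = collapse_by_class(matches, symmetry_class)
print(len(unique))  # 1: both matches cover the same set of classes
```

Note that a frozenset drops multiplicity, so a match that hits two atoms of
the same class is keyed by fewer labels than it has atoms.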

However, this returns 1 for each of your cases. It's not clear to me why
you would want your 2nd case to return 2 as all paths from a chlorine to a
chlorine through 2 carbons are symmetric.

>>> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
>>> smiles1 = 'ClC(Cl)CCl'
>>> smiles2 = 'ClC(Cl)C(Cl)(Cl)(Cl)'
>>> count_unique_substructures(smiles1, SMARTS)
1
>>> count_unique_substructures(smiles2, SMARTS)
1
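
If the count you are after is instead the number of Cl...Cl pairs that do
not share a chlorine, one possible reading of the "merge the largest group"
rule is to group the matches by their central C-C bond and take the smaller
chlorine count on either side. The sketch below is an assumption about the
intended rule, not an RDKit feature; count_disjoint_vicinal_pairs is a
hypothetical name, and it works on plain (Cl, C, C, Cl) index tuples such
as those returned by GetSubstructMatches.

```python
from collections import defaultdict


def count_disjoint_vicinal_pairs(matches):
    """Count Cl-C-C-Cl matches such that no chlorine is reused on a given C-C bond.

    Each match is a tuple of atom indices in the order (Cl, C, C, Cl).
    """
    # C-C bond -> {carbon index -> set of chlorine indices seen on that carbon}
    chlorines = defaultdict(lambda: defaultdict(set))
    for cl_a, c_a, c_b, cl_b in matches:
        bond = frozenset((c_a, c_b))
        chlorines[bond][c_a].add(cl_a)
        chlorines[bond][c_b].add(cl_b)

    # Across one bond, the carbon with fewer chlorines limits the pair count.
    return sum(
        min(len(cl_set) for cl_set in per_carbon.values())
        for per_carbon in chlorines.values()
    )


# Assumed matches for 'ClC(Cl)CCl' and 'ClC(Cl)C(Cl)(Cl)(Cl)', SMILES atom order.
matches1 = [(0, 1, 3, 4), (2, 1, 3, 4)]
matches2 = [(cl_a, 1, 3, cl_b) for cl_a in (0, 2) for cl_b in (4, 5, 6)]

print(count_disjoint_vicinal_pairs(matches1))  # 1
print(count_disjoint_vicinal_pairs(matches2))  # 2
```

Counts are summed per C-C bond, so a chlorine on a carbon that takes part in
two such bonds would be counted once for each bond.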

-Brian



On Tue, Nov 7, 2017 at 7:38 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> RDkit Discussion Group,
>
>     I have written a SMARTS to detect vicinal chlorine groups
> using RDkit.  There are 4 atoms involved in a vicinal chlorine group.
>
> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
>
>     I am trying to count the number of ("unique") occurrences of this
> pattern.
>
>     For some molecules with symmetry, this results in
> over-counting.
>
>     For the molecule, smiles1 below, I want to obtain
> a count of 1 i.e., 1 tuple of 4 atoms.
>
>     smiles1 = 'ClC(Cl)CCl'
>
>     However, using the SMARTS above, I obtain 2 tuples of 4 atoms.
> Beginning with a MOL file representation of smiles1, I get
>
>     ((1,2,4,3), (0,2,4,3))
>
>     One possible solution is to somehow merge the two tuples according
> to a "rule."  One rule that works is "if 3 of the atom indices are the
> same, then combine into one tuple."
>
>     However, the rule needs a bit of modification for more complicated
> cases (higher symmetry).
>
>     Consider
>
>     smiles2 = 'ClC(Cl)CCl(Cl)(Cl)'
>
>     My goal is to get 2 tuples of 4 atoms for smiles2
>
>     smiles2 is somewhat tricky because there are either
> 2 groups of 3 (4 atom) tuples, or 3 groups of 2 (4 atom)
> tuples depending on how you choose your 3 atom indices.
>
>     Again, if my goal is to get 2 tuples, then I need to somehow
> pick the largest group, i.e., 2 groups of 3 tuples to do the merge
> operation which will give me 2 remaining groups (desired).
>
>     I have already checked stackoverflow and a few other places
> for PYTHON code to do the necessary merging, but I could not
> find anything specific and appropriate.
>
>     I would be most grateful if anyone has ideas how to do this.  I
> suspect the answer is a few lines of well-written PYTHON code,
> and not modifying the SMARTS (I could be mistaken!).
>
>     Thank you.
>
>     Regards,
>     Jim Metz
>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>