You can use Chem.CanonicalRankAtoms to de-duplicate the SMARTS matches
based upon the atom symmetry like this:

from rdkit import Chem

def count_unique_substructures(smiles, smarts):
    mol = Chem.MolFromSmiles(smiles)
    # breakTies=False gives symmetry-equivalent atoms the same rank
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    pattern = Chem.MolFromSmarts(smarts)
    unique_sets_of_atoms = set()
    for match in mol.GetSubstructMatches(pattern):
        # describe each match by the ranks of its atoms, not their indices
        match_ranks = frozenset(ranks[idx] for idx in match)
        unique_sets_of_atoms.add(match_ranks)
    return len(unique_sets_of_atoms)

However, this returns 1 for each of your cases. It's not clear to me why
you would want your 2nd case to return 2 as all paths from a chlorine to a
chlorine through 2 carbons are symmetric.
>>> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
>>> smiles1 = 'ClC(Cl)CCl'
>>> smiles2 = 'ClC(Cl)C(Cl)(Cl)(Cl)'
>>> count_unique_substructures(smiles1, SMARTS)
1
>>> count_unique_substructures(smiles2, SMARTS)
1
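
To see why both cases collapse to 1, here is a minimal sketch of the same
de-duplication step without RDKit. The symmetry-class labels and match
tuples are written by hand for illustration (they are assumptions, not
actual CanonicalRankAtoms or GetSubstructMatches output), and
count_unique_by_class is a hypothetical helper name:

```python
def count_unique_by_class(matches, classes):
    """Count matches that remain distinct once atom indices are mapped
    to symmetry-class labels."""
    seen = set()
    for match in matches:
        seen.add(frozenset(classes[idx] for idx in match))
    return len(seen)

# ClC(Cl)CCl: atoms 0 and 2 are the two equivalent chlorines on carbon 1,
# atom 4 is the single chlorine on carbon 3.
# Labels are hand-assigned stand-ins for ranks with breakTies=False.
classes1 = {0: 'Cl_a', 1: 'C_a', 2: 'Cl_a', 3: 'C_b', 4: 'Cl_b'}
matches1 = [(0, 1, 3, 4), (2, 1, 3, 4)]

# ClC(Cl)C(Cl)(Cl)(Cl): two equivalent chlorines on carbon 1 and
# three equivalent chlorines on carbon 3.
classes2 = {0: 'Cl_a', 1: 'C_a', 2: 'Cl_a', 3: 'C_b',
            4: 'Cl_b', 5: 'Cl_b', 6: 'Cl_b'}
matches2 = [(a, 1, 3, b) for a in (0, 2) for b in (4, 5, 6)]

print(len(matches1), count_unique_by_class(matches1, classes1))  # 2 1
print(len(matches2), count_unique_by_class(matches2, classes2))  # 6 1
```

Every Cl-C-C-Cl path in each molecule maps to the same set of class
labels, so the 2 and 6 raw matches each reduce to a single entry. Note
that a frozenset drops repeated labels within a match; if multiplicity
should matter, tuple(sorted(...)) in place of frozenset would be one
possible variation.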
-Brian
On Tue, Nov 7, 2017 at 7:38 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:
> RDkit Discussion Group,
>
> I have written a SMARTS to detect vicinal chlorine groups
> using RDkit. There are 4 atoms involved in a vicinal chlorine group.
>
> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
>
> I am trying to count the number of ("unique") occurrences of this
> pattern.
>
> For some molecules with symmetry, this results in
> over-counting.
>
> For the molecule, smiles1 below, I want to obtain
> a count of 1 i.e., 1 tuple of 4 atoms.
>
> smiles1 = 'ClC(Cl)CCl'
>
> However, using the SMARTS above, I obtain 2 tuples of 4 atoms.
> Beginning with a MOL file representation of smiles1, I get
>
> ((1,2,4,3), (0,2,4,3))
>
> One possible solution is to somehow merge the two tuples according
> to a "rule." One rule that works is "if 3 of the atom indices are the
> same, then combine into one tuple."
>
> However, the rule needs a bit of modification for more complicated
> cases (higher symmetry).
>
> Consider
>
> smiles2 = 'ClC(Cl)CCl(Cl)(Cl)'
>
> My goal is to get 2 tuples of 4 atoms for smiles2
>
> smiles2 is somewhat tricky because there are either
> 2 groups of 3 (4 atom) tuples, or 3 groups of 2 (4 atom)
> tuples depending on how you choose your 3 atom indices.
>
> Again, if my goal is to get 2 tuples, then I need to somehow
> pick the largest group, i.e., 2 groups of 3 tuples to do the merge
> operation which will give me 2 remaining groups (desired).
>
> I have already checked stackoverflow and a few other places
> for PYTHON code to do the necessary merging, but I could not
> find anything specific and appropriate.
>
> I would be most grateful if anyone has ideas how to do this. I
> suspect the answer is a few lines of well-written PYTHON code,
> and not modifying the SMARTS (I could be mistaken!).
>
> Thank you.
>
> Regards,
> Jim Metz
>
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>