Jim,
I'm a bit confused by what you're trying to do.
Maybe we can try simplifying. What would you like to have returned for each
of these SMILES:
1) ClC=CCl
2) ClC(Cl)=CCl
3) ClC(Cl)=C(Cl)Cl
If the answer is the same between 1) and 2), but different for 3), then the
next question will be: "why?"
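
To make the question concrete, here is a minimal sketch in plain Python (no
RDKit calls) of two counting rules you might mean. The match tuples are written
by hand, assuming atoms are numbered in SMILES order and that each tuple is
(Cl, C, C, Cl) in the order of your SMARTS. The function names are made up for
illustration only.

```python
def count_by_carbon_pair(matches):
    """Rule A: count distinct central C-C pairs among (Cl, C, C, Cl) tuples."""
    return len({frozenset(m[1:3]) for m in matches})


def disjoint_chlorine_matches(matches):
    """Rule B: greedily keep matches whose two chlorines are both still unused."""
    used = set()
    kept = []
    for m in matches:
        cl_a, cl_b = m[0], m[3]
        if cl_a not in used and cl_b not in used:
            used.update((cl_a, cl_b))
            kept.append(m)
    return kept


# Hand-written match tuples (assumption: indices follow SMILES atom order).
examples = {
    "ClC=CCl": [(0, 1, 2, 3)],
    "ClC(Cl)=CCl": [(0, 1, 3, 4), (2, 1, 3, 4)],
    "ClC(Cl)=C(Cl)Cl": [(0, 1, 3, 4), (0, 1, 3, 5), (2, 1, 3, 4), (2, 1, 3, 5)],
}

for smiles, matches in examples.items():
    rule_a = count_by_carbon_pair(matches)
    rule_b = len(disjoint_chlorine_matches(matches))
    print(smiles, "rule A:", rule_a, "rule B:", rule_b)
```

Rule A gives 1, 1, 1 for the three SMILES and rule B gives 1, 1, 2. Rule B is a
greedy pass, so it is not guaranteed to find the largest possible set of
non-overlapping matches once chlorines on more than two carbons are involved.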
-greg
On Wed, Nov 8, 2017 at 12:38 AM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:
> RDKit Discussion Group,
>
> I have written a SMARTS to detect vicinal chlorine groups
> using RDKit. There are 4 atoms involved in a vicinal chlorine group.
>
> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
>
> I am trying to count the number of ("unique") occurrences of this
> pattern.
>
> For some molecules with symmetry, this results in
> over-counting.
>
> For the molecule, smiles1 below, I want to obtain
> a count of 1 i.e., 1 tuple of 4 atoms.
>
> smiles1 = 'ClC(Cl)CCl'
>
> However, using the SMARTS above, I obtain 2 tuples of 4 atoms.
> Beginning with a MOL file representation of smiles1, I get
>
> ((1,2,4,3), (0,2,4,3))
>
> One possible solution is to somehow merge the two tuples according
> to a "rule." One rule that works is "if 3 of the atom indices are the
> same, then combine into one tuple."
>
> However, the rule needs a bit of modification for more complicated
> cases (higher symmetry).
>
> Consider
>
> smiles2 = 'ClC(Cl)C(Cl)(Cl)Cl'
>
> My goal is to get 2 tuples of 4 atoms for smiles2
>
> smiles2 is somewhat tricky because there are either
> 2 groups of 3 (4 atom) tuples, or 3 groups of 2 (4 atom)
> tuples depending on how you choose your 3 atom indices.
>
> Again, if my goal is to get 2 tuples, then I need to somehow
> pick the largest group, i.e., 2 groups of 3 tuples to do the merge
> operation which will give me 2 remaining groups (desired).
>
> I have already checked Stack Overflow and a few other places
> for Python code to do the necessary merging, but I could not
> find anything specific and appropriate.
>
> I would be most grateful if anyone has ideas how to do this. I
> suspect the answer is a few lines of well-written Python code,
> and not modifying the SMARTS (I could be mistaken!).
>
> Thank you.
>
> Regards,
> Jim Metz
>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>