Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-07 Thread David Cosgrove
Hi Jim,
Would it not be easier to use a recursive SMARTS, so that you only count
the carbon atoms? Something like [$([C,c]Cl)]-,=,:[$([C,c]Cl)], or, more
compactly [$([#6]Cl)]~[$([#6]Cl)].  I haven't tested these, as I'm not
close to a suitably equipped computer, but you should be able to get the
gist at least.  The Cl is only defining the sort of C you're after so you
won't have to deal with multiple Cl matches on the same atom.
Dave
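
A minimal sketch of how the recursive SMARTS above might be tried out, assuming RDKit is installed. The function name count_vicinal_cl_pairs is hypothetical and not part of RDKit. Each $(...) environment matches only the carbon atom, so every match is a 2-tuple of carbon indices, and GetSubstructMatches (which uniquifies by atom set by default) reports each bonded carbon pair once. Note that '~' accepts any bond order, so it is slightly broader than '-,=,:'.

```python
from rdkit import Chem

# Recursive SMARTS: two bonded carbons, each carrying at least one Cl.
# The Cl atoms only qualify the carbons; they are not part of the match.
PATTERN = Chem.MolFromSmarts('[$([#6]Cl)]~[$([#6]Cl)]')


def count_vicinal_cl_pairs(smiles):
    """Count bonded carbon pairs in which both carbons bear a chlorine."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError('could not parse SMILES: %r' % smiles)
    # Default uniquify=True keeps one match per set of atoms,
    # so (C1, C2) and (C2, C1) are counted once.
    return len(mol.GetSubstructMatches(PATTERN))


for smi in ('ClC(Cl)CCl', 'ClC(Cl)C(Cl)(Cl)(Cl)', 'ClC=CCl'):
    print(smi, count_vicinal_cl_pairs(smi))
```

Because the count is per carbon pair, extra chlorines on either carbon do not add matches; each of the three molecules above has a single qualifying C-C pair.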


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-07 Thread Greg Landrum
Jim,

I'm a bit confused by what you're trying to do.

Maybe we can try simplifying. What would you like to have returned for each
of these SMILES:
1) ClC=CCl
2) ClC(Cl)=CCl
3) ClC(Cl)=C(Cl)Cl

If the answer is the same between 1) and 2), but different for 3), then the
next question will be: "why?"

-greg




Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-07 Thread Brian Cole
You can use Chem.CanonicalRankAtoms to de-duplicate the SMARTS matches
based upon the atom symmetry like this:

from rdkit import Chem

def count_unique_substructures(smiles, smarts):
    mol = Chem.MolFromSmiles(smiles)
    # breakTies=False gives symmetry-equivalent atoms the same rank
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    pattern = Chem.MolFromSmarts(smarts)

    unique_sets_of_atoms = set()
    for match in mol.GetSubstructMatches(pattern):
        # Matches that differ only by equivalent atoms collapse to one entry
        match_ranks = frozenset(ranks[idx] for idx in match)
        unique_sets_of_atoms.add(match_ranks)

    return len(unique_sets_of_atoms)

However, this returns 1 for each of your cases. It's not clear to me why
you would want your 2nd case to return 2 as all paths from a chlorine to a
chlorine through 2 carbons are symmetric.

>>> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
>>> smiles1 = 'ClC(Cl)CCl'
>>> smiles2 = 'ClC(Cl)C(Cl)(Cl)(Cl)'
>>> count_unique_substructures(smiles1, SMARTS)
1
>>> count_unique_substructures(smiles2, SMARTS)
1

-Brian





Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-07 Thread Peter S. Shenkin
I think you probably used a slightly different SMILES than the one you
showed. The one you showed should have given ((0,1,3,4),(2,1,3,4)).

The proper merge rule would then be to consider all matches equivalent if
the 2nd and 3rd atoms in the match agree, in any order; i.e., the two
carbons, indices 1 and 3 in this case.

So to do this, for each molecule, do something like this:

d = {}
for match in matches:
    # Key on the two carbons (positions 1 and 2), smaller index first
    if match[1] < match[2]:
        t = (match[1], match[2])
    else:
        t = (match[2], match[1])
    d[t] = match

You will wind up with one dictionary element per distinct pair of carbons,
i.e., as many elements as there are unique matches.
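
As a quick check of the merge rule above, here is a minimal sketch applied to hard-coded match tuples, so it runs without RDKit. The helper name merge_by_carbon_pair and the sample tuples are illustrative assumptions: the tuples are the ones expected for 'ClC(Cl)CCl' and the six Cl-C-C-Cl paths of 'ClC(Cl)C(Cl)(Cl)(Cl)', though RDKit may return them in a different order. It also assumes the two carbons sit at positions 1 and 2 of every match, as they do for the SMARTS in this thread.

```python
def merge_by_carbon_pair(matches):
    """Keep one match per unordered pair of central carbon atoms.

    Unlike a plain d[key] = match, setdefault keeps the FIRST match
    seen for each carbon pair rather than the last.
    """
    merged = {}
    for match in matches:
        # Sort so that (1, 3) and (3, 1) map to the same key.
        key = tuple(sorted((match[1], match[2])))
        merged.setdefault(key, match)
    return merged


# Assumed matches for 'ClC(Cl)CCl'
matches1 = ((0, 1, 3, 4), (2, 1, 3, 4))

# Assumed matches for 'ClC(Cl)C(Cl)(Cl)(Cl)': 2 Cl on C1 x 3 Cl on C3
matches2 = tuple((a, 1, 3, b) for a in (0, 2) for b in (4, 5, 6))

print(merge_by_carbon_pair(matches1))       # {(1, 3): (0, 1, 3, 4)}
print(len(merge_by_carbon_pair(matches2)))  # 1
```

Under this rule both molecules reduce to a single entry, since all of their matches share the same carbon pair.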

-P.




[Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-07 Thread James T. Metz via Rdkit-discuss
RDKit Discussion Group,

I have written a SMARTS to detect vicinal chlorine groups using RDKit.
There are 4 atoms involved in a vicinal chlorine group.

SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'

I am trying to count the number of ("unique") occurrences of this
pattern. For some molecules with symmetry, this results in
over-counting.

For the molecule smiles1 below, I want to obtain a count of 1, i.e.,
1 tuple of 4 atoms.

smiles1 = 'ClC(Cl)CCl'

However, using the SMARTS above, I obtain 2 tuples of 4 atoms.
Beginning with a MOL file representation of smiles1, I get

((1,2,4,3), (0,2,4,3))

One possible solution is to somehow merge the two tuples according
to a "rule." One rule that works is "if 3 of the atom indices are the
same, then combine into one tuple."

However, the rule needs a bit of modification for more complicated
cases (higher symmetry).

Consider

smiles2 = 'ClC(Cl)C(Cl)(Cl)(Cl)'

My goal is to get 2 tuples of 4 atoms for smiles2.

smiles2 is somewhat tricky because there are either 2 groups of 3
(4-atom) tuples, or 3 groups of 2 (4-atom) tuples, depending on how
you choose your 3 atom indices.

Again, if my goal is to get 2 tuples, then I need to somehow pick the
largest group, i.e., 2 groups of 3 tuples, to do the merge operation,
which will give me 2 remaining groups (desired).

I have already checked Stack Overflow and a few other places for
Python code to do the necessary merging, but I could not find anything
specific and appropriate.

I would be most grateful if anyone has ideas how to do this. I
suspect the answer is a few lines of well-written Python code,
and not modifying the SMARTS (I could be mistaken!).

Thank you.

Regards,
Jim Metz



