Re: [Rdkit-discuss] multiple SMARTS that match only if in the same fragment

2020-03-07 Thread Curt Fischer
One version I came up with, assuming "query" is a SMARTS-derived query
molecule that should match at most once within any single fragment of
each molecule in a set:

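from rdkit.Chem import rdmolops

def hasMultiSubstructPerFrag(mol, query):
    """
    Determines whether mol has more than one match to query in a single
    covalently connected fragment.
    """
    # Cheap pre-filter: no match anywhere means no fragment can match twice.
    if not mol.HasSubstructMatch(query):
        return False
    # Split into covalently connected fragments and count matches in each.
    return any(
        len(frag.GetSubstructMatches(query)) > 1
        for frag in rdmolops.GetMolFrags(mol, asMols=True)
    )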
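A minimal usage sketch, restating the function with full imports so it runs standalone. The test molecules (ethylenediamine, and a record holding two separate methylamine molecules) are illustrative choices, not from the thread; the amine SMARTS is one component of the Daylight example in the original question.

```python
from rdkit import Chem
from rdkit.Chem import rdmolops


def hasMultiSubstructPerFrag(mol, query):
    """True if any single covalently connected fragment of mol has more
    than one match to query."""
    if not mol.HasSubstructMatch(query):
        return False
    return any(
        len(frag.GetSubstructMatches(query)) > 1
        for frag in rdmolops.GetMolFrags(mol, asMols=True)
    )


# Primary or secondary non-amide amine (from the Daylight example).
amine = Chem.MolFromSmarts('[NX3;H2,H1;!$(NC=O)]')

# Ethylenediamine: two amines in one fragment, so it is flagged.
print(hasMultiSubstructPerFrag(Chem.MolFromSmiles('NCCN'), amine))   # True
# Two methylamines in one record: one amine per fragment, so it passes.
print(hasMultiSubstructPerFrag(Chem.MolFromSmiles('CN.CN'), amine))  # False
```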


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] multiple SMARTS that match only if in the same fragment

2020-03-07 Thread Curt Fischer
Thanks Ivan -- very helpful.

Is there any consensus on idioms for identifying multiple moieties in the
same fragment?  Do I have to use len(mol.GetSubstructMatches(patt)) > 1 as
some kind of selector and then do some kind of graph traversal routine to
see if any of the matches are covalently connected?
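A sketch of one way to do that check without writing an explicit graph traversal, assuming the query is a single connected pattern (so every atom of a match sits in the same fragment as its first atom). `Chem.GetMolFrags` with its default `asMols=False` returns one tuple of atom indices per fragment, which is enough to map each match to its fragment. The helper name `has_repeat_in_one_fragment` is hypothetical.

```python
from collections import Counter

from rdkit import Chem


def has_repeat_in_one_fragment(mol, query):
    """True if two or more matches of query fall in the same fragment.

    Assumes query is a single connected pattern.
    """
    matches = mol.GetSubstructMatches(query)
    if len(matches) < 2:  # the cheap selector from the question
        return False
    # Default GetMolFrags output: one tuple of atom indices per fragment.
    frag_of_atom = {}
    for frag_id, atom_indices in enumerate(Chem.GetMolFrags(mol)):
        for idx in atom_indices:
            frag_of_atom[idx] = frag_id
    # Count matches per fragment, keyed by the fragment of each match's first atom.
    counts = Counter(frag_of_atom[match[0]] for match in matches)
    return max(counts.values()) > 1


amine = Chem.MolFromSmarts('[NX3;H2,H1;!$(NC=O)]')
print(has_repeat_in_one_fragment(Chem.MolFromSmiles('NCCN'), amine))   # True
print(has_repeat_in_one_fragment(Chem.MolFromSmiles('CN.CN'), amine))  # False
```

Compared with splitting into fragment molecules, this runs the substructure search once on the whole molecule and only uses the fragment atom lists for bookkeeping.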



Re: [Rdkit-discuss] multiple SMARTS that match only if in the same fragment

2020-03-07 Thread Ivan Tubert-Brohman
Hi Curt,

According to
https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions ,
component-level grouping is not supported:

> Here’s the (hopefully complete) list of SMARTS features that are *not*
> supported:
>
> - Non-tetrahedral chiral classes
> - the @? operator
> - explicit atomic masses (though isotope queries are supported)
> - component level grouping requiring matches in different components,
>   i.e. (C).(C)

OK, the way it's worded, it sounds like (C.C) might be supported (since
that would require matches in the same component), but as you've seen,
it isn't supported either...
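A possible workaround, sketched under the assumption that the intended meaning of `(A.B)` is "all query components match inside one fragment": drop the outer parentheses and run the plain dot-separated query against each fragment separately. The helper `match_within_one_fragment` is hypothetical, and this only emulates same-component grouping, not the `(A).(B)` different-component form.

```python
from rdkit import Chem


def match_within_one_fragment(mol, smarts):
    """Emulate Daylight same-component grouping '(A.B)'.

    smarts must be given WITHOUT the outer grouping parentheses; the
    dot-separated query is then matched against each covalently connected
    fragment on its own, so all components must land in one fragment.
    """
    query = Chem.MolFromSmarts(smarts)
    if query is None:
        raise ValueError(f'could not parse SMARTS: {smarts}')
    return any(
        frag.HasSubstructMatch(query)
        for frag in Chem.GetMolFrags(mol, asMols=True)
    )


# '([Cl!$(Cl~c)].[c!$(c~Cl)])' from the Daylight page, minus the outer ()
smarts = '[Cl!$(Cl~c)].[c!$(c~Cl)]'
print(match_within_one_fragment(Chem.MolFromSmiles('ClCc1ccccc1'), smarts))   # True
print(match_within_one_fragment(Chem.MolFromSmiles('Cl.Nc1ccccc1'), smarts))  # False
```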

Ivan




[Rdkit-discuss] multiple SMARTS that match only if in the same fragment

2020-03-07 Thread Curt Fischer
Hi rdkit fiends!

The [Daylight SMARTS example page](
https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html)
gives several examples for "multiple group" SMARTS, including these strings:

([Cl!$(Cl~c)].[c!$(c~Cl)])
([Cl]).([c])
([Cl].[c])
[NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]

In general, I cannot get these to be parsed by Chem.MolFromSmarts().

For example,  Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me
this error message:

```
[13:01:41] SMARTS Parse Error: syntax error while parsing:
([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
[13:01:41] SMARTS Parse Error: Failed parsing SMARTS
'([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
```
My understanding of SMARTS is that the outermost parentheses in this SMARTS
string are required to force the chlorine and the aromatic carbon to be
somewhere in the same covalently connected fragment.  E.g. this pattern
*should* hit benzyl chloride ClCc1ccccc1 but should *not* hit the
hydrochloride salt of aniline Cl.Nc1ccccc1.
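To illustrate why the grouping matters, here is a sketch of what happens if the outer parentheses are simply dropped. The result is an ordinary dot-disconnected SMARTS that the parser accepts, but its two components may then be matched in different fragments, so the aniline salt is hit as well.

```python
from rdkit import Chem

# Same Daylight pattern with the outer grouping parentheses removed.
no_grouping = Chem.MolFromSmarts('[Cl!$(Cl~c)].[c!$(c~Cl)]')
assert no_grouping is not None  # plain dot-separated SMARTS parses

benzyl_chloride = Chem.MolFromSmiles('ClCc1ccccc1')
aniline_hcl = Chem.MolFromSmiles('Cl.Nc1ccccc1')

# Without component-level grouping, the chloride ion and the ring carbon
# may come from different fragments, so both molecules match.
print(benzyl_chloride.HasSubstructMatch(no_grouping))  # True
print(aniline_hcl.HasSubstructMatch(no_grouping))      # True
```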

What am I getting wrong?  Is there a way to write rdkit-parsable SMARTS
that achieves this?  (I want to filter out molecules that contain more than
one of certain moieties, while allowing molecules that have one (or zero)
such moieties.  But salts or covalently disconnected fragments that each
contain one instance of the moiety should be fine.)
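A sketch of that filter, assuming "more than one of certain moieties" means two or more matches of the same pattern inside one covalently connected fragment. The moiety list (a non-amide amine and a carboxylic acid), the example records, and the helper name `passes_filter` are hypothetical, chosen for illustration.

```python
from rdkit import Chem


def passes_filter(smiles, patterns):
    """Keep a record unless one fragment holds 2+ copies of any moiety.

    Fragments are judged independently, so salts and mixtures with one
    copy per fragment pass. Unparseable SMILES are rejected.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    for frag in Chem.GetMolFrags(mol, asMols=True):
        for patt in patterns:
            if len(frag.GetSubstructMatches(patt)) > 1:
                return False
    return True


# Hypothetical moiety list: non-amide primary/secondary amine, carboxylic acid.
patterns = [
    Chem.MolFromSmarts('[NX3;H2,H1;!$(NC=O)]'),
    Chem.MolFromSmarts('C(=O)[OX2H1]'),
]
records = ['NCCN', 'CN.CN', 'OC(=O)CCC(=O)O', 'NCC(=O)O']
print([s for s in records if passes_filter(s, patterns)])
# ['CN.CN', 'NCC(=O)O']
```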

Details on my setup:

- RDKit Version: 2019.09.3
- Operating system: macOS 10.15.2
- Python version (if relevant): 3.6
- Are you using conda? yes
- If you are using conda, which channel did you install the rdkit from? `conda-forge`

Curt