Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread David Cosgrove
Hi Chris,
Sure they're equivalent, but with my suggestion you don't have to create
all 6 different SMARTS patterns, which whilst not difficult is likely to be
prone to silly errors.  You can stick a long list of OR'd vector bindings
together to put in all the exclusions you want on each atom as you think of
them.
Dave


On Sun, Sep 24, 2017 at 5:15 PM, Chris Earnshaw <cgearns...@gmail.com>
wrote:

> Hi
>
> It amounts to the same thing - either do all tests on one atom, or one
> test on all atoms.
>
> The syntax is shorter for the latter if you can use the vector bindings
> but may not be otherwise, especially if multiple exclusions are needed.
>
> Regards,
> Chris Earnshaw

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread Chris Earnshaw
Hi

It amounts to the same thing - either do all tests on one atom, or one test
on all atoms.

The syntax is shorter for the latter if you can use the vector bindings but
may not be otherwise, especially if multiple exclusions are needed.

Regards,
Chris Earnshaw



On 24 Sep 2017 16:54, "David Cosgrove" <davidacosgrov...@gmail.com> wrote:

Hi,
I think Chris' solution is a bit overly complicated, though I haven't
tested my alternative.  If each atom in the ring is tested for
'[$(a);!$(n1(C)ccc(=O)nc1=O)]', as you'd get if you expanded out the vector
bindings I provided previously, then I don't think you need to provide the
SMARTS for the excluded ring starting from each atom.  So long as 1 of the
atoms in the ring fails the test, the whole ring fails, so you just need
the same test on each atom.
Dave



Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread David Cosgrove
Hi,
I think Chris' solution is a bit overly complicated, though I haven't
tested my alternative.  If each atom in the ring is tested for
'[$(a);!$(n1(C)ccc(=O)nc1=O)]', as you'd get if you expanded out the vector
bindings I provided previously, then I don't think you need to provide the
SMARTS for the excluded ring starting from each atom.  So long as 1 of the
atoms in the ring fails the test, the whole ring fails, so you just need
the same test on each atom.
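
As an illustration of the "same test on each atom" idea, here is a minimal sketch in plain Python that only builds the query string. The names ring_query and ATOM_TEST are made up for this example, and the resulting SMARTS has not been run through any toolkit here.

```python
def ring_query(atom_test: str, size: int = 6) -> str:
    """Repeat one bracketed atom test around an aromatic ring of `size` atoms."""
    if size < 3:
        raise ValueError("a ring needs at least three atoms")
    atoms = [atom_test] * size
    # The first atom opens ring closure 1 and the last atom closes it;
    # ':' is the SMARTS aromatic bond.
    return atoms[0] + "1:" + ":".join(atoms[1:-1]) + ":" + atoms[-1] + "1"


# The per-atom test quoted in the message: aromatic, but not the N-methyl
# nitrogen of the unwanted ring.
ATOM_TEST = "[$(a);!$(n1(C)ccc(=O)nc1=O)]"

query = ring_query(ATOM_TEST)
print(query)
```

Further exclusions would be appended to ATOM_TEST as extra ';!$(...)' clauses, so the ring itself never needs to be rewritten.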
Dave


On Sun, Sep 24, 2017 at 4:45 PM, Chris Earnshaw <cgearns...@gmail.com>
wrote:

> Hi Jim
>
> The key thing to remember about the recursive SMARTS clauses is that
> they only match one atom (the first), and the rest of the string
> describes the environment in which that atom is located. So the clause
> $(n1(C)ccc(=O)nc1=O) matches just the nitrogen atom - which is
> embedded in the rest of the ring system. We then negate that with the
> ! symbol.
>
> If we use just the recursive SMARTS expression '[$(a)]' (or the simple
> SMARTS 'a'), it can match any of the six aromatic atoms in the
> heterocycle. Adding the first exclusion '[$(a);!$(n1(C)ccc(=O)nc1=O)]'
> means this atom can't match the nitrogen substituted by aliphatic
> C, but it can still match any of the other five aromatic atoms.
> Consequently there are five more exclusion clauses to add, each of
> which starts with a different one of the aromatic atoms in your
> undesired structure. As long as one of the atoms in the full SMARTS is
> prevented from matching any of the atoms in the undesired structure in
> this way, then the overall match is prevented.
>
> Adding an exclusion for pyridine is then easy. We're already excluding
> six patterns, and (considering symmetry) we only need to add four more
> to exclude all pyridines. Appending
> ';!$(n1ccccc1);!$(c1ncccc1);!$(c1cnccc1);!$(c1ccncc1)' inside the
> square brackets should do the trick.
>
> You're quite right though, this gets pretty cumbersome very quickly
> and it may well be best to handle it in code with simple include /
> exclude SMARTS patterns. You'll have to think about checking which
> atoms have been matched - for example, do you want to match quinoline
> because it contains a benzene ring, or exclude it because it contains
> a pyridine? If the former you'll have to check that the atoms matched
> by your two patterns are different.
>
> Hope this helps!
>
> Chris Earnshaw

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread Chris Earnshaw
Hi Jim

The key thing to remember about the recursive SMARTS clauses is that
they only match one atom (the first), and the rest of the string
describes the environment in which that atom is located. So the clause
$(n1(C)ccc(=O)nc1=O) matches just the nitrogen atom - which is
embedded in the rest of the ring system. We then negate that with the
! symbol.

If we use just the recursive SMARTS expression '[$(a)]' (or the simple
SMARTS 'a'), it can match any of the six aromatic atoms in the
heterocycle. Adding the first exclusion '[$(a);!$(n1(C)ccc(=O)nc1=O)]'
means this atom can't match the nitrogen substituted by aliphatic
C, but it can still match any of the other five aromatic atoms.
Consequently there are five more exclusion clauses to add, each of
which starts with a different one of the aromatic atoms in your
undesired structure. As long as one of the atoms in the full SMARTS is
prevented from matching any of the atoms in the undesired structure in
this way, then the overall match is prevented.
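
The "one exclusion clause per starting atom" construction can be sketched as a small string builder. This is only an illustration: rotated_exclusions, first_atom_query and the (atom, branch) ring description are hypothetical names, not part of RDKit, and the sketch writes the closing carbonyl as c1(=O) rather than c1=O, which should be equivalent.

```python
def rotated_exclusions(ring):
    """Return one negated recursive SMARTS clause per ring atom.

    `ring` is a list of (atom, branch) pairs in ring order, for example
    ("n", "(C)") for an aromatic nitrogen carrying an aliphatic carbon.
    """
    n = len(ring)
    clauses = []
    for start in range(n):
        # Walk the same ring, beginning at a different atom each time.
        order = ring[start:] + ring[:start]
        parts = []
        for i, (atom, branch) in enumerate(order):
            # The first and last atoms of the walk carry ring closure 1.
            closure = "1" if i in (0, n - 1) else ""
            parts.append(atom + closure + branch)
        clauses.append("!$(" + "".join(parts) + ")")
    return clauses


def first_atom_query(ring):
    """Put all exclusions on the first atom of a generic aromatic 6-ring."""
    test = "[$(a);" + ";".join(rotated_exclusions(ring)) + "]"
    return test + ":1:a:a:a:a:a:1"


# The undesired N-methyl ring system from the thread, written atom by atom.
UNWANTED_RING = [("n", "(C)"), ("c", ""), ("c", ""),
                 ("c", "(=O)"), ("n", ""), ("c", "(=O)")]

print(first_atom_query(UNWANTED_RING))
```

Walking the ring in one direction is enough here, because each recursive clause is matched as a graph environment of its first atom.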

Adding an exclusion for pyridine is then easy. We're already excluding
six patterns, and (considering symmetry) we only need to add four more
to exclude all pyridines. Appending
';!$(n1ccccc1);!$(c1ncccc1);!$(c1cnccc1);!$(c1ccncc1)' inside the
square brackets should do the trick.

You're quite right though, this gets pretty cumbersome very quickly
and it may well be best to handle it in code with simple include /
exclude SMARTS patterns. You'll have to think about checking which
atoms have been matched - for example, do you want to match quinoline
because it contains a benzene ring, or exclude it because it contains
a pyridine? If the former you'll have to check that the atoms matched
by your two patterns are different.
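
A rough sketch of that include / exclude bookkeeping, in plain Python, might look like the following. The function filter_matches and its mode argument are invented for illustration; the atom-index tuples are written by hand for a quinoline, whereas with RDKit they would typically come from mol.GetSubstructMatches(query).

```python
def filter_matches(include_matches, exclude_matches, mode="same_atoms"):
    """Drop inclusion matches that are vetoed by exclusion matches.

    mode="same_atoms": veto only a match covering exactly the same atoms
                       as some exclusion match.
    mode="overlap":    veto a match sharing any atom with an exclusion match.
    """
    excluded = [frozenset(m) for m in exclude_matches]
    kept = []
    for match in include_matches:
        atoms = frozenset(match)
        if mode == "same_atoms":
            vetoed = atoms in excluded
        elif mode == "overlap":
            vetoed = any(atoms & ex for ex in excluded)
        else:
            raise ValueError("mode must be 'same_atoms' or 'overlap'")
        if not vetoed:
            kept.append(tuple(match))
    return kept


# Hand-numbered quinoline, as in the SMILES c1ccc2ncccc2c1:
# atoms 0,1,2,3,8,9 form the benzene ring, atoms 3..8 the pyridine ring.
benzene_ring = (0, 1, 2, 3, 8, 9)
pyridine_ring = (3, 4, 5, 6, 7, 8)

include = [benzene_ring, pyridine_ring]   # generic aromatic 6-ring hits
exclude = [pyridine_ring]                 # pyridine hits

print(filter_matches(include, exclude))                  # keeps the benzene ring
print(filter_matches(include, exclude, mode="overlap"))  # keeps nothing
```

The two modes correspond to the two answers to the quinoline question: "same_atoms" keeps the benzene ring, "overlap" rejects anything fused to a pyridine.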

Hope this helps!

Chris Earnshaw

On 24 September 2017 at 15:01, James T. Metz <jamestm...@aol.com> wrote:
> Chris,
>
> Wow! Your recursive SMARTS expression works as needed!
>
> Hmmm... Help me understand this better ... it looks like you "walk around"
> the
> ring of the substructure we want to exclude and employ a slightly different
> recursive SMARTS beginning at that atom.  Is that correct?
>
> Also, since my situation is likely to get more complicated with respect to
> exclusions, suppose I still wanted to utilize the general aromatic
> expression
> for a 6-membered ring i.e. [a]1:[a]:[a]:[a]:[a]:[a]1, and I wanted to exclude
> the structures we have been discussing, and I also wanted to exclude
> pyridine i.e., [n]1:[c]:[c]:[c]:[c]:[c]1.
>
> Is there a SMARTS expression that would capture 2 exclusions?
>
> Perhaps this is getting too clumsy!  It might be better to have one or more
> inclusion SMARTS and one or more exclusion SMARTS, and write code
> to remove those groups of atoms that are coming from the exclusion SMARTS.
>
> Any ideas for PYTHON/RDkit code?  Something like
>
> test_smiles = 'c1ccccc1'
> inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
> exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
> etc...
>
> Hmmm... any other ideas, suggestions, comments?
>
> Thanks again.
>
> Regards,
> Jim Metz

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread David Cosgrove
Hi Jim,
As a slight aside, this sort of thing demonstrates the value of what
Daylight used to call vector bindings (
http://www.daylight.com/dayhtml/doc/prog/prog.smarts.html#9.3) and which
one might these days call a macro.  For example, in the Daylight toolkit
you could bind the label HAL to [F,Cl,I,Br] and then write
c1([$HAL])c([$HAL])c([$HAL])c([$HAL])cc1 for a benzene with 4 halogen
substituents.  Not only is it clearer, but there's less typing.  Using such
a system for your query could go something like AR =
$(a);!$(n1(C)ccc(=O)nc1=O), followed by [$AR]1[$AR][$AR][$AR][$AR][$AR]1.
They could be nested, too, so that in the first example you could have
CHAL=$(c[$HAL]) and [$CHAL]1[$CHAL] It's relatively simple to write a
general function that just does an iterative string substitution of all
labels into the corresponding SMARTS pattern to reproduce the spirit of the
Daylight vector bindings.  They also used them for efficiency at the search
stage as well, but taking advantage of that would require changes to the
SMARTS parsing and searching code.
At the hackathon I started putting together exactly this sort of function
as part of a tautomer enumerator but had to leave to catch my plane before
I finished.  If I manage to finish it in the next few days I'll post it
here.
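
For what it's worth, the iterative string substitution could be sketched along these lines. This is an assumption-laden illustration rather than the function mentioned above: expand_bindings, the label syntax ($ followed by a name) and the max_depth guard are all choices made for this example, and HAL is written here as the recursive SMARTS $([F,Cl,I,Br]) so that it can be dropped straight into an atom expression.

```python
import re

# A label is '$' followed by an identifier; '$(' (recursive SMARTS) is left alone.
LABEL = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def expand_bindings(smarts, bindings, max_depth=20):
    """Replace $LABEL tokens with their definitions until nothing changes."""
    for _ in range(max_depth):
        expanded = LABEL.sub(
            # Unknown labels are left untouched rather than raising.
            lambda m: bindings.get(m.group(1), m.group(0)),
            smarts,
        )
        if expanded == smarts:
            return expanded
        smarts = expanded
    raise ValueError("bindings nest too deeply or refer to themselves")


BINDINGS = {
    "HAL": "$([F,Cl,I,Br])",
    "CHAL": "$(c[$HAL])",              # nested: refers to HAL
    "AR": "$(a);!$(n1(C)ccc(=O)nc1=O)",
}

print(expand_bindings("[$AR]1[$AR][$AR][$AR][$AR][$AR]1", BINDINGS))
print(expand_bindings("[$CHAL]", BINDINGS))
```

Because the substitution is purely textual, operator precedence inside the brackets is whatever the expanded string implies, so definitions containing ',' or ';' need some care.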
Cheers,
Dave

On Sun, Sep 24, 2017 at 3:01 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Chris,
>
> Wow! Your recursive SMARTS expression works as needed!
>
> Hmmm... Help me understand this better ... it looks like you "walk around"
> the
> ring of the substructure we want to exclude and employ a slightly
> different
> recursive SMARTS beginning at that atom.  Is that correct?
>
> Also, since my situation is likely to get more complicated with respect to
> exclusions, suppose I still wanted to utilize the general aromatic
> expression
> for a 6-membered ring i.e. [a]1:[a]:[a]:[a]:[a]:[a]1, and I wanted to
> exclude
> the structures we have been discussing, and I also wanted to exclude
> pyridine i.e., [n]1:[c]:[c]:[c]:[c]:[c]1.
>
> Is there a SMARTS expression that would capture 2 exclusions?
>
> Perhaps this is getting too clumsy!  It might be better to have one or more
> inclusion SMARTS and one or more exclusion SMARTS, and write code
> to remove those groups of atoms that are coming from the exclusion SMARTS.
>
> Any ideas for PYTHON/RDkit code?  Something like
>
> test_smiles = 'c1ccccc1'
> inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
> exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
> etc...
>
> Hmmm... any other ideas, suggestions, comments?
>
> Thanks again.
>
> Regards,
> Jim Metz

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread James T. Metz via Rdkit-discuss
Chris,


Wow! Your recursive SMARTS expression works as needed!


Hmmm... Help me understand this better ... it looks like you "walk around" the
ring of the substructure we want to exclude and employ a slightly different 
recursive SMARTS beginning at that atom.  Is that correct?


Also, since my situation is likely to get more complicated with respect to
exclusions, suppose I still wanted to utilize the general aromatic expression
for a 6-membered ring i.e. [a]1:[a]:[a]:[a]:[a]:[a]1, and I wanted to exclude
the structures we have been discussing, and I also wanted to exclude
pyridine i.e., [n]1:[c]:[c]:[c]:[c]:[c]1.


Is there a SMARTS expression that would capture 2 exclusions?


Perhaps this is getting too clumsy!  It might be better to have one or more
inclusion SMARTS and one or more exclusion SMARTS, and write code
to remove those groups of atoms that are coming from the exclusion SMARTS.


Any ideas for PYTHON/RDkit code?  Something like


test_smiles = 'c1ccccc1'
inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
etc...


Hmmm... any other ideas, suggestions, comments?


Thanks again.


Regards,
Jim Metz

-----Original Message-----
From: Chris Earnshaw <cgearns...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: Rdkit-discuss@lists.sourceforge.net <rdkit-discuss@lists.sourceforge.net>
Sent: Sun, Sep 24, 2017 4:01 am
Subject: Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

Hi Jim

It can be done with recursive SMARTS, though the syntax is a bit
painful. This may do what you want -
[$(a);!$(n1(C)ccc(=O)nc1=O);!$(c1cc(=O)nc(=O)n1C);!$(c1c(=O)nc(=O)n(C)c1);!$(c(=O)1nc(=O)n(C)cc1);!$(n1c(=O)n(C)ccc1=O);!$(c(=O)1n(C)ccc(=O)n1)]:1:a:a:a:a:a:1

It's basically the general 6-ring aromatic pattern a:1:a:a:a:a:a:1,
with recursive SMARTS applied to the first atom to ensure that this
can't match any of the 6 ring atoms in your undesired system.

Regards,
Chris Earnshaw



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread Chris Earnshaw
Hi Jim

It can be done with recursive SMARTS, though the syntax is a bit
painful. This may do what you want -
[$(a);!$(n1(C)ccc(=O)nc1=O);!$(c1cc(=O)nc(=O)n1C);!$(c1c(=O)nc(=O)n(C)c1);!$(c(=O)1nc(=O)n(C)cc1);!$(n1c(=O)n(C)ccc1=O);!$(c(=O)1n(C)ccc(=O)n1)]:1:a:a:a:a:a:1

It's basically the general 6-ring aromatic pattern a:1:a:a:a:a:a:1,
with recursive SMARTS applied to the first atom to ensure that this
can't match any of the 6 ring atoms in your undesired system.

Regards,
Chris Earnshaw

On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss
<rdkit-discuss@lists.sourceforge.net> wrote:
> Hello,
>
> Suppose I have the following molecule
>
> m = 'CN1C=CC(=O)NC1=O'
>
> I would like to be able to use a SMARTS pattern
>
> pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
>
> to recognize the 6 atoms in a typical aromatic ring, but
> I do not want to recognize the 6 atoms in the molecule,
> m, as aromatic.  In other words, I am trying to write
> a specific exclusion.
>
> Is it possible to modify the SMARTS pattern to
> exclude the above molecule?  I have tried using
> recursive SMARTS, but I can't get the syntax to
> work.
>
> Any ideas?  Thank you.
>
> Regards,
> Jim Metz