Hi,
I think Chris' solution is a bit overly complicated, though I haven't
tested my alternative.  If each atom in the ring is tested for
'[$(a);!$(n1(C)ccc(=O)nc1=O)]', as you'd get if you expanded out the vector
bindings I provided previously, then I don't think you need to provide the
SMARTS for the excluded ring starting from each atom. So long as one of the
atoms in the ring fails the test, the whole ring match fails, so you just need
the same test on each atom.
Dave
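A minimal, untested sketch of Dave's every-atom approach in plain Python. It only assembles the SMARTS string, which could then be handed to RDKit's `Chem.MolFromSmarts`. The helper name `every_atom_ring_smarts` is hypothetical. It assumes each unwanted ring has one atom that a single recursive clause pins down: the ring nitrogen carrying an aliphatic carbon in the uracil case, and the nitrogen in the pyridine case. By the same reasoning as Dave's, one clause per unwanted ring should then be enough.

```python
def every_atom_ring_smarts(exclusions, ring_size=6):
    """Aromatic ring SMARTS in which every ring atom carries the same
    recursive exclusions (one anchor clause per unwanted ring)."""
    if exclusions:
        # e.g. [$(a);!$(n1(C)ccc(=O)nc1=O)]: aromatic, but not the anchor atom
        atom = "[$(a);" + ";".join(f"!$({e})" for e in exclusions) + "]"
    else:
        atom = "a"  # no exclusions: plain aromatic atom
    # The first atom opens ring bond 1 (":1"), the others follow with
    # aromatic bonds, and the final ":1" closes the ring.
    return atom + ":1" + "".join(":" + atom for _ in range(ring_size - 1)) + ":1"


uracil_anchor = "n1(C)ccc(=O)nc1=O"  # matches the ring N carrying aliphatic C
pyridine_anchor = "n1ccccc1"         # matches only the pyridine nitrogen
print(every_atom_ring_smarts([uracil_anchor, pyridine_anchor]))
```

Because every ring atom is tested, a ring is rejected as soon as any one of its atoms is an anchor atom, which is why the clauses do not need to be rewritten from each starting atom.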


On Sun, Sep 24, 2017 at 4:45 PM, Chris Earnshaw <cgearns...@gmail.com>
wrote:

> Hi Jim
>
> The key thing to remember about the recursive SMARTS clauses is that
> they only match one atom (the first), and the rest of the string
> describes the environment in which that atom is located. So the clause
> $(n1(C)ccc(=O)nc1=O) matches just the nitrogen atom, which has to be
> embedded in the rest of the ring system. We then negate that with the
> ! symbol.
>
> If we use just the recursive SMARTS expression '[$(a)]' (or the simple
> SMARTS 'a'), it can match any of the six aromatic atoms in the
> heterocycle. Adding the first exclusion '[$(a);!$(n1(C)ccc(=O)nc1=O)]'
> means this atom can't match the nitrogen substituted by aliphatic
> C, but it can still match any of the other five aromatic atoms.
> Consequently there are five more exclusion clauses to add, each of
> which starts with a different one of the aromatic atoms in your
> undesired structure. As long as one of the atoms in the full SMARTS is
> prevented from matching any of the atoms in the undesired structure in
> this way, then the overall match is prevented.
>
> Adding an exclusion for pyridine is then easy. We're already excluding
> six patterns, and (considering symmetry) we only need to add four more
> to exclude all pyridines. Appending
> ';!$(n1ccccc1);!$(c1ncccc1);!$(c1cnccc1);!$(c1ccncc1)' inside the
> square brackets should do the trick.
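For comparison, a sketch of how Chris's first-atom expression can be assembled from a list of clauses, one per symmetry-distinct starting atom of each unwanted ring. The helper name `first_atom_ring_smarts` is hypothetical and the code does plain string handling only; with the six uracil clauses it should reproduce his expression exactly, and the pyridine list is the four clauses he suggests appending.

```python
def first_atom_ring_smarts(start_clauses, ring_size=6):
    """Only the first ring atom is constrained, so each unwanted ring needs
    one clause per symmetry-distinct starting atom."""
    negations = "".join(f";!$({c})" for c in start_clauses)
    # [$(a);!$(...);...] opens ring bond 1; the other ring atoms are plain "a".
    return f"[$(a){negations}]:1" + ":a" * (ring_size - 1) + ":1"


# The uracil ring written starting from each of its six atoms in turn.
uracil_starts = [
    "n1(C)ccc(=O)nc1=O", "c1cc(=O)nc(=O)n1C", "c1c(=O)nc(=O)n(C)c1",
    "c(=O)1nc(=O)n(C)cc1", "n1c(=O)n(C)ccc1=O", "c(=O)1n(C)ccc(=O)n1",
]
# Pyridine needs only four starts because of its mirror symmetry.
pyridine_starts = ["n1ccccc1", "c1ncccc1", "c1cnccc1", "c1ccncc1"]
print(first_atom_ring_smarts(uracil_starts + pyridine_starts))
```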
>
> You're quite right though, this gets pretty cumbersome very quickly
> and it may well be best to handle it in code with simple include /
> exclude SMARTS patterns. You'll have to think about checking which
> atoms have been matched - for example, do you want to match quinoline
> because it contains a benzene ring, or exclude it because it contains
> a pyridine? If the former you'll have to check that the atoms matched
> by your two patterns are different.
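The include/exclude route could look roughly like the sketch below. The helper names `filter_ring_matches` and `ring_matches` are hypothetical; `Chem.MolFromSmiles`, `Chem.MolFromSmarts` and `Mol.GetSubstructMatches` are standard RDKit calls, imported lazily inside the wrapper so the pure-Python filter runs without RDKit. The `require_distinct_sets` flag switches between the two quinoline policies Chris describes: drop a ring only if its atom set is identical to an excluded ring, or drop it if it shares any atom with one.

```python
def filter_ring_matches(include_matches, exclude_matches, require_distinct_sets=True):
    """Remove inclusion matches that clash with an exclusion match.

    Each match is a tuple of atom indices, the shape returned by RDKit's
    Mol.GetSubstructMatches.
    require_distinct_sets=True:  drop a ring only if its atom set equals an
                                 excluded ring (quinoline keeps its benzene ring).
    require_distinct_sets=False: drop a ring that shares any atom with an
                                 excluded ring (quinoline loses both rings).
    """
    excluded = [frozenset(m) for m in exclude_matches]
    kept = []
    for match in include_matches:
        atoms = frozenset(match)
        if require_distinct_sets:
            clash = atoms in excluded
        else:
            clash = any(atoms & ex for ex in excluded)
        if not clash:
            kept.append(match)
    return kept


def ring_matches(smiles, include_smarts, exclude_smarts_list, require_distinct_sets=True):
    """Match include_smarts, then filter against every exclusion pattern."""
    from rdkit import Chem  # requires RDKit; only needed for this wrapper

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    include = mol.GetSubstructMatches(Chem.MolFromSmarts(include_smarts))
    exclude = []
    for smarts in exclude_smarts_list:
        exclude.extend(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))
    return filter_ring_matches(include, exclude, require_distinct_sets)
```

With the default setting, `ring_matches('c1ccc2ncccc2c1', '[a]1:[a]:[a]:[a]:[a]:[a]1', ['[n]1:[c]:[c]:[c]:[c]:[c]1'])` should keep only quinoline's benzene ring; with `require_distinct_sets=False` it should return an empty list, because the two rings share the fusion atoms.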
>
> Hope this helps!
>
> Chris Earnshaw
>
> On 24 September 2017 at 15:01, James T. Metz <jamestm...@aol.com> wrote:
> > Chris,
> >
> > Wow! Your recursive SMARTS expression works as needed!
> >
> > Hmmm... Help me understand this better. It looks like you "walk around"
> > the ring of the substructure we want to exclude and employ a slightly
> > different recursive SMARTS beginning at each atom. Is that correct?
> >
> > Also, since my situation is likely to get more complicated with respect
> > to exclusions, suppose I still wanted to use the general aromatic
> > expression for a 6-membered ring, i.e. [a]1:[a]:[a]:[a]:[a]:[a]1, and I
> > wanted to exclude the structures we have been discussing, and I also
> > wanted to exclude pyridine, i.e. [n]1:[c]:[c]:[c]:[c]:[c]1.
> >
> > Is there a SMARTS expression that would capture both exclusions?
> >
> > Perhaps this is getting too clumsy! It might be better to have one or
> > more inclusion SMARTS and one or more exclusion SMARTS, and write code
> > to remove the groups of atoms that are matched by the exclusion SMARTS.
> >
> > Any ideas for Python/RDKit code? Something like
> >
> > test_smiles = 'c1ccccc1'
> > inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
> > exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
> > etc...
> >
> > Hmmm... any other ideas, suggestions, comments?
> >
> > Thanks again.
> >
> > Regards,
> > Jim Metz
> >
> >
> >
> >
> > -----Original Message-----
> > From: Chris Earnshaw <cgearns...@gmail.com>
> > To: James T. Metz <jamestm...@aol.com>
> > Cc: Rdkit-discuss@lists.sourceforge.net
> > <rdkit-discuss@lists.sourceforge.net>
> > Sent: Sun, Sep 24, 2017 4:01 am
> > Subject: Re: [Rdkit-discuss] need SMARTS query with a specific exclusion
> >
> > Hi Jim
> >
> > It can be done with recursive SMARTS, though the syntax is a bit
> > painful. This may do what you want:
> > [$(a);!$(n1(C)ccc(=O)nc1=O);!$(c1cc(=O)nc(=O)n1C);!$(c1c(=O)nc(=O)n(C)c1);!$(c(=O)1nc(=O)n(C)cc1);!$(n1c(=O)n(C)ccc1=O);!$(c(=O)1n(C)ccc(=O)n1)]:1:a:a:a:a:a:1
> >
> > It's basically the general 6-ring aromatic pattern a:1:a:a:a:a:a:1,
> > with recursive SMARTS applied to the first atom to ensure that it
> > can't match any of the 6 ring atoms in your undesired system.
> >
> > Regards,
> > Chris Earnshaw
> >
> > On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss
> > <rdkit-discuss@lists.sourceforge.net> wrote:
> >> Hello,
> >>
> >> Suppose I have the following molecule
> >>
> >> m = 'CN1C=CC(=O)NC1=O'
> >>
> >> I would like to be able to use a SMARTS pattern
> >>
> >> pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
> >>
> >> to recognize the 6 atoms in a typical aromatic ring, but
> >> I do not want to recognize the 6 atoms in the molecule,
> >> m, as aromatic. In other words, I am trying to write
> >> a specific exclusion.
> >>
> >> Is it possible to modify the SMARTS pattern to
> >> exclude the above molecule? I have tried using
> >> recursive SMARTS, but I can't get the syntax to
> >> work.
> >>
> >> Any ideas? Thank you.
> >>
> >> Regards,
> >> Jim Metz
> >>
> >>
> >>
> >>
> >> ------------------------------------------------------------------------------
> >> Check out the vibrant tech community on one of the world's most
> >> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >> _______________________________________________
> >> Rdkit-discuss mailing list
> >> Rdkit-discuss@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> >>
>



-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk