Hi

It amounts to the same thing - either do all tests on one atom, or one test
on all atoms.

The syntax is shorter for the latter if you can use the vector bindings, but it
may not be otherwise, especially if multiple exclusions are needed.
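A minimal sketch of the two equivalent forms, assuming RDKit's Python API (`Chem.MolFromSmiles`, `Chem.MolFromSmarts`, `HasSubstructMatch`) and a small hand-picked test set. The "all tests on one atom" form puts six recursive clauses on the first ring atom. The "one test on all atoms" form repeats a single atom test six times, and the repetition is done with ordinary Python string building rather than vector bindings. The clauses are written with ring-closure digits before branches (`c1(=O)` rather than `c(=O)1`), which is the conventional SMILES order.

```python
from rdkit import Chem

# Six recursive clauses, each starting at a different atom of the unwanted ring.
clauses = [
    'n1(C)ccc(=O)nc1=O',
    'c1cc(=O)nc(=O)n1C',
    'c1c(=O)nc(=O)n(C)c1',
    'c1(=O)nc(=O)n(C)cc1',
    'n1c(=O)n(C)ccc1=O',
    'c1(=O)n(C)ccc(=O)n1',
]

# All tests on one atom: only the first ring atom carries the exclusions.
first_atom = '[$(a)' + ''.join(f';!$({c})' for c in clauses) + ']'
one_atom_query = Chem.MolFromSmarts(first_atom + ':1:a:a:a:a:a:1')

# One test on all atoms: no ring atom may be the N-methyl nitrogen of the unwanted ring.
atom = '[$(a);!$(n1(C)ccc(=O)nc1=O)]'
all_atoms_query = Chem.MolFromSmarts(atom + '1:' + ':'.join([atom] * 5) + '1')

results = {}
for smi in ['c1ccccc1', 'c1ccncc1', 'CN1C=CC(=O)NC1=O']:
    mol = Chem.MolFromSmiles(smi)
    results[smi] = (mol.HasSubstructMatch(one_atom_query),
                    mol.HasSubstructMatch(all_atoms_query))
print(results)
```

Both queries should accept benzene and pyridine and reject the N-methyl uracil.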

Regards,
Chris Earnshaw



On 24 Sep 2017 16:54, "David Cosgrove" <davidacosgrov...@gmail.com> wrote:

Hi,
I think Chris' solution is a bit overly complicated, though I haven't
tested my alternative.  If each atom in the ring is tested for
'[$(a);!$(n1(C)ccc(=O)nc1=O)]', as you'd get if you expanded out the vector
bindings I provided previously, then I don't think you need to provide the
SMARTS for the excluded ring starting from each atom.  So long as one of the
atoms in the ring fails the test, the whole ring fails, so you just need
the same test on each atom.
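With the same test on every atom, each extra exclusion needs only one recursive clause, because a single failing atom is enough to reject the whole ring. A minimal sketch, assuming RDKit; `per_atom_ring` is a hypothetical helper that stands in for the expanded vector bindings, and the pyridine clause is added purely for illustration.

```python
from rdkit import Chem

def per_atom_ring(atom_test):
    """Hypothetical helper: a 6-ring SMARTS with the same atom test on every position."""
    return atom_test + '1:' + ':'.join([atom_test] * 5) + '1'

# One clause per unwanted ring: the N-methyl uracil nitrogen and the pyridine nitrogen.
atom_test = '[$(a);!$(n1(C)ccc(=O)nc1=O);!$(n1ccccc1)]'
query = Chem.MolFromSmarts(per_atom_ring(atom_test))

hits = {smi: Chem.MolFromSmiles(smi).HasSubstructMatch(query)
        for smi in ['c1ccccc1', 'c1ccncc1', 'CN1C=CC(=O)NC1=O']}
print(hits)
```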
Dave


On Sun, Sep 24, 2017 at 4:45 PM, Chris Earnshaw <cgearns...@gmail.com>
wrote:

> Hi Jim
>
> The key thing to remember about the recursive SMARTS clauses is that
> they only match one atom (the first), and the rest of the string
> describes the environment in which that atom is located. So the clause
> $(n1(C)ccc(=O)nc1=O) matches just the nitrogen atom, which is
> embedded in the rest of the ring system. We then negate that with the
> ! symbol.
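The one-atom behaviour of a recursive clause can be checked directly. A minimal sketch, assuming RDKit; in the SMILES below, atom index 1 is the N-methyl ring nitrogen.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('CN1C=CC(=O)NC1=O')  # atom 1 is the N-methyl ring nitrogen

# The recursive clause describes the whole ring, but the query itself is a single atom.
query = Chem.MolFromSmarts('[$(n1(C)ccc(=O)nc1=O)]')
matches = m.GetSubstructMatches(query)
print(matches)  # ((1,),)
print([m.GetAtomWithIdx(idx).GetSymbol() for (idx,) in matches])  # ['N']
```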
>
> If we use just the recursive SMARTS expression '[$(a)]' (or the simple
> SMARTS 'a'), it can match any of the six aromatic atoms in the
> heterocycle. Adding the first exclusion '[$(a);!$(n1(C)ccc(=O)nc1=O)]'
> means this atom can't match the nitrogen substituted by aliphatic
> C, but it can still match any of the other five aromatic atoms.
> Consequently there are five more exclusion clauses to add, each of
> which starts with a different one of the aromatic atoms in your
> undesired structure. As long as one of the atoms in the full SMARTS is
> prevented from matching any of the atoms in the undesired structure in
> this way, then the overall match is prevented.
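A minimal sketch of why one exclusion clause is not enough, assuming RDKit: the atom-level test still accepts five of the six ring atoms, so the ring pattern simply starts its walk from one of those.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('CN1C=CC(=O)NC1=O')

# Atom-level test carrying only the first exclusion clause.
atom_query = Chem.MolFromSmarts('[$(a);!$(n1(C)ccc(=O)nc1=O)]')
allowed = sorted(idx for (idx,) in m.GetSubstructMatches(atom_query))
print(allowed)  # [2, 3, 4, 6, 7]: every aromatic ring atom except the N-methyl nitrogen (1)

# The same test on the first atom of a 6-ring still matches the unwanted ring.
ring_query = Chem.MolFromSmarts('[$(a);!$(n1(C)ccc(=O)nc1=O)]:1:a:a:a:a:a:1')
print(m.HasSubstructMatch(ring_query))  # True
```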
>
> Adding an exclusion for pyridine is then easy. We're already excluding
> six patterns, and (considering symmetry) we only need to add four more
> to exclude all pyridines. Appending
> ';!$(n1ccccc1);!$(c1ncccc1);!$(c1cnccc1);!$(c1ccncc1)' inside the
> square brackets should do the trick.
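A minimal sketch of the combined first-atom test, assuming RDKit and building the SMARTS from Python lists so the clauses stay readable. Ring-closure digits are written before branches (`c1(=O)` rather than `c(=O)1`), which is the conventional order.

```python
from rdkit import Chem

# Clauses for the N-methyl uracil ring, one per starting atom.
uracil_clauses = [
    'n1(C)ccc(=O)nc1=O',
    'c1cc(=O)nc(=O)n1C',
    'c1c(=O)nc(=O)n(C)c1',
    'c1(=O)nc(=O)n(C)cc1',
    'n1c(=O)n(C)ccc1=O',
    'c1(=O)n(C)ccc(=O)n1',
]
# By symmetry pyridine needs only four clauses: N, ortho C, meta C and para C.
pyridine_clauses = ['n1ccccc1', 'c1ncccc1', 'c1cnccc1', 'c1ccncc1']

first_atom = '[$(a)' + ''.join(f';!$({c})' for c in uracil_clauses + pyridine_clauses) + ']'
query = Chem.MolFromSmarts(first_atom + ':1:a:a:a:a:a:1')

results = {smi: Chem.MolFromSmiles(smi).HasSubstructMatch(query)
           for smi in ['c1ccccc1', 'c1ccncc1', 'CN1C=CC(=O)NC1=O']}
print(results)
```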
>
> You're quite right though, this gets pretty cumbersome very quickly
> and it may well be best to handle it in code with simple include /
> exclude SMARTS patterns. You'll have to think about checking which
> atoms have been matched - for example, do you want to match quinoline
> because it contains a benzene ring, or exclude it because it contains
> a pyridine? If the former you'll have to check that the atoms matched
> by your two patterns are different.
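The include/exclude approach can be sketched in RDKit as follows. The policy here is an assumption: an included ring is dropped only when all of its atoms fall inside a single exclusion match, so quinoline keeps its benzene ring even though it also contains a pyridine. `kept_ring_matches` is a hypothetical helper name.

```python
from rdkit import Chem

def kept_ring_matches(smiles, include_smarts, exclude_smarts_list):
    """Hypothetical helper: include matches not wholly covered by any one exclude match."""
    mol = Chem.MolFromSmiles(smiles)
    exclude_sets = []
    for smarts in exclude_smarts_list:
        for match in mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)):
            exclude_sets.append(set(match))
    kept = []
    for match in mol.GetSubstructMatches(Chem.MolFromSmarts(include_smarts)):
        # Subset test, because an exclusion match may also cover substituent atoms.
        if not any(set(match) <= excl for excl in exclude_sets):
            kept.append(match)
    return kept

inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
exclusion_patterns = ['n1(C)ccc(=O)nc1=O', '[n]1:[c]:[c]:[c]:[c]:[c]1']

counts = {smi: len(kept_ring_matches(smi, inclusion_pattern, exclusion_patterns))
          for smi in ['c1ccccc1', 'c1ccncc1', 'CN1C=CC(=O)NC1=O', 'c1ccc2ncccc2c1']}
print(counts)
```

Switching to "exclude the whole molecule if any exclusion matches" would only need a check on whether `exclude_sets` is empty.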
>
> Hope this helps!
>
> Chris Earnshaw
>
> On 24 September 2017 at 15:01, James T. Metz <jamestm...@aol.com> wrote:
> > Chris,
> >
> > Wow! Your recursive SMARTS expression works as needed!
> >
> > Hmmm... Help me understand this better ... it looks like you "walk around"
> > the ring of the substructure we want to exclude and employ a slightly
> > different recursive SMARTS beginning at each atom.  Is that correct?
> >
> > Also, since my situation is likely to get more complicated with respect to
> > exclusions, suppose I still wanted to utilize the general aromatic
> > expression for a 6-membered ring, i.e. [a]1:[a]:[a]:[a]:[a]:[a]1, and I
> > wanted to exclude the structures we have been discussing, and I also wanted
> > to exclude pyridine, i.e. [n]1:[c]:[c]:[c]:[c]:[c]1.
> >
> > Is there a SMARTS expression that would capture 2 exclusions?
> >
> > Perhaps this is getting too clumsy!  It might be better to have one or more
> > inclusion SMARTS and one or more exclusion SMARTS, and write code
> > to remove those groups of atoms that are coming from the exclusion SMARTS.
> >
> > Any ideas for Python/RDKit code?  Something like
> >
> > test_smiles = 'c1ccccc1'
> > inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
> > exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
> > etc...
> >
> > Hmmm... any other ideas, suggestions, comments?
> >
> > Thanks again.
> >
> > Regards,
> > Jim Metz
> >
> >
> >
> >
> > -----Original Message-----
> > From: Chris Earnshaw <cgearns...@gmail.com>
> > To: James T. Metz <jamestm...@aol.com>
> > Cc: Rdkit-discuss@lists.sourceforge.net
> > <rdkit-discuss@lists.sourceforge.net>
> > Sent: Sun, Sep 24, 2017 4:01 am
> > Subject: Re: [Rdkit-discuss] need SMARTS query with a specific exclusion
> >
> > Hi Jim
> >
> > It can be done with recursive SMARTS, though the syntax is a bit
> > painful. This may do what you want:
> > [$(a);!$(n1(C)ccc(=O)nc1=O);!$(c1cc(=O)nc(=O)n1C);!$(c1c(=O)nc(=O)n(C)c1);!$(c(=O)1nc(=O)n(C)cc1);!$(n1c(=O)n(C)ccc1=O);!$(c(=O)1n(C)ccc(=O)n1)]:1:a:a:a:a:a:1
> >
> > It's basically the general 6-ring aromatic pattern a:1:a:a:a:a:a:1,
> > with recursive SMARTS applied to the first atom to ensure that this
> > can't match any of the 6 ring atoms in your undesired system.
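A minimal sketch that checks this claim at the atom level, assuming RDKit: the first-atom test on its own should match none of the six ring atoms of the N-methyl uracil, but all six atoms of benzene. The clauses are written with ring-closure digits before branches (`c1(=O)` rather than `c(=O)1`); the thread reports the original ordering working as well.

```python
from rdkit import Chem

clauses = [
    'n1(C)ccc(=O)nc1=O',
    'c1cc(=O)nc(=O)n1C',
    'c1c(=O)nc(=O)n(C)c1',
    'c1(=O)nc(=O)n(C)cc1',
    'n1c(=O)n(C)ccc1=O',
    'c1(=O)n(C)ccc(=O)n1',
]
first_atom_smarts = '[$(a)' + ''.join(f';!$({c})' for c in clauses) + ']'
first_atom_query = Chem.MolFromSmarts(first_atom_smarts)
ring_query = Chem.MolFromSmarts(first_atom_smarts + ':1:a:a:a:a:a:1')

m = Chem.MolFromSmiles('CN1C=CC(=O)NC1=O')
benzene = Chem.MolFromSmiles('c1ccccc1')

# Atoms that could serve as the first atom of the ring pattern.
print(len(m.GetSubstructMatches(first_atom_query)))        # 0
print(len(benzene.GetSubstructMatches(first_atom_query)))  # 6
print(m.HasSubstructMatch(ring_query), benzene.HasSubstructMatch(ring_query))  # False True
```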
> >
> > Regards,
> > Chris Earnshaw
> >
> > On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss
> > <rdkit-discuss@lists.sourceforge.net> wrote:
> >> Hello,
> >>
> >> Suppose I have the following molecule
> >>
> >> m = 'CN1C=CC(=O)NC1=O'
> >>
> >> I would like to be able to use a SMARTS pattern
> >>
> >> pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
> >>
> >> to recognize the 6 atoms in a typical aromatic ring, but
> >> I do not want to recognize the 6 atoms in the molecule,
> >> m, as aromatic. In other words, I am trying to write
> >> a specific exclusion.
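The starting problem can be reproduced with a minimal sketch, assuming RDKit's default sanitization and aromaticity model, with the pattern written with all brackets and bonds in place. RDKit perceives the N-methyl uracil ring as aromatic, so the generic 6-ring pattern matches it.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('CN1C=CC(=O)NC1=O')

# Ring atoms flagged aromatic after sanitization (the methyl C and both O atoms are not).
aromatic_atoms = [a.GetIdx() for a in m.GetAtoms() if a.GetIsAromatic()]
print(aromatic_atoms)  # [1, 2, 3, 4, 6, 7]

pattern = Chem.MolFromSmarts('[a]1:[a]:[a]:[a]:[a]:[a]1')
print(m.HasSubstructMatch(pattern))  # True: the unwanted hit
```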
> >>
> >> Is it possible to modify the SMARTS pattern to
> >> exclude the above molecule? I have tried using
> >> recursive SMARTS, but I can't get the syntax to
> >> work.
> >>
> >> Any ideas? Thank you.
> >>
> >> Regards,
> >> Jim Metz
> >>
> >>
> >>
> >>
> >> ------------------------------------------------------------
> ------------------
> >> Check out the vibrant tech community on one of the world's most
> >> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >> _______________________________________________
> >> Rdkit-discuss mailing list
> >> Rdkit-discuss@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> >>
>



-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk