Re: [Rdkit-discuss] multiple SMARTS that match only if in the same fragment
One version I came up with is below. It assumes "query" is a SMARTS-derived molecule that should occur no more than once in any single fragment of each molecule in a set; the function flags the molecules that violate this:

    from rdkit.Chem import rdmolops

    def hasMultiSubstructPerFrag(mol, query):
        """
        Determines whether mol has more than one match to query
        in a single covalently connected fragment.
        """
        # Cheap pre-check: no match at all means no fragment can have several.
        if not mol.HasSubstructMatch(query):
            return False
        # Split mol into covalently connected fragments and count the
        # matches in each fragment separately.
        return any(
            len(frag.GetSubstructMatches(query)) > 1
            for frag in rdmolops.GetMolFrags(mol, asMols=True)
        )

On Sat, Mar 7, 2020 at 4:02 PM Curt Fischer wrote:
> Thanks Ivan -- very helpful.
>
> Is there any consensus on idioms for identifying multiple moieties in the
> same fragment? Do I have to use len(mol.GetSubstructMatches(patt)) > 1 as
> some kind of selector and then do some kind of graph traversal routine to
> see if any of the matches are covalently connected?

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] multiple SMARTS that match only if in the same fragment
Thanks Ivan -- very helpful.

Is there any consensus on idioms for identifying multiple moieties in the same fragment? Do I have to use len(mol.GetSubstructMatches(patt)) > 1 as some kind of selector and then do some kind of graph traversal routine to see if any of the matches are covalently connected?

On Sat, Mar 7, 2020 at 3:34 PM Ivan Tubert-Brohman <ivan.tubert-broh...@schrodinger.com> wrote:
> Hi Curt,
>
> According to
> https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions ,
> it's not supported:
>
>> Here’s the (hopefully complete) list of SMARTS features that are *not*
>> supported:
>>
>> - Non-tetrahedral chiral classes
>> - the @? operator
>> - explicit atomic masses (though isotope queries are supported)
>> - component level grouping requiring matches in different components,
>>   i.e. (C).(C)
>
> OK, the way it's worded it sounds like (C.C) might be supported (since
> that would be requiring matches in the same component), but as you've seen,
> it isn't supported either...
>
> Ivan
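The bookkeeping asked about above does not necessarily need a hand-written graph traversal. As far as I know, Chem.GetMolFrags(mol) returns one tuple of atom indices per covalently connected fragment, and mol.GetSubstructMatches(patt) returns one tuple of atom indices per match, so the two can be cross-referenced directly. Below is a minimal sketch of that idea. The helper names are hypothetical (not RDKit API), and it assumes the pattern itself is connected, so that every atom of a match lies in the same fragment.

```python
from collections import Counter


def matches_per_fragment(fragments, matches):
    """Count substructure matches per fragment.

    fragments: tuples of atom indices, one tuple per fragment
               (assumed to come from Chem.GetMolFrags(mol)).
    matches:   tuples of atom indices, one tuple per match
               (assumed to come from mol.GetSubstructMatches(patt)).
    """
    # Map every atom index to the position of the fragment that contains it.
    atom_to_frag = {
        atom: frag_idx
        for frag_idx, frag in enumerate(fragments)
        for atom in frag
    }
    # Assumption: the pattern is connected, so the first matched atom
    # is enough to identify the fragment of the whole match.
    return Counter(atom_to_frag[match[0]] for match in matches)


def has_multiple_in_one_fragment(fragments, matches):
    """True if any single fragment contains more than one match."""
    return any(n > 1 for n in matches_per_fragment(fragments, matches).values())


# With RDKit this would be called roughly as follows
# (assumption: mol and patt have already been built):
#   has_multiple_in_one_fragment(Chem.GetMolFrags(mol),
#                                mol.GetSubstructMatches(patt))
```

Compared with splitting the molecule using asMols=True, this runs the substructure search once on the whole molecule and avoids building new molecule objects for each fragment.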
Re: [Rdkit-discuss] multiple SMARTS that match only if in the same fragment
Hi Curt,

According to https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions , it's not supported:

> Here’s the (hopefully complete) list of SMARTS features that are *not*
> supported:
>
> - Non-tetrahedral chiral classes
> - the @? operator
> - explicit atomic masses (though isotope queries are supported)
> - component level grouping requiring matches in different components,
>   i.e. (C).(C)

OK, the way it's worded it sounds like (C.C) might be supported (since that would be requiring matches in the same component), but as you've seen, it isn't supported either...

Ivan

On Sat, Mar 7, 2020 at 4:58 PM Curt Fischer wrote:
> Hi rdkit fiends!
>
> The [Daylight SMARTS example page](https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html)
> gives several examples for "multiple group" smarts, including these strings:
>
>     ([Cl!$(Cl~c)].[c!$(c~Cl)])
>     ([Cl]).([c])
>     ([Cl].[c])
>     [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
>
> In general, I cannot get these to be parsed by Chem.MolFromSmarts().
>
> For example, Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me
> this error message:
>
> ```
> [13:01:41] SMARTS Parse Error: syntax error while parsing: ([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
> [13:01:41] SMARTS Parse Error: Failed parsing SMARTS '([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
> ```
>
> My understanding of SMARTS is that the outermost parentheses in this
> SMARTS string are required to force the chlorine and the aromatic carbon to
> be somewhere in the same covalently connected fragment. E.g. this pattern
> *should* hit benzyl chloride ClCc1ccccc1 but should *not* hit the
> hydrochloride salt of aniline Cl.Nc1ccccc1.
>
> What am I getting wrong? Is there a way to write rdkit-parsable SMARTS
> that achieves this? (I want to filter out molecules that contain more than
> one of certain moieties, while allowing molecules that have one (or zero)
> such moieties. But salts or covalently disconnected fragments that each
> contain one instance of the moiety should be fine.)
>
> Details on my setup:
>
> - RDKit Version: 2019.09.3
> - Operating system: macOS 10.15.2
> - Python version (if relevant): 3.6
> - Are you using conda? yes
> - If you are using conda, which channel did you install the rdkit from?
>   `conda-forge`
> - If you are not using conda: how did you install the RDKit?
>
> Curt
[Rdkit-discuss] multiple SMARTS that match only if in the same fragment
Hi rdkit fiends!

The [Daylight SMARTS example page](https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html) gives several examples for "multiple group" smarts, including these strings:

    ([Cl!$(Cl~c)].[c!$(c~Cl)])
    ([Cl]).([c])
    ([Cl].[c])
    [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]

In general, I cannot get these to be parsed by Chem.MolFromSmarts().

For example, Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me this error message:

```
[13:01:41] SMARTS Parse Error: syntax error while parsing: ([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
[13:01:41] SMARTS Parse Error: Failed parsing SMARTS '([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
```

My understanding of SMARTS is that the outermost parentheses in this SMARTS string are required to force the chlorine and the aromatic carbon to be somewhere in the same covalently connected fragment. E.g. this pattern *should* hit benzyl chloride ClCc1ccccc1 but should *not* hit the hydrochloride salt of aniline Cl.Nc1ccccc1.

What am I getting wrong? Is there a way to write rdkit-parsable SMARTS that achieves this? (I want to filter out molecules that contain more than one of certain moieties, while allowing molecules that have one (or zero) such moieties. But salts or covalently disconnected fragments that each contain one instance of the moiety should be fine.)

Details on my setup:

- RDKit Version: 2019.09.3
- Operating system: macOS 10.15.2
- Python version (if relevant): 3.6
- Are you using conda? yes
- If you are using conda, which channel did you install the rdkit from? `conda-forge`
- If you are not using conda: how did you install the RDKit?

Curt
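One possible way to emulate the "same component" behaviour of a grouped pattern such as ([Cl].[c]) without parser support is to match the two parts as separate SMARTS and then check whether any fragment contains a hit for both. The sketch below is only an illustration of that idea: the function name is hypothetical, each part is assumed to be a connected pattern, and the inputs are assumed to be the atom-index tuples that Chem.GetMolFrags(mol) and mol.GetSubstructMatches(...) return.

```python
def patterns_share_fragment(fragments, matches_a, matches_b):
    """True if some fragment contains a match of A and a match of B.

    fragments: tuples of atom indices, one per covalently connected fragment.
    matches_a, matches_b: tuples of atom indices, one per match, obtained by
    matching the two parts of the grouped pattern separately.
    """
    # Map every atom index to the fragment that contains it.
    atom_to_frag = {
        atom: frag_idx
        for frag_idx, frag in enumerate(fragments)
        for atom in frag
    }
    # Assumption: each part is a connected pattern, so its first matched
    # atom identifies the fragment of the whole match.
    frags_a = {atom_to_frag[m[0]] for m in matches_a}
    frags_b = {atom_to_frag[m[0]] for m in matches_b}
    return bool(frags_a & frags_b)


# Hand-written atom indices for illustration (not computed by RDKit here).
# Benzyl chloride, ClCc1ccccc1: a single fragment with Cl at index 0
# and aromatic carbons at indices 2 to 7.
benzyl_chloride = patterns_share_fragment(
    fragments=((0, 1, 2, 3, 4, 5, 6, 7),),
    matches_a=((0,),),
    matches_b=((2,), (3,), (4,), (5,), (6,), (7,)),
)

# Aniline hydrochloride, Cl.Nc1ccccc1: Cl is alone in fragment 0,
# and the aromatic carbons sit in fragment 1.
aniline_hcl = patterns_share_fragment(
    fragments=((0,), (1, 2, 3, 4, 5, 6, 7)),
    matches_a=((0,),),
    matches_b=((2,), (3,), (4,), (5,), (6,), (7,)),
)

print(benzyl_chloride, aniline_hcl)
```

Replacing the set intersection with a check that the two parts hit different fragments would give the analogous emulation of the ([Cl]).([c]) form.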