Hi Adam,
There are a number of issues here.  The key one, I think, is a
misunderstanding about the meaning of H in SMARTS.  It means "a single
attached hydrogen" and is a qualifier for another atom; it cannot be used
by itself.  So [*H] is valid, [H] isn't.  See the table at
https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html.  If you
want to refer to an explicit hydrogen, you have to use [#1].  However, that
will only match an explicit hydrogen in the molecule, not an implicit one.
Thus c[#1] doesn't match anything in c1ccccc1.  If you have read in a
molecule from a molfile, for example, that has explicit hydrogens then you
will be ok.
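
To make the implicit/explicit distinction concrete, here is a minimal
sketch using benzene.  The variable names are my own, and Chem.AddHs is
used here only as a stand-in for reading a structure that already carries
explicit hydrogens, such as a molfile with H atoms.

```python
from rdkit import Chem

# Benzene from SMILES: the hydrogens are implicit, not atoms in the graph.
benzene = Chem.MolFromSmiles('c1ccccc1')

# Aromatic carbon bonded to an explicit hydrogen atom.
explicit_h_query = Chem.MolFromSmarts('c[#1]')
# Aromatic carbon with a hydrogen count of one (H used as a qualifier).
h_count_query = Chem.MolFromSmarts('[cH]')

# No hydrogen atoms exist in the graph yet, so c[#1] finds nothing.
implicit_hit = benzene.HasSubstructMatch(explicit_h_query)

# After AddHs the hydrogens are real atoms and c[#1] can match them.
benzene_h = Chem.AddHs(benzene)
explicit_hit = benzene_h.HasSubstructMatch(explicit_h_query)

# The hydrogen-count form works without explicit hydrogens.
n_ch = len(benzene.GetSubstructMatches(h_count_query))

print(implicit_hit, explicit_hit, n_ch)
```

If the overlay needs the hydrogen atoms themselves, the c[#1] form on a
molecule with explicit hydrogens is the one that returns their indices.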

Further to that, your SMARTS strings, at least as they have appeared in
gmail, which may have garbled them, are incorrect.  In S1, the parentheses
around [N,n,H] make it a substituent, so it will not match the indole
nitrogen.  Also, it would probably be better as [N,n;H], which would be
read as "(aliphatic nitrogen OR aromatic nitrogen) AND 1 attached
hydrogen."  The [N,n,H] reads as "aliphatic nitrogen OR aromatic nitrogen
OR 1 attached hydrogen", so it will match a methylated indole nitrogen,
which I imagine is not what you want.  Similar remarks apply to S2.  A
SMARTS that matches both 6CI and PCT is
[C,c]1(Cl)[C,c][C,c;H][C,c]([C,c])[C,c;H][C,c]1
but that won't match the H atoms themselves if you want to use them in the
overlay, and it also won't work in the aliphatic case of, for example,
ClC1CCC(C)CC1 because there the carbon atoms have 2 attached hydrogens.
If you really do want it to match aliphatic cases as well, then you will
need something like
[C,c]1(Cl)[$([CH2]),$([cH])][$([CH2]),$([cH])][C,c]([C,c])[$([CH2]),$([cH])][$([CH2]),$([cH])]1
which is quite a mouthful.  The carbons at the 2, 3, 5 and 6 positions on
the ring are specified as either [CH2] or [cH].
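
As a rough check of the points above, the sketch below compares [N,n,H]
with [N,n;H] on indole and 1-methylindole, and then tries the long
recursive pattern on PCT and the aliphatic example.  The SMILES for the two
indoles and for PCT are my own choices, and nitrogen_matched is a
hypothetical helper written just for this illustration.

```python
from rdkit import Chem

n_methylindole = Chem.MolFromSmiles('Cn1ccc2ccccc21')
indole = Chem.MolFromSmiles('c1ccc2[nH]ccc2c1')

or_query = Chem.MolFromSmarts('[N,n,H]')   # N OR n OR one attached H
and_query = Chem.MolFromSmarts('[N,n;H]')  # (N OR n) AND one attached H


def nitrogen_matched(mol, query):
    """Return True if any atom matched by query is a nitrogen."""
    matched = {idx for match in mol.GetSubstructMatches(query) for idx in match}
    return any(mol.GetAtomWithIdx(i).GetAtomicNum() == 7 for i in matched)


# The OR form accepts the methylated nitrogen; the AND form should not.
or_hits_methylated = nitrogen_matched(n_methylindole, or_query)
and_hits_methylated = nitrogen_matched(n_methylindole, and_query)
and_hits_indole = nitrogen_matched(indole, and_query)
print(or_hits_methylated, and_hits_methylated, and_hits_indole)

# The recursive pattern: ring positions 2, 3, 5 and 6 are [CH2] or [cH].
ring_query = Chem.MolFromSmarts(
    '[C,c]1(Cl)[$([CH2]),$([cH])][$([CH2]),$([cH])]'
    '[C,c]([C,c])[$([CH2]),$([cH])][$([CH2]),$([cH])]1'
)
pct = Chem.MolFromSmiles('Cc1ccc(Cl)cc1')
aliphatic = Chem.MolFromSmiles('ClC1CCC(C)CC1')

pct_hit = pct.HasSubstructMatch(ring_query)
aliphatic_hit = aliphatic.HasSubstructMatch(ring_query)
print(pct_hit, aliphatic_hit)
```

Checking each piece of a pattern separately like this tends to be quicker
than debugging the whole SMARTS at once.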

A Jupyter notebook can be really useful for debugging SMARTS patterns like
this.  The code I used was variations of
```
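from rdkit import Chem
# Importing IPythonConsole makes Jupyter render molecules and
# highlight the atoms from the most recent substructure match.
from rdkit.Chem.Draw import IPythonConsole
# 6-chloro-1-methylindole
mol = Chem.MolFromSmiles('C1=CC(=CC2=C1C=CN2C)Cl')
qmol = Chem.MolFromSmarts('[C,c]1(Cl)[C,c][C,c][C,c]([C,c])[C,c][C,c]1')
# Tuples of atom indices in mol, ordered like the atoms in the SMARTS
print(mol.GetSubstructMatches(qmol))
mol  # last expression in the cell, so the notebook draws it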
```
which prints the numbers of the matching atoms and also draws the molecule
with the match highlighted.
Regards,
Dave


On Tue, Mar 1, 2022 at 1:43 AM Adam Moyer <atom.mo...@gmail.com> wrote:

> Hello,
>
> I have a baffling case where I am trying to match substructures on two
> ligands for the goal of aligning them.
>
> I have two ligands; one is a 6-chloroindole (6CI) and the other is a
> para-chloro toluene (PCT).
>
> I am attempting to use the following SMARTS (S1) to match
> them: '[C,c]1(Cl)[C,c][C,c]*([N,n,H])*[C,c]([C,c,H])[C,c]([H])[C,c]1'.
> For some reason S1 only finds a match in 6CI.
>
> When I use the following SMARTS (S2) I only match to PCT as expected:
> '[C,c]1(Cl)[C,c][C,c]*([H])*[C,c]([C,c,H])[C,c]([H])[C,c]1'.
>
> How can S1 not match PCT? S1 is strictly a superset of S2 because I am
> using the "or" operation. Do I have a misunderstanding of how explicit
> hydrogens work in RDKit/SMARTS?
>
> Lastly when I use the last SMARTS (S3) I am able to match to both, but I
> cannot use that smarts due to other requirements in my
> project: '[C,c]1(Cl)[C,c][C,c][C,c]([C,c,H])[C,c]([H])[C,c]1'
>
> Thanks!
> Adam
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>


-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk