Hi Adam,

There are a number of issues here. The key one, I think, is a misunderstanding about the meaning of H in SMARTS. It means "a single attached hydrogen" and is a qualifier for another atom; it cannot be used by itself. So [*H] is valid, [H] isn't. See the table at https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html. If you want to refer to an explicit hydrogen, you have to use [#1]. However, that will only match an explicit hydrogen in the molecule, not an implicit one. Thus c[#1] doesn't match anything in c1ccccc1. If you have read in a molecule that has explicit hydrogens, from a molfile for example, then you will be OK.
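
To make the explicit/implicit distinction concrete, here is a minimal sketch, assuming RDKit is installed. The variable names are only illustrative, and Chem.AddHs stands in here for any route that gives you a molecule with explicit hydrogens:

```python
from rdkit import Chem

# Benzene parsed from SMILES: its hydrogens are implicit, not atoms in the graph.
benzene = Chem.MolFromSmiles('c1ccccc1')

# [#1] only matches hydrogens that exist as real atoms in the molecule.
explicit_h = Chem.MolFromSmarts('c[#1]')
# [cH] matches an aromatic carbon that carries exactly one hydrogen.
carbon_with_h = Chem.MolFromSmarts('[cH]')

n_explicit_before = len(benzene.GetSubstructMatches(explicit_h))
n_ch_before = len(benzene.GetSubstructMatches(carbon_with_h))

# Chem.AddHs converts the implicit hydrogens into explicit atoms.
benzene_h = Chem.AddHs(benzene)
n_explicit_after = len(benzene_h.GetSubstructMatches(explicit_h))
n_ch_after = len(benzene_h.GetSubstructMatches(carbon_with_h))

print('implicit Hs:', n_explicit_before, n_ch_before)
print('explicit Hs:', n_explicit_after, n_ch_after)
```

With implicit hydrogens, c[#1] finds nothing while [cH] finds all six carbons; once the hydrogens are explicit, both patterns find six matches.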
Further to that, your SMARTS strings, at least as they have appeared in Gmail, which may have garbled them, are incorrect. In S1, the parentheses round [N,n,H] make it a substituent, so it will not match the indole nitrogen. Also, it would probably be better as [N,n;H], which would be read as "(aliphatic nitrogen OR aromatic nitrogen) AND 1 attached hydrogen". The [N,n,H] will match a methylated indole nitrogen, which I imagine is not what you want. Similar remarks apply to S2.

A SMARTS that matches both 6CI and PCT is [C,c]1(Cl)[C,c][C,c;H][C,c]([C,c])[C,c;H][C,c]1, but that won't match the H atoms themselves if you want to use them in the overlay, and it also won't work in the aliphatic case of, for example, ClC1CCC(C)CC1, because there the carbon atoms have 2 attached hydrogens. If you really do want it to match aliphatic cases as well, then you will need something like [C,c]1(Cl)[$([CH2]),$([cH])][$([CH2]),$([cH])][C,c]([C,c])[$([CH2]),$([cH])][$([CH2]),$([cH])]1, which is quite a mouthful. The carbons at the 2, 3, 5 and 6 positions on the ring are specified as either [CH2] or [cH].

A Jupyter notebook can be really useful for debugging SMARTS patterns like this. I used variations of

```
from rdkit import Chem
from IPython.display import SVG

# N-methylated 6-chloroindole as the test molecule
mol = Chem.MolFromSmiles('C1=CC(=CC2=C1C=CN2C)Cl')
# The SMARTS pattern being tested
qmol = Chem.MolFromSmarts('[C,c]1(Cl)[C,c][C,c][C,c]([C,c])[C,c][C,c]1')
# One tuple of atom indices per match
print(mol.GetSubstructMatches(qmol))
# Last expression in the cell, so the notebook draws the molecule
mol
```

which prints the indices of the matching atoms and also draws the molecule with the match highlighted.

Regards,
Dave

On Tue, Mar 1, 2022 at 1:43 AM Adam Moyer <atom.mo...@gmail.com> wrote:

> Hello,
>
> I have a baffling case where I am trying to match substructures on two
> ligands for the goal of aligning them.
>
> I have two ligands; one is a 6-chloroindole (6CI) and the other is a
> para-chloro toluene (PCT).
>
> I am attempting to use the following SMARTS (S1) to match
> them: '[C,c]1(Cl)[C,c][C,c]*([N,n,H])*[C,c]([C,c,H])[C,c]([H])[C,c]1'.
> For some reason S1 only finds a match in 6CI.
>
> When I use the following SMARTS (S2) I only match to PCT as expected:
> '[C,c]1(Cl)[C,c][C,c]*([H])*[C,c]([C,c,H])[C,c]([H])[C,c]1'.
>
> How can S1 not match PCT? S1 is strictly a superset of S2 because I am
> using the "or" operation. Do I have a misunderstanding of how explicit
> hydrogens work in RDKit/SMARTS?
>
> Lastly when I use the last SMARTS (S3) I am able to match to both, but I
> cannot use that smarts due to other requirements in my
> project: '[C,c]1(Cl)[C,c][C,c][C,c]([C,c,H])[C,c]([H])[C,c]1'
>
> Thanks!
> Adam
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>

--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk