On Feb 8, 2020, at 17:55, Janusz Petkowski <[email protected]> wrote:
>
> If not how can I match cases where in a given position there can be C or H
> with rdkit?
I believe you should use #1 (the atomic number of hydrogen) instead of H
inside the bracket, that is, [C,#1] rather than [C,H].
>>> from rdkit import Chem
>>> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
>>> hmols = [Chem.AddHs(mol) for mol in mols]
Your pattern:
>>> pat1 = Chem.MolFromSmarts("[H]C(=O)OC([C,H])([H])[H]")
>>> [mol.HasSubstructMatch(pat1) for mol in hmols]
[False, True, True]
Using #1 instead of H:
>>> pat2 = Chem.MolFromSmarts("[H]C(=O)OC([C,#1])([#1])[#1]")
>>> [mol.HasSubstructMatch(pat2) for mol in hmols]
[True, True, True]
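To see why the two patterns differ, here is a minimal sketch that counts which elements the single-atom queries [C,H] and [C,#1] hit on the first molecule (methyl formate with explicit hydrogens). The helper name matched_elements is my own, not part of RDKit; it only wraps GetSubstructMatches.

```python
from collections import Counter

from rdkit import Chem


def matched_elements(mol, smarts):
    """Count element symbols of the atoms hit by a one-atom SMARTS.

    Hypothetical helper for illustration; it is not an RDKit function.
    """
    pattern = Chem.MolFromSmarts(smarts)
    hits = mol.GetSubstructMatches(pattern)
    # Each hit is a 1-tuple because the pattern has a single atom.
    return Counter(mol.GetAtomWithIdx(idx).GetSymbol() for (idx,) in hits)


# Methyl formate with hydrogens added as explicit atoms.
hmol = Chem.AddHs(Chem.MolFromSmiles("C(=O)OC"))

with_H = matched_elements(hmol, "[C,H]")
with_1 = matched_elements(hmol, "[C,#1]")
print("[C,H]  ->", dict(with_H))
print("[C,#1] ->", dict(with_1))
```

If my reading is right, the [C,H] count should contain no hydrogen atoms at all, while [C,#1] picks up the four hydrogens alongside the two carbons.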
"H" has an odd interpretation.
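https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html says:

    Note that atomic primitive H can have two meanings,
    implying a property or the element itself. [H] means
    hydrogen atom. [*H2] means any atom with exactly
    two hydrogens attached

As I read it, the H in [C,H] is taken as the property (an atom with one
attached hydrogen), not the element, so a hydrogen atom itself does not match.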
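The two meanings quoted above can be checked with a short sketch. It assumes ethanol as a test molecule (my choice, not from the original question), and count_matches is a made-up convenience wrapper, not an RDKit call.

```python
from rdkit import Chem


def count_matches(mol, smarts):
    """Number of substructure matches for a SMARTS (illustrative helper)."""
    return len(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))


mol = Chem.MolFromSmiles("CCO")  # ethanol, hydrogens implicit
hmol = Chem.AddHs(mol)           # same molecule, hydrogens as explicit atoms

for label, m in [("implicit Hs", mol), ("explicit Hs", hmol)]:
    # "[H]" is the element: it can only match explicit hydrogen atoms.
    # "[*H2]" is the property: any atom carrying exactly two hydrogens.
    print(label, "[H]:", count_matches(m, "[H]"),
          "[*H2]:", count_matches(m, "[*H2]"))
```

The element form only finds something once the hydrogens are real atoms in the graph, whereas the property form should find the CH2 carbon either way.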
I believe the goal of having [H] match a hydrogen atom is so that a SMILES
string, read as a SMARTS pattern, still matches the molecule that the same
SMILES describes. I'm not sure about that, though.
Cheers,
Andrew
[email protected]