Hi Paolo,

argh... I thought that setting params.sanitize=False meant you don't want any sanitization. Apparently, the mols still need to be sanitized, only with the aromatization step skipped. I guess I am slowly starting to understand...
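For anyone reading along, here is a minimal sketch of the distinction above: parsing with params.sanitize=False and then sanitizing with every step except aromaticity perception, compared with default parsing. The benzene SMILES and the variable names are assumptions chosen for illustration; they are not taken from the thread.

```python
from rdkit import Chem

# Kekule benzene, chosen only for illustration (not from the thread).
smi = "C1=CC=CC=C1"

params = Chem.SmilesParserParams()
params.sanitize = False

# 1) Parse without sanitization, then run every sanitization step
#    except aromaticity perception.
partial = Chem.MolFromSmiles(smi, params)
Chem.SanitizeMol(partial, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)

# 2) Default parsing, which runs the full sanitization.
full = Chem.MolFromSmiles(smi)

# Count bonds flagged as aromatic in each molecule.
partial_aromatic = sum(b.GetIsAromatic() for b in partial.GetBonds())
full_aromatic = sum(b.GetIsAromatic() for b in full.GetBonds())

# Ring perception is one of the sanitization steps that was kept.
partial_rings = partial.GetRingInfo().NumRings()

print("aromatic bonds, partial sanitization:", partial_aromatic)
print("aromatic bonds, full sanitization:", full_aromatic)
print("rings found after partial sanitization:", partial_rings)
```

For this input the partially sanitized molecule should keep its alternating single and double bonds (no aromatic bonds) while still having ring information, whereas the fully sanitized one has all six bonds marked aromatic.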
I tried to find something similar to what you explained here and in your GitHub example in the RDKit documentation and on the web, but without much success. Wouldn't this be an essential step in substructure search, or even a FAQ?

Well, now I took a larger list and got as many hits as I expected. Happy end!

Thank you very much for your kind help!

Theo.

On 20.05.2020 at 15:22, Paolo Tosco wrote:
> Hi Theo,
>
> that's because you omitted the sanitization step completely, so the molecule
> is missing crucial information for the SubstructureMatch to do a proper job.
>
> If you put back sanitization, only leaving out the aromatization step, things
> work as expected.
> Also, you do not need to create pattern again from SMILES, you can make a
> copy of the molecule that you have already created and sanitized using the
> Chem.Mol copy constructor.
>
> from rdkit import Chem
>
> smiles_strings = '''
> N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
> C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
> '''
>
> smiles_list = smiles_strings.splitlines()[1:]
> print(smiles_list)
>
> params = Chem.SmilesParserParams()
> params.sanitize=False
>
> mols = [Chem.MolFromSmiles(x,params) for x in smiles_list]
> for m in mols:
>     Chem.SanitizeMol(m, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
>
> pattern = Chem.Mol(mols[0])
>
> query_params = Chem.AdjustQueryParameters()
> query_params.makeBondsGeneric = True
> query_params.aromatizeIfPossible = False
> query_params.adjustDegree = False
> query_params.adjustHeavyDegree = False
> pattern_generic_bonds = Chem.AdjustQueryProperties(pattern,query_params)
>
> matches = [idx for idx,m in enumerate(mols) if
>            m.HasSubstructMatch(pattern_generic_bonds)]
> print("{} of {}: {}".format(len(matches),len(smiles_list),matches))
>
> $ python3 SubstructMatch2.py
>
> ['N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3', 'C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3']
> 2 of 2: [0, 1]
>
> Cheers,
> p.
>
> On 20/05/2020 09:50, theozh wrote:
>> from rdkit import Chem
>>
>> smiles_strings = '''
>> N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
>> C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
>> '''
>>
>> smiles_list = smiles_strings.splitlines()[1:]
>> print(smiles_list)
>>
>> params = Chem.SmilesParserParams()
>> params.sanitize=False
>>
>> mols = [Chem.MolFromSmiles(x,params) for x in smiles_list]
>>
>> pattern = Chem.MolFromSmiles(smiles_list[0],params)
>>
>> query_params = Chem.AdjustQueryParameters()
>> query_params.makeBondsGeneric = True
>> query_params.aromatizeIfPossible = False
>> query_params.adjustDegree = False
>> query_params.adjustHeavyDegree = False
>> pattern_generic_bonds = Chem.AdjustQueryProperties(pattern,query_params)
>>
>> matches = [idx for idx,m in enumerate(mols) if
>>            m.HasSubstructMatch(pattern_generic_bonds)]
>> print("{} of {}: {}".format(len(matches),len(smiles_list),matches))