Hi Paolo,

Argh... I thought that setting  params.sanitize=False  was the way to go if 
you don't want sanitization.
Apparently the mols do need sanitization, just with the aromatization step 
skipped.
I guess I am slowly starting to understand...
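
To spell out for myself what Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY 
in your example does: if I read it right, the sanitize options behave like bit 
flags, so XOR-ing one flag out of the "all" mask switches off just that step. 
Below is a plain-Python sketch of that idea. The step names and values are 
made up for illustration and are not RDKit's actual constants.

```python
# Hypothetical step flags, one bit each (illustration only, not RDKit's values).
STEPS = {
    "cleanup": 1,
    "kekulize": 2,
    "set_aromaticity": 4,
    "set_hybridization": 8,
}
ALL_STEPS = 15  # every bit above set


def enabled_steps(mask):
    """Return the names of the steps whose bit is set in mask."""
    return [name for name, bit in STEPS.items() if mask & bit]


# XOR clears the aromaticity bit because that bit is set in ALL_STEPS.
mask = ALL_STEPS ^ STEPS["set_aromaticity"]
print(enabled_steps(mask))
```

Everything except the aromaticity step stays enabled, which is how I now 
understand the SanitizeMol call in your script.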

I tried to find something similar to what you explained here and in your 
GitHub example in the RDKit documentation or on the web, without much success. 
Shouldn't this be an essential step in substructure search, or even a FAQ?

Well, I have now run a larger list and got as many hits as I expected. Happy ending!

Thank you very much for your kind help!
Theo.

On 20.05.2020 at 15:22, Paolo Tosco wrote:
> Hi Theo,
>
> that's because you omitted the sanitization step completely, so the molecule 
> is missing crucial information for HasSubstructMatch to do a proper job.
>
> If you put back sanitization, only leaving out the aromatization step, things 
> work as expected.
> Also, you do not need to create the pattern again from SMILES; you can make a 
> copy of the molecule that you have already created and sanitized using the 
> Chem.Mol copy constructor.
>
> from rdkit import Chem
>
> smiles_strings = '''
> N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
> C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
> '''
>
> smiles_list = smiles_strings.splitlines()[1:]
> print(smiles_list)
>
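> # Parse without sanitization so that no aromaticity perception happens here
> params = Chem.SmilesParserParams()
> params.sanitize=False
>
> mols = [Chem.MolFromSmiles(x,params) for x in smiles_list]
> # Run every sanitization step except aromaticity perception
> for m in mols:
>     Chem.SanitizeMol(m, Chem.SANITIZE_ALL ^ Chem.SANITIZE_SETAROMATICITY)
>
> # Copy the already sanitized molecule instead of parsing the SMILES again
> pattern = Chem.Mol(mols[0])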
>
> query_params = Chem.AdjustQueryParameters()
> query_params.makeBondsGeneric = True
> query_params.aromatizeIfPossible = False
> query_params.adjustDegree = False
> query_params.adjustHeavyDegree = False
> pattern_generic_bonds = Chem.AdjustQueryProperties(pattern,query_params)
>
> matches = [idx for idx,m in enumerate(mols)
>            if m.HasSubstructMatch(pattern_generic_bonds)]
> print("{} of {}: {}".format(len(matches),len(smiles_list),matches))
>
> $ python3 SubstructMatch2.py
>
> ['N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3', 'C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3']
> 2 of 2: [0, 1]
>
> Cheers,
> p.
>
> On 20/05/2020 09:50, theozh wrote:
>> from rdkit import Chem
>>
>> smiles_strings = '''
>> N12N3C(CC4=CC=CC(NC=C2)=C14)=CC=C3
>> C12=CC=CC3=C1N(N4C=CC=C4C2)C=CN3
>> '''
>>
>> smiles_list = smiles_strings.splitlines()[1:]
>> print(smiles_list)
>>
>> params = Chem.SmilesParserParams()
>> params.sanitize=False
>>
>> mols = [Chem.MolFromSmiles(x,params) for x in smiles_list]
>>
>> pattern = Chem.MolFromSmiles(smiles_list[0],params)
>>
>> query_params = Chem.AdjustQueryParameters()
>> query_params.makeBondsGeneric = True
>> query_params.aromatizeIfPossible = False
>> query_params.adjustDegree = False
>> query_params.adjustHeavyDegree = False
>> pattern_generic_bonds = Chem.AdjustQueryProperties(pattern,query_params)
>>
>> matches = [idx for idx,m in enumerate(mols)
>>            if m.HasSubstructMatch(pattern_generic_bonds)]
>> print("{} of {}: {}".format(len(matches),len(smiles_list),matches))


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss