[Rdkit-discuss] Searching in (Downloaded) Databases

Philipp Otten Mon, 21 Jun 2021 23:19:58 -0700

Hey you lovely people,
I am creating a set of building blocks for my in-silico reaction. To do so, I
downloaded various accessible databases (ChEMBL 28, GDB13, GDB17, PubChem,
eMolecules and Mcule) and want to filter them with "HasSubstructMatch".
Unfortunately, I run into a "File parsing error: ran out of lines" error.
I open the .smi files with SmilesMolSupplier and then loop over the molecules
in a for loop:


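    from rdkit import Chem
    from rdkit.Chem import Descriptors

    # infile, target_file, mwt and pattern1 are defined earlier in the script.
    with open(target_file, "w") as outfile:
        suppl = Chem.SmilesMolSupplier(infile, sanitize=False, nameColumn=-1)
        for mol in suppl:
            # The supplier yields None for lines it cannot parse.
            if mol is None:
                continue
            # Keep molecules under the weight cutoff that contain the pattern.
            if Descriptors.MolWt(mol) <= mwt and mol.HasSubstructMatch(pattern1):
                outfile.write(Chem.MolToSmiles(mol) + "\n")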

I suspect it has something to do with the size of the files, but I don't
know how to fix that.
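
One variation that might help narrow this down is to skip SmilesMolSupplier and read the file line by line, parsing each SMILES with Chem.MolFromSmiles. The sketch below is only an illustration under assumptions: the names iter_smiles and filter_building_blocks and their parameters are hypothetical, the SMILES is assumed to be the first whitespace-separated field on each line, and, unlike the loop above, the molecules are sanitized (the MolFromSmiles default). It is not confirmed to avoid the error, but it reports the line number of anything that fails to parse.

```python
def iter_smiles(path):
    """Yield (line_number, smiles) for each non-blank line of a .smi file."""
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank or whitespace-only lines
            yield line_number, fields[0]


def filter_building_blocks(infile, target_file, pattern, max_mol_wt):
    """Write matching SMILES to target_file and return how many were kept."""
    # RDKit is imported here so that iter_smiles stays usable without it.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    kept = 0
    with open(target_file, "w") as outfile:
        for line_number, smiles in iter_smiles(infile):
            mol = Chem.MolFromSmiles(smiles)  # None if the SMILES cannot be parsed
            if mol is None:
                print(f"line {line_number}: could not parse {smiles!r}")
                continue
            if Descriptors.MolWt(mol) <= max_mol_wt and mol.HasSubstructMatch(pattern):
                outfile.write(Chem.MolToSmiles(mol) + "\n")
                kept += 1
    return kept
```

With the variables from the snippet above, this would be called as filter_building_blocks(infile, target_file, pattern1, mwt). Because the file is streamed one line at a time, memory use should not grow with file size, and a header line, if present, would simply show up as an unparseable line.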
Thanks for all your help!
Kind regards
Philipp

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
