Hi Philipp,
It looks like the supplier thinks the line index has gone past the end of
file.
1) How large is the SMILES file which leads to this error (ls -l)?
2) Does it consistently happen at the same line number?
You can check this with something like:

from rdkit import Chem

suppl = Chem.SmilesMolSupplier(infile, sanitize=False, nameColumn=-1)
i = 0  # number of molecules read successfully so far
while True:
    try:
        mol = next(suppl)
    except StopIteration:
        # normal end of file
        break
    except Exception:
        print(f"Exception raised after {i} mols")
        raise
    i += 1

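To compare the count printed by the loop above with the file itself, it may
help to know the total size and line count (this also answers question 1
without ls). A minimal sketch using only the standard library; file_stats is
a made-up helper name, not part of RDKit:

```python
import os


def file_stats(path):
    """Return (size_in_bytes, n_lines) for a text file.

    The file is read in 1 MiB binary chunks, so very large .smi files
    do not have to fit in memory.
    """
    n_lines = 0
    last_byte = b""
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            n_lines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # Count a final line that is not terminated by a newline.
    if last_byte and last_byte != b"\n":
        n_lines += 1
    return os.path.getsize(path), n_lines
```

Keep in mind that SmilesMolSupplier treats the first line as a title line by
default (titleLine=True), so the number of molecules may be one less than the
number of lines.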
To check whether the problem is actually due to file size, you can split
your input file line-wise with the coreutils split command:

    split -l <n_lines> large_file.smi large_file_ --additional-suffix=.smi

Replace <n_lines> with a number smaller than the one that triggers the
exception, and check whether operating on the smaller chunks removes the
problem.
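If coreutils split is not at hand (e.g. on Windows), roughly the same thing
can be done in Python. This is only a sketch: split_lines and its
prefix/suffix parameters are hypothetical names, and the chunks get numeric
suffixes (0000, 0001, ...) rather than the aa, ab, ... that split uses by
default.

```python
from itertools import islice


def split_lines(path, n_lines, prefix="large_file_", suffix=".smi"):
    """Write consecutive chunks of at most n_lines lines each.

    Returns the list of chunk file names, in order.
    """
    if n_lines < 1:
        raise ValueError("n_lines must be >= 1")
    chunk_paths = []
    with open(path, "rb") as src:
        index = 0
        while True:
            # Take the next n_lines lines without loading the whole file.
            lines = list(islice(src, n_lines))
            if not lines:
                break
            chunk_path = f"{prefix}{index:04d}{suffix}"
            with open(chunk_path, "wb") as dst:
                dst.writelines(lines)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

For example, split_lines("large_file.smi", 1000000) would write
large_file_0000.smi, large_file_0001.smi and so on. With either approach,
only the first chunk keeps the original first line, so if your file has no
header you probably want to pass titleLine=False to SmilesMolSupplier to
avoid losing the first molecule of each chunk.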
HTH, cheers
p.
On Tue, Jun 22, 2021 at 8:19 AM Philipp Otten <[email protected]>
wrote:
> Hey you lovely people,
> as I am creating a set of building blocks for my in-silico reaction, I
> downloaded various accessible databases (ChEMBL28, GDB13, GDB17, PubChem,
> eMolecules and Mcule) and want to just work through them with
> "HasSubstructMatch". Unfortunately, I run into a "File parsing error: ran
> out of lines" error.
> I open the .smi files with SmilesMolSupplier and then just loop through
> them with a for loop:
>
> with open(target_file, "w") as outfile:
>     suppl = Chem.SmilesMolSupplier(infile, sanitize=False, nameColumn=-1)
>     for mol in suppl:
>         if Descriptors.MolWt(mol) <= mwt:
>             if mol.HasSubstructMatch(pattern1) == True:
>                 mol = Chem.MolToSmiles(mol)
>                 outfile.write(mol + "\n")
>             else:
>                 continue
>         else:
>             continue
>
> I can imagine that it possibly has something to do with the length of the
> files, but I don't know how to actually fix that.
> Thanks for all your help!
> Kind regards
> Philipp
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>