Re: [Rdkit-discuss] Reading text records from SDF from gzipped files

Paolo Tosco Thu, 04 Nov 2021 09:38:29 -0700

Hi Tim,

If you need access to the original text, you'll have to do the chunking
yourself, e.g.:


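import gzip

from rdkit import Chem

def molgen(hnd):
    """Yield the raw text of each SD record, including its "$$$$" line."""
    mol_text_tmp = ""
    while True:
        line = hnd.readline()
        if not line:
            # End of file; trailing text without a "$$$$" line is discarded
            return
        # The handle is opened in binary mode, so decode each line
        line = line.decode("utf-8")
        mol_text_tmp += line
        if line.startswith("$$$$"):
            mol_text = mol_text_tmp
            mol_text_tmp = ""
            yield mol_text

with gzip.open("yourfile.sdf.gz", "rb") as gzip_hnd:
    for mol_text in molgen(gzip_hnd):
        print(mol_text)
        # Parse the single record; the SD data fields become mol properties
        suppl = Chem.SDMolSupplier()
        suppl.SetData(mol_text)
        mol = next(suppl)
        if mol is not None:  # None means RDKit could not parse the record
            print(mol.GetNumAtoms())
        print("------------------")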
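A possible variation on the chunking generator above, as a minimal sketch only: opening the gzip file in text mode lets gzip do the decoding, and passing newline="" keeps the original line endings untouched. The name iter_sdf_records and the dummy two-record payload are made up for illustration (the records are not valid molblocks), and an in-memory buffer stands in for a real file so the snippet runs without RDKit. As in the generator above, trailing text without a "$$$$" line is dropped.

```python
import gzip
import io


def iter_sdf_records(hnd):
    """Yield each SD record (up to and including its "$$$$" line) as text.

    `hnd` is assumed to be a file-like object opened in text mode.
    """
    lines = []
    for line in hnd:
        lines.append(line)
        if line.startswith("$$$$"):
            yield "".join(lines)
            lines = []


# Two dummy records, gzipped in memory for the demo (hypothetical data).
raw = "first\n$$$$\nsecond\n>  <ID>\n42\n\n$$$$\n"
gz_bytes = gzip.compress(raw.encode("utf-8"))

# gzip.open accepts an existing file object as well as a file name.
with gzip.open(io.BytesIO(gz_bytes), "rt", encoding="utf-8", newline="") as gzip_hnd:
    records = list(iter_sdf_records(gzip_hnd))

print(len(records))
```

Each yielded string can then be passed to SDMolSupplier.SetData() exactly as in the loop above.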

If you are happy with the RDKit-generated text (written back out from the
parsed molecule, so not necessarily identical to the original record), you
can combine the ForwardSDMolSupplier with the SDWriter:

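import gzip
from io import StringIO

from rdkit import Chem

with gzip.open("yourfile.sdf.gz", "rb") as gzip_hnd:
    with Chem.ForwardSDMolSupplier(gzip_hnd) as suppl:
        for mol in suppl:
            if mol is None:
                # Skip records that RDKit could not parse
                continue
            # Write the molecule back out as an SD record in memory
            buf = StringIO()
            with Chem.SDWriter(buf) as w:
                w.write(mol)
            print(buf.getvalue())
            print(mol.GetNumAtoms())
            print("------------------")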

Cheers,
p.

On Thu, Nov 4, 2021 at 5:09 PM Tim Dudgeon <[email protected]> wrote:

> I am needing to access the text of each record of a SDF, as well as
> creating a mol instance.
> I was successfully doing this using SDMolSupplier.GetItemText().
> Then I needed to switch to handling gzipped SD files, and SDMolSupplier
> can only take a file name in its constructor.
> ForwardSDMolSupplier can handle a gzip file-like instance, but doesn't
> have the GetItemText() function.
> Reading the file records as text is easy enough, but I can't figure out
> how to get the SD file properties (Chem.MolFromMolBlock() does not handle
> the properties).
>
> Seems like there should be an easy way to handle this that I'm not seeing!
>
> Tim
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>

