Hi Tim,
If you need access to the original text, you'll have to split the file
into records yourself, e.g.:
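
import gzip
from rdkit import Chem

def molgen(hnd):
    # Yield the raw text of each SD record read from a binary file handle.
    mol_text_tmp = ""
    while True:
        line = hnd.readline()
        if not line:
            return
        line = line.decode("utf-8")
        mol_text_tmp += line
        # A line starting with "$$$$" terminates the current record.
        if line.startswith("$$$$"):
            mol_text = mol_text_tmp
            mol_text_tmp = ""
            yield mol_text

with gzip.open("yourfile.sdf.gz", "rb") as gzip_hnd:
    for mol_text in molgen(gzip_hnd):
        print(mol_text)
        # Parse the record, including its SD properties, from the text.
        suppl = Chem.SDMolSupplier()
        suppl.SetData(mol_text)
        mol = next(suppl)
        if mol is None:
            # The record could not be parsed; skip it.
            continue
        print(mol.GetNumAtoms())
        print("------------------")
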
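
As a variation on the chunking idea, here is a minimal sketch that opens the
gzip stream in text mode, so no per-line decode is needed. It uses only the
standard library; iter_sd_records is a hypothetical helper name (not part of
RDKit), and the record bodies are placeholders rather than valid molblocks:

```python
import gzip
import io


def iter_sd_records(text_hnd):
    """Yield the raw text of each SD record from a text-mode file handle.

    iter_sd_records is a hypothetical helper name, not an RDKit function.
    """
    lines = []
    for line in text_hnd:
        lines.append(line)
        # A line starting with "$$$$" closes the current record.
        if line.startswith("$$$$"):
            yield "".join(lines)
            lines = []


# Build a small gzipped two-record stream in memory so the sketch runs
# without a file on disk. The record bodies are placeholders, not molblocks.
raw = "record one\n>  <ID>\n1\n\n$$$$\nrecord two\n>  <ID>\n2\n\n$$$$\n"
gz_bytes = gzip.compress(raw.encode("utf-8"))

# gzip.open accepts a file object as well as a path; "rt" gives decoded text.
with gzip.open(io.BytesIO(gz_bytes), "rt", encoding="utf-8") as hnd:
    records = list(iter_sd_records(hnd))

print(len(records))
```

Each yielded string could then be handed to suppl.SetData() in the same way as
mol_text in the loop above.
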
If you are happy with the RDKit-generated text, you can combine the
ForwardSDMolSupplier with the SDWriter:
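
import gzip
from io import StringIO
from rdkit import Chem

with gzip.open("yourfile.sdf.gz", "rb") as gzip_hnd:
    with Chem.ForwardSDMolSupplier(gzip_hnd) as suppl:
        for mol in suppl:
            if mol is None:
                # The record could not be parsed; skip it.
                continue
            # Write the molecule back out as SD text into an in-memory buffer.
            buf = StringIO()
            with Chem.SDWriter(buf) as w:
                w.write(mol)
            # Read the buffer only after the writer has been closed.
            print(buf.getvalue())
            print(mol.GetNumAtoms())
            print("------------------")
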
Cheers,
p.
On Thu, Nov 4, 2021 at 5:09 PM Tim Dudgeon wrote:
> I need to access the text of each record of an SDF, as well as
> create a mol instance.
> I was successfully doing this using SDMolSupplier.GetItemText().
> Then I needed to switch to handling gzipped SD files, and SDMolSupplier
> can only take a file name in its constructor.
> ForwardSDMolSupplier can handle a gzip file-like instance, but doesn't
> have the GetItemText() function.
> Reading the file records as text is easy enough, but I can't figure out
> how to get the SD file properties (Chem.MolFromMolBlock() does not handle
> the properties).
>
> Seems like there should be an easy way to handle this that I'm not seeing!
>
> Tim
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>