On Sep 26, 2023, at 01:17, Ling Chan <[email protected]> wrote:
> > <pKa> (1)
> 4.0999999
..
> Just wonder what was the rationale behind this extra "(1)" on the property
> field lines (pKa and logP in the above example)?
>
> And is there a way to get rid of these? I am not sure if this extra "(1)" is
> part of the standard sd format.
RDKit writes an increasing counter as a sort of per-file registry number: the
value is the 1-based position of the molecule in the output file. This follows
the part of the standard which says "External registry numbers must be enclosed
in parentheses."
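The field name and the registry number are separate parts of the data header
line. A reader that keys on the name in angle brackets therefore sees "pKa"
whether or not the "(1)" is present. Below is a minimal stdlib sketch of that
split. `parse_data_header` is a hypothetical helper, not an RDKit function, and
it simplifies the CTfile rules by ignoring any DTn field number or other free
text on the line.

```python
import re
from typing import Optional, Tuple

# Field name: the text between "<" and ">".
_NAME_RE = re.compile(r"<([^>]*)>")
# External registry number: the text inside "(" and ")".
_REGNO_RE = re.compile(r"\(([^)]*)\)")


def parse_data_header(line: str) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (field_name, registry_number) for an SD data header line.

    Returns None if the line is not a data header (does not start with ">").
    Either element of the tuple is None if that part is absent.
    """
    if not line.startswith(">"):
        return None
    name = _NAME_RE.search(line)
    # Remove the field name first so that parentheses inside a name such as
    # "<logP (calc)>" are not mistaken for a registry number.
    rest = _NAME_RE.sub("", line, count=1)
    regno = _REGNO_RE.search(rest)
    return (name.group(1) if name else None,
            regno.group(1) if regno else None)


print(parse_data_header("> <pKa> (1)"))  # ('pKa', '1')
print(parse_data_header("> <logP>"))     # ('logP', None)
```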
The relevant code is in Code/GraphMol/FileParsers/SDWriter.cpp:
if (d_molid >= 0) {
  (*dp_ostream) << "(" << d_molid + 1 << ") ";
}
There is no way to suppress this output. Not only is there no direct way to
change d_molid, but d_molid cannot be negative (so the "d_molid >= 0" test is
always true), as Code/GraphMol/FileParsers/MolWriters.h declares it as:
unsigned int d_molid;  // the number of the molecules we wrote so far
Wim suggested a post-processing approach. Another is to write the SD data items
yourself, that is, use MolToMolBlock() to generate the connection table/molfile
as a string, then iterate through the properties and generate the data items.
import sys
from rdkit import Chem

def MolToSDFRecord(
        mol,
        includeStereo: bool = True,
        confId: int = -1,
        kekulize: bool = True,
        forceV3000: bool = False) -> str:
    # Connection table (molfile) part of the record, ending in "M  END"
    mol_block = Chem.MolToMolBlock(mol, includeStereo, confId, kekulize,
                                   forceV3000)
    lines = []
    for prop_name in mol.GetPropNames():
        # The name must fit between "<" and ">" on a single header line
        if "\n" in prop_name or ">" in prop_name or "<" in prop_name:
            sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
                             "name includes an unsupported character.\n")
            continue
        prop_value = mol.GetProp(prop_name)
        if "\n" in prop_value:
            # Multi-line values are fine, but a blank line ends the data item
            if "\n\n" in prop_value or "\r\n\r\n" in prop_value:
                sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
                                 "value includes an embedded blank line.\n")
                continue
            # Drop one trailing newline; the terminating blank line is added below
            if prop_value.endswith("\r\n"):
                prop_value = prop_value[:-2]
            elif prop_value.endswith("\n"):
                prop_value = prop_value[:-1]
        # Data header without a "(N)" registry number, then value, then blank line
        lines.append(f"> <{prop_name}>\n{prop_value}\n\n")
    lines.append("$$$$\n")
    return mol_block + "".join(lines)

mol = Chem.MolFromSmiles("CCO")
mol.SetProp("pKa", "3.3\r\n")
print(MolToSDFRecord(mol))
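The post-processing route might look something like the sketch below. This is
not Wim's code (his suggestion isn't shown here). `strip_registry_numbers` is a
hypothetical name, and the sketch assumes the file was written by SDWriter, so
each record's data items follow its "M  END" line and only an all-digit "(N)"
after the field name needs removing.

```python
import re

# A data header such as "> <pKa> (1)" or ">  <pKa>  (12) ". Group 1 is
# everything up to and including the ">" that closes the field name; the
# purely numeric "(N)" after it is what gets dropped.
_HEADER_WITH_COUNTER = re.compile(r"^(>[^<]*<[^>]*>)\s*\(\d+\)\s*$")


def strip_registry_numbers(sdf_text: str) -> str:
    """Remove purely numeric "(N)" registry numbers from SD data headers.

    Only lines in a header position are touched: the first non-blank line
    after "M  END" or after a blank line inside the data block. Value lines
    are copied unchanged even if they start with ">".
    """
    out = []
    in_data_block = False  # past "M  END" in the current record
    expect_header = False  # the next non-blank line may be a data header
    for line in sdf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        if body.strip() == "$$$$":
            in_data_block = expect_header = False  # start of a new record
        elif not in_data_block:
            if body.startswith("M  END"):
                in_data_block = expect_header = True
        elif not body.strip():
            expect_header = True  # a blank line ends the current data item
        elif expect_header and body.startswith(">"):
            m = _HEADER_WITH_COUNTER.match(body)
            if m:
                # Keep the original line ending ("\n" or "\r\n")
                line = m.group(1) + line[len(body):]
            expect_header = False  # the following lines are the value
        else:
            expect_header = False
        out.append(line)
    return "".join(out)
```

It works on the whole file text, e.g. strip_registry_numbers(open("out.sdf").read()),
where the file name is only a placeholder. Tracking header positions, rather
than running a regex over every line, keeps value lines that happen to start
with ">" intact, and a non-numeric registry number such as "(MD-08974)" is
left alone.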
Andrew
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss