Greg-
Your suggestion fixed my problems with the file I was working with. I
am a novice with both to RDKit and Python.
I have a new question for you: I am using the model script you
provided, but with a different set of compounds.
##### SCRIPT to embed and optimize #########
import Chem
from Chem import AllChem
suppl = Chem.SDMolSupplier('usdrug.700.sdf')
mols = []
for i,mol in enumerate(suppl):
num = mol.GetNumAtoms()
print 'embed:',i
AllChem.EmbedMolecule(mol)
print 'optimize:',i
AllChem.UFFOptimizeMolecule(mol,500)
mols.append(mol)
w = Chem.SDWriter('output.sdf')
for mol in mols: w.write(mol)
##########################
###### OUTPUT error ########
embed: 529
optimize: 529
embed: 530
optimize: 530
embed: 531
optimize: 531
embed: 532
optimize: 532
Traceback (most recent call last):
File "2-3D.py", line 12, in <module>
AllChem.UFFOptimizeMolecule(mol,500)
ValueError: Bad Conformer Id
###############################
I think I narrowed the molecule down to either of these two (with the
first looking more likely to be the problem):
http://www.emolecules.com/cgi-bin/more?vid=5852064
http://www.emolecules.com/cgi-bin/more?vid=1985607
AND THEN, I tried to stop using the UFFOptimizeMolecule function and
use only EmbedMolecule and I received this error:
############
embed: 613
embed: 614
embed: 615
embed: 616
embed: 617
embed: 618
[20:03:38] Explicit valence for atom # 0 N greater than permitted
[20:03:38] Unexpected error hit on line 38532
[20:03:38] ERROR: moving to the begining of the next molecule
embed: 619
Traceback (most recent call last):
File "2-3D.py", line 10, in <module>
AllChem.EmbedMolecule(mol)
Boost.Python.ArgumentError: Python argument types in
Chem.rdDistGeom.EmbedMolecule(NoneType)
did not match C++ signature:
EmbedMolecule(RDKit::ROMol {lvalue} mol, unsigned int
maxAttempts=0, int randomSeed=-1, bool clearConfs=True, bool
useRandomCoords=False, double boxSizeMult=2.0, bool randNegEig=True,
unsigned int numZeroFail=1, boost::python::dict {lvalue} coordMap={})
######################
I think this is the #618 and #619 compound(s) that caused the problem:
http://www.emolecules.com/cgi-bin/more?vid=5753883
http://www.emolecules.com/cgi-bin/more?vid=15920212
If there are valence problems, I can just record those errors and
throw out those molecules. But I do not understand why the
FileParseException does not succeed in proceeding to the next
molecule. I am assuming it has to do with the way the for-loop is
constructed, but I wasn't able to figure it out.
Thank you for your great support. I really appreciate it.
-Marshall
On Mar 4, 2009, at 9:35 PM, Greg Landrum wrote:
Dear Marshall,
On Wed, Mar 4, 2009 at 7:01 PM, Marshall Levesque
<[email protected]> wrote:
I have written a script to automate converting compounds from a 2D
SD file
into a 3D, optimized SD file. With the compounds I'm testing with,
I get a
hangup in python that has to be killed. I'm wondering if it has
anything to
do with this bug you listed:
- UFF optimization not terminating when atoms are on top of each
other
(issue 2378119)
Here is the code I'm trying to run:
############
import Chem
from Chem import AllChem
suppl = Chem.SDMolSupplier('original65cmpds.sdf')
max = len(suppl)
for i in range(0, max):
AllChem.EmbedMolecule(suppl[i])
AllChem.UFFOptimizeMolecule(suppl[i])
print i
w = Chem.SDWriter('output.sdf')
for mol in suppl: w.write(mol)
##########
There are two things going on here.
The first is, indeed, a bug in the optimizer that causes it to get
stuck in some circumstances. I will track this down.
However, even if this bug were not present, the code above would not
behave the way you expect. When you ask an SDMolSupplier for a
particular object, it parses the corresponding part of the SD file and
returns a *new* object, so the calls to EmbedMolecule(),
UFFOptimizeMolecule(), and w.write() in your code above are all
working on different objects. Here's a simple demonstration:
[4]>>> suppl = Chem.SDMolSupplier('original65cmpds.sdf')
[5]>>> ma = suppl[0]
[6]>>> mb = suppl[0]
[7]>>> ma is mb
Out[7] False
[8]>>> ma
Out[8] <rdkit.Chem.rdchem.Mol object at 0xa72b02c>
[9]>>> mb
Out[9] <rdkit.Chem.rdchem.Mol object at 0xa72b09c>
notice that ma and mb are pointing to different pieces of memory
(that's what the "at 0x...." things indicate).
Here's a version of your script that does what you want, and that
doesn't hang:
##########
suppl = Chem.SDMolSupplier('original65cmpds.sdf')
mols = []
for i,mol in enumerate(suppl):
print 'embed: ',i
AllChem.EmbedMolecule(mol)
print 'optimize: ',i
AllChem.UFFOptimizeMolecule(mol)
mols.append(mol)
w = Chem.SDWriter('output.sdf')
for mol in mols: w.write(mol)
##########
Regards,
-greg