Greg-

Your suggestion fixed my problems with the file I was working with. I am a novice with both to RDKit and Python.

I have a new question for you: I am using the model script you provided, but with a different set of compounds.

##### SCRIPT to embed and optimize  #########
import Chem
from Chem import AllChem
suppl = Chem.SDMolSupplier('usdrug.700.sdf')

mols = []
for i,mol in enumerate(suppl):
       num = mol.GetNumAtoms()
        print 'embed:',i
        AllChem.EmbedMolecule(mol)
       print 'optimize:',i
       AllChem.UFFOptimizeMolecule(mol,500)
        mols.append(mol)

w = Chem.SDWriter('output.sdf')
for mol in mols: w.write(mol)

##########################

###### OUTPUT error ########
embed: 529
optimize: 529
embed: 530
optimize: 530
embed: 531
optimize: 531
embed: 532
optimize: 532
Traceback (most recent call last):
  File "2-3D.py", line 12, in <module>
    AllChem.UFFOptimizeMolecule(mol,500)
ValueError: Bad Conformer Id

###############################

I think I narrowed the molecule down to either of these two (with the first looking more likely to be the problem):
http://www.emolecules.com/cgi-bin/more?vid=5852064
http://www.emolecules.com/cgi-bin/more?vid=1985607

AND THEN, I tried to stop using the UFFOptimizeMolecule function and use only EmbedMolecule and I received this error:

############
embed: 613
embed: 614
embed: 615
embed: 616
embed: 617
embed: 618
[20:03:38] Explicit valence for atom # 0 N greater than permitted
[20:03:38] Unexpected error hit on line 38532
[20:03:38] ERROR: moving to the begining of the next molecule
embed: 619
Traceback (most recent call last):
  File "2-3D.py", line 10, in <module>
    AllChem.EmbedMolecule(mol)
Boost.Python.ArgumentError: Python argument types in
    Chem.rdDistGeom.EmbedMolecule(NoneType)
did not match C++ signature:
EmbedMolecule(RDKit::ROMol {lvalue} mol, unsigned int maxAttempts=0, int randomSeed=-1, bool clearConfs=True, bool useRandomCoords=False, double boxSizeMult=2.0, bool randNegEig=True, unsigned int numZeroFail=1, boost::python::dict {lvalue} coordMap={})

######################

I think this is the #618 and #619 compound(s) that caused the problem:
http://www.emolecules.com/cgi-bin/more?vid=5753883
http://www.emolecules.com/cgi-bin/more?vid=15920212

If there are valence problems, I can just record those errors and throw out those molecules. But I do not understand why the FileParseException does not succeed in proceeding to the next molecule. I am assuming it has to do with the way the for-loop is constructed, but I wasn't able to figure it out.

Thank you for your great support.  I really appreciate it.

-Marshall

On Mar 4, 2009, at 9:35 PM, Greg Landrum wrote:

Dear Marshall,

On Wed, Mar 4, 2009 at 7:01 PM, Marshall Levesque
<[email protected]> wrote:

I have written a script to automate converting compounds from a 2D SD file into a 3D, optimized SD file. With the compounds I'm testing with, I get a hangup in python that has to be killed. I'm wondering if it has anything to
do with this bug you listed:

- UFF optimization not terminating when atoms are on top of each other
(issue 2378119)

Here is the code I'm trying to run:

############
import Chem
from Chem import AllChem
suppl = Chem.SDMolSupplier('original65cmpds.sdf')

max = len(suppl)

for i in range(0, max):
       AllChem.EmbedMolecule(suppl[i])
       AllChem.UFFOptimizeMolecule(suppl[i])
       print i

w = Chem.SDWriter('output.sdf')
for mol in suppl: w.write(mol)
##########

There are two things going on here.

The first is, indeed, a bug in the optimizer that causes it to get
stuck in some circumstances. I will track this down.

However, even if this bug were not present, the code above would not
behave the way you expect. When you ask an SDMolSupplier for a
particular object, it parses the corresponding part of the SD file and
returns a *new* object, so the calls to EmbedMolecule(),
UFFOptimizeMolecule(), and w.write() in your code above are all
working on different objects. Here's a simple demonstration:

[4]>>> suppl = Chem.SDMolSupplier('original65cmpds.sdf')

[5]>>> ma = suppl[0]

[6]>>> mb = suppl[0]

[7]>>> ma is mb
Out[7] False

[8]>>> ma
Out[8] <rdkit.Chem.rdchem.Mol object at 0xa72b02c>

[9]>>> mb
Out[9] <rdkit.Chem.rdchem.Mol object at 0xa72b09c>

notice that ma and mb are pointing to different pieces of memory
(that's what the "at 0x...." things indicate).

Here's a version of your script that does what you want, and that doesn't hang:

##########
suppl = Chem.SDMolSupplier('original65cmpds.sdf')

mols = []
for i,mol in enumerate(suppl):
      print 'embed: ',i
      AllChem.EmbedMolecule(mol)
      print 'optimize: ',i
      AllChem.UFFOptimizeMolecule(mol)
      mols.append(mol)

w = Chem.SDWriter('output.sdf')
for mol in mols: w.write(mol)
##########

Regards,
-greg


Reply via email to