Greg,

I found the files of interest and ran a few tests. The resulting files are in the attached archive; here are the details.
The structures in question come from Shoichet's non-aggregator set, which was available on his web page. My original intent was to convert the SMILES files from the Shoichet set to SDF. This went smoothly enough until I had to process the SDF for a different purpose, at which point four structures were found to cause problems.

In the attached archive, each offending structure has five associated files, named according to the NGC ID associated with the original SMILES:

 .smi - The original SMILES.
 .sdf - The result of my original SMILES-to-SDF conversion, with nan as the atom coordinates.
 .mol - Generated manually today by:

    from rdkit import Chem
    from rdkit.Chem import AllChem
    m = Chem.MolFromSmiles('<offending SMILES>')
    AllChem.Compute2DCoords(m)  # compute 2D depiction coordinates
    with open('blah.mol', 'w') as f:
        f.write(Chem.MolToMolBlock(m))

 _fix.smi - The RDKit-generated SMILES for the structure.
 _fix.mol - The result of running the following after the snippet above:

    # Round-trip through RDKit's own SMILES, then recompute the coordinates
    m = Chem.MolFromSmiles(Chem.MolToSmiles(m))
    AllChem.Compute2DCoords(m)
    with open('blah_fix.mol', 'w') as f:
        f.write(Chem.MolToMolBlock(m))

Only 14662 did not result in a fixed mol file. Interestingly, in its first bad conversion only the platinum hexachloride has nan coordinates; after the SMILES round-trip, all coordinates are nan.

Please let me know if you need any further details.

-Kirk

On Sat, May 1, 2010 at 10:24 PM, Greg Landrum <greg.land...@gmail.com> wrote:
> On Fri, Apr 30, 2010 at 12:56 PM, Greg Landrum <greg.land...@gmail.com>
> wrote:
> >
> > I don't see any problems in your script, so I have to assume that it's
> > a problem with the binary you're using. I'm travelling and don't have
> > a windows machine handy, so this will have to wait until I'm back home
> > this weekend.
>
> Ok, I was able to reproduce this on my windows box. It's clearly a
> problem with the windows build:
>
> In [29]: m = Chem.MolFromSmiles('OC(=O)C1CCCC1')
>
> In [30]: AllChem.Compute2DCoords(m)
> Out[30]: 0
>
> In [31]: print Chem.MolToMolBlock(m)
> -------> print(Chem.MolToMolBlock(m))
>
>      RDKit          2D
>
>   8  8  0  0  0  0  0  0  0  0999 V2000
>    -1.#IND    1.#QNB    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.#IND    1.#QNB    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.#IND    1.#QNB    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.#IND    1.#QNB    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.#IND    1.#QNB    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.#IND    1.#QNB    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.#IND    1.#QNB    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.#IND    1.#QNB    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
>   2  3  2  3
>   2  4  1  0
>   4  5  1  0
>   5  6  1  0
>   6  7  1  0
>   7  8  1  0
>   8  4  1  0
> M  END
>
> I will look into this and see where the problem lies.
>
> Note: whatever is going on here doesn't affect every depiction; other
> molecules do end up with correct coordinates.
>
> Best Regards,
> -greg
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
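The symptom discussed in this thread is atom coordinates written out as nan (or, in the Windows output quoted above, as -1.#IND / 1.#QNB). To find the affected records in a large SMILES-to-SDF conversion, one option is to scan the atom block of each record. The following is a minimal, stdlib-only sketch, assuming MDL V2000 records with x, y and z in the standard fixed 10-character columns; the function names (bad_atom_coords, iter_sdf_records, scan_sdf) are hypothetical helpers, not part of RDKit.

```python
import math


def bad_atom_coords(molblock):
    """Return 1-based indices of atoms whose x/y/z fields are not finite numbers.

    Assumes an MDL V2000 molblock: 3 header lines, a counts line whose first
    3 columns hold the atom count, then one atom line per atom with x, y, z
    in fixed 10-character columns.
    """
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])
    bad = []
    for idx in range(n_atoms):
        line = lines[4 + idx]
        for start in (0, 10, 20):
            field = line[start:start + 10].strip()
            try:
                # "nan" parses to a float; "-1.#IND" or an empty field raises
                value = float(field)
            except ValueError:
                value = math.nan
            if not math.isfinite(value):
                bad.append(idx + 1)
                break  # report each atom at most once
    return bad


def iter_sdf_records(sdf_text):
    """Yield each record of an SDF as text, splitting on '$$$$' lines."""
    record = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            if any(l.strip() for l in record):
                yield "\n".join(record)
            record = []
        else:
            record.append(line)
    if any(l.strip() for l in record):
        yield "\n".join(record)


def scan_sdf(sdf_text):
    """Return (record_number, title, bad_atoms) for each record with bad coordinates."""
    problems = []
    for number, record in enumerate(iter_sdf_records(sdf_text), start=1):
        bad = bad_atom_coords(record)
        if bad:
            title = record.splitlines()[0].strip()
            problems.append((number, title, bad))
    return problems
```

Records flagged this way could then be regenerated with the SMILES round-trip described above (Chem.MolFromSmiles(Chem.MolToSmiles(m)) followed by AllChem.Compute2DCoords(m)), although, as with 14662, that workaround does not always help. Parsing by fixed columns rather than splitting on whitespace follows the V2000 layout, where wide coordinate values can leave no space between adjacent fields.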
Attachment: nan.tgz (GNU Zip compressed data)