On 25/10/13 08:09, James Davidson wrote:
> Hi Roger,
>
> Thanks for the response
>
>> The use of an integer file format "flavor" argument allows the caller to
>> customize the behavior of the readers and writers. The semantics is that a
>> reasonable default is zero (for all bits), but that new features may be added
>> without changing the API/ABI.
>> Most of the bits above (for the writer) control strict compliance with the
>> PDB format specification. For example, a flavor of 12 will write bond orders
>> the way the RCSB expects them both throwing away bond orders and increasing
>> the size of the PDB file.
>
> As a test, I am using 2VCI, and am retrieving the PDB data from the RCSB
> using the following
>
> import requests
> url = "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=2VCI"
> response = requests.get(url)
> pdb_block = response.content
> response.close()
>
> pdb_block shows CONECT records only for the HETATM records.
> If I now read into RDKit, using the defaults, and write back out using the
> defaults, I see CONECT records for every atom (ie protein as well). And I
> can't see any double-bonds rendered in PyMOL:
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> pdb = Chem.MolFromPDBBlock(pdb_block)
> pdb_block_out = Chem.MolToPDBBlock(pdb)
>
> First 10 CONECT records of output:
> CONECT 1 2
> CONECT 2 3 5
> CONECT 3 4 4 10
> CONECT 5 6
> CONECT 6 7
> CONECT 7 8 8 9
> CONECT 10 11
> CONECT 11 12 14
> CONECT 12 13 13 17
> CONECT 14 15 16
>
> If I use Chem.MolToPDBBlock(pdb, flavor=12) I do, indeed see the ligand
> CONECT records in what looks like the original format (albeit now numbered
> differently), and I still see CONECT records for the protein - but this PDB
> *will* render double bonds in PyMOL.
>
> First 10 CONECT records of output:
> CONECT 3 4 4
> CONECT 7 8 8
> CONECT 12 13 13
> CONECT 19 20 20
> CONECT 23 24 24
> CONECT 28 29 29
> CONECT 35 36 36
> CONECT 38 39 39
> CONECT 40 42 42
> CONECT 41 43 43
>
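For anyone comparing the two outputs quoted above: in both of them a partner serial number is sometimes repeated within a record (for example "CONECT 3 4 4 10"), and that repetition appears to be how the multiple bond is marked. Below is a minimal sketch, using only the Python standard library, that counts those repeats. The function name bond_orders_from_conect is hypothetical, and the sketch assumes whitespace-separated fields, which only holds while atom serial numbers stay below 10000 (above that, the fixed-width columns run together).

```python
from collections import Counter


def bond_orders_from_conect(pdb_block):
    """Map (atom, partner) serial pairs to the number of times they are listed.

    Hypothetical helper: assumes a repeated partner serial in a CONECT
    record marks a multiple bond, and that fields are whitespace-separated.
    """
    orders = {}
    for line in pdb_block.splitlines():
        if not line.startswith("CONECT"):
            continue
        serials = [int(field) for field in line[6:].split()]
        if len(serials) < 2:
            continue  # no partner atoms listed on this record
        atom, partners = serials[0], serials[1:]
        for partner, count in Counter(partners).items():
            # Store each bond once, regardless of which atom lists it.
            key = (min(atom, partner), max(atom, partner))
            orders[key] = max(orders.get(key, 0), count)
    return orders


# The first three records from the default-flavor output quoted above.
sample = "CONECT 1 2\nCONECT 2 3 5\nCONECT 3 4 4 10"
orders = bond_orders_from_conect(sample)
double_bonds = [pair for pair, order in orders.items() if order == 2]
print(double_bonds)  # [(3, 4)]
```

This only reports what the CONECT records say; it does not perceive bond orders from geometry.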
If I may be so bold, I believe an important part of the puzzle is missing.

The residue name (3-letter code, or comp-id) in the PDB file is a pointer
to an entry in the mmCIF-formatted chemical component dictionary, which
describes the compound. The dictionary covers all compounds in all entries
released by the PDB.

http://deposit.pdb.org/cc_dict_tut.html

If this is an "internal" PDB file, there will very likely be a similar
mmCIF file used for crystallographic refinement.

Only when these options fail would I consider turning to bond-order
perception and CONECT records.

Paul.
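To make the comp-id lookup concrete, here is a minimal sketch (standard library only) that collects the residue names from the HETATM records of a PDB block; these are the keys one would then look up in the chemical component dictionary. The function name hetatm_comp_ids and the residue name "LIG" in the sample are hypothetical. The sketch relies on the fixed-column PDB layout, in which the residue name occupies columns 18-20, and it expects a str rather than bytes.

```python
def hetatm_comp_ids(pdb_block):
    """Return the sorted, distinct residue names found on HETATM records.

    Hypothetical helper: reads the residue name from columns 18-20
    (0-based slice 17:20) of each HETATM line.
    """
    comp_ids = set()
    for line in pdb_block.splitlines():
        if line.startswith("HETATM") and len(line) >= 20:
            comp_id = line[17:20].strip()
            if comp_id:
                comp_ids.add(comp_id)
    return sorted(comp_ids)


# Illustrative records only; "LIG" is a made-up ligand code.
sample = "\n".join([
    "ATOM      1  N   MET A   1",
    "HETATM 1634  C1  LIG A1224",
    "HETATM 1700  O   HOH A2001",
])
print(hetatm_comp_ids(sample))  # ['HOH', 'LIG']
```

Waters and ions show up here as well, so in practice the list would need filtering before any dictionary lookup.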