Hi Noel,

(I've cc'ed this message to the openbabel-devel list as suggested, so you have a record of the patch :) )
The use of multiple conect records to define bond orders is not a part of the pdb standard, but there does appear to be limited support for this in the pdb parser currently - if the same connection appears multiple times *on the same line* then the bond order is derived from this. If the same connection appears over separate lines then only the information on the first line is used.

Consider the Y3 ligand from the pdb file 1a5v - I've attached an 'astex mode' ligand pdb file, where the connections to e.g. sulfur 1144 span two lines

CONECT 1144 1134 1145 1145 1146
CONECT 1144 1146 1147

The S=O bond, 1144-1146, is split across two lines. Using the current openbabel pdbformat parser, the output smiles string is

c1(cc(cc2c1c(cc(c2)[S](=O)(O)O)NC(=O)C)[S](=O)(O)O)O

if you switch to using my patched pdbformat parser (diff file attached - note this is a diff from the v2.3.3 source code), you end up getting

c1(cc(cc2c1c(cc(c2)S(=O)(=O)O)NC(=O)C)S(=O)(=O)O)O

which is 'more correct' in this instance.

My patch has 'astex mode' turned on by default, but you can revert to the original algorithm by adding the -aa flag. Note that I haven't tested this to destruction, but it seems to be doing a good job for ~4000 test pdb ligands containing astex style conect records.

Thanks also for the pybel advice - with 'astex mode' as the default I don't need the switch currently, but I will have a look at that OBConversion object as I am bound to need it at some point.

cheers - enjoy the summer, the holidays and your conferencing!

Richard

-----Original Message-----
From: Noel O'Boyle [mailto:[email protected]]
Sent: 10 June 2010 11:16
To: Richard Hall
Subject: Re: pybel and openbabel

On 10 June 2010 09:57, Richard Hall <[email protected]> wrote:

Hi Richard,

How's the goin'?

> hope all is well with you

It is indeed.

> - I have recently been using PyBel for a
> cheminformatics project and have hacked the OpenBabel pdb file reader
> slightly to cope with the 'Astex way' of dealing with conect records (*). I
> was wondering whether I should submit a patch for this? Rather than trample
> the default behaviour, my change requires a switch when running OpenBabel
> (-aa)

Great. Patches always very welcome, especially for the PDB parser (which I may one day actually get around to using). Seeing as this isn't the first Astex contribution, do you want commit access? Otherwise, I can sort it out myself if you send me a patch (also a good idea to cc to [email protected] if you're happy with this - means we have a record of the patch).

One thing I don't understand is whether you are talking about reading custom PDB files or PDB files actually from the PDB, because it would be strange if their own PDB files didn't conform to their standard.

> - how much work would it be to get the Pybel readfile method to
> include these switches?

Someone just asked me about this also so it's on my mind. The reason I'm reluctant is because I feel this is moving into 'advanced usage' territory, but I'll think about it. In the meanwhile, you need to use the underlying OBConversion object yourself as shown at http://baoilleach.blogspot.com/2008/10/generating-inchis-mini-me-inchikey.html.

> I hope that makes sense? Are you going to the Sheffield conference? If so
> I'll see you there!

I'm all conferenced out for this year, except for Goslar in November. I was planning to go to PyCon this time round, but it clashes with holidays.

- Noel

> best wishes
>
> Richard
>
> (*) We use the number of occurrences of a connect record to determine bond
> order and these occurrences can span multiple lines - I was finding that the
> sulfur in a CS(=O)(=O)C motif would have connect records running over two
> lines and the current way of doing things does not cope with this - I would
> end up with a smiles that looked like C[S@@](=O)(O)
>
> Disclaimer
>
> This communication is confidential and may contain privileged information
> intended solely for the named addressee(s). It may not be used or disclosed
> except for the purpose for which it has been sent. If you are not the
> intended recipient you must not review, use, disclose, copy, distribute or
> take any action in reliance upon it. If you have received this communication
> in error, please notify Astex Therapeutics Ltd by emailing
> [email protected] and destroy all copies of the message and any
> attached documents.
>
> Astex Therapeutics Ltd monitors, controls and protects all its messaging
> traffic in compliance with its corporate email policy. The Company accepts
> no liability or responsibility for any onward transmission or use of emails
> and attachments having left the Astex Therapeutics domain. Unless expressly
> stated, opinions in this message are those of the individual sender and not
> of Astex Therapeutics Ltd. The recipient should check this email and any
> attachments for the presence of computer viruses. Astex Therapeutics Ltd
> accepts no liability for damage caused by any virus transmitted by this
> email. E-mail is susceptible to data corruption, interception, unauthorized
> amendment, and tampering, Astex Therapeutics Ltd only send and receive
> e-mails on the basis that the Company is not liable for any such alteration
> or any consequences thereof.
>
> Astex Therapeutics Ltd., Registered in England at 436 Cambridge Science
> Park, Cambridge CB4 0QA under number 3751674
1a5v_001.pdb
Description: 1a5v_001.pdb
pdbformat.cpp.patch
Description: pdbformat.cpp.patch
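
The attached patch is not reproduced in this archive, so what follows is only a rough Python sketch of the counting idea described in the message above, not the code from pdbformat.cpp.patch. It accumulates how often each partner atom is listed under a CONECT serial across *all* lines for that serial and takes that count as the bond order. The function name conect_bond_orders is made up for illustration, and splitting on whitespace is a simplifying assumption: real PDB CONECT records use fixed 5-character columns, so a proper parser would slice by column instead.

```python
from collections import Counter


def conect_bond_orders(lines):
    """Hypothetical helper: derive bond orders from repeated CONECT entries.

    Occurrences are accumulated over every CONECT line for a given atom,
    so a double bond split across two lines is still counted twice.
    """
    directed = Counter()
    for line in lines:
        if not line.startswith("CONECT"):
            continue
        # Assumption: whitespace-separated serials (real files are fixed-width).
        fields = line[6:].split()
        if len(fields) < 2:
            continue
        source = int(fields[0])
        for partner in fields[1:]:
            directed[(source, int(partner))] += 1

    # A bond may be listed from both of its atoms; keep the larger count
    # rather than summing the two directions.
    orders = {}
    for (a, b), count in directed.items():
        key = (min(a, b), max(a, b))
        orders[key] = max(orders.get(key, 0), count)
    return orders


# The two records for sulfur 1144 quoted in the message.
records = [
    "CONECT 1144 1134 1145 1145 1146",
    "CONECT 1144 1146 1147",
]
orders = conect_bond_orders(records)
for (a, b), order in sorted(orders.items()):
    print(a, b, order)
```

With these two records, 1144-1146 is counted once on each line and so comes out as a double bond, which is the behaviour the message describes for 'astex mode'.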
_______________________________________________ OpenBabel-Devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/openbabel-devel
