Hi Noel,

( I've cc'ed this message to the openbabel-devel list as suggested, so you have 
a record of the patch :) )

Using repeated CONECT entries to define bond orders is not part of the PDB 
standard, but the PDB parser currently appears to have limited support for it: 
if the same connection appears multiple times *on the same line*, the bond 
order is derived from the number of repeats.  If the same connection is spread 
over separate lines, only the information on the first line is used.  
Consider the Y3 ligand from the PDB file 1a5v. I've attached an 'astex mode' 
ligand PDB file, where the connections to e.g. sulfur 1144 span two lines:

CONECT 1144 1134 1145 1145 1146
CONECT 1144 1146 1147

The S=O bond 1144-1146 is split across two lines.  Using the current openbabel 
pdbformat parser, the output SMILES string is

c1(cc(cc2c1c(cc(c2)[S](=O)(O)O)NC(=O)C)[S](=O)(O)O)O

If you switch to my patched pdbformat parser (diff file attached; note that 
this is a diff against the v2.3.3 source code), you get

c1(cc(cc2c1c(cc(c2)S(=O)(=O)O)NC(=O)C)S(=O)(=O)O)O

which is 'more correct' in this instance.  My patch has 'astex mode' turned on 
by default, but you can revert to the original algorithm by adding the -aa 
flag.  Note that I haven't tested this to destruction, but it seems to be doing 
a good job for ~4000 test pdb ligands containing astex style conect records.
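
To make the idea concrete, here is a minimal Python sketch of the counting 
scheme described above. It is not the attached C++ patch; the function name 
and the fixed-width field handling are my own assumptions for illustration. 
Repeats of a connection are summed over all CONECT lines for the same source 
atom, and because a bond is normally listed from both ends, the larger of the 
two directed counts is taken as the bond order.

```python
from collections import Counter

def conect_bond_orders(lines):
    """Return {(low_serial, high_serial): bond_order} from CONECT records.

    Hypothetical helper for illustration only. Assumes the usual fixed-width
    layout: 'CONECT' in columns 1-6, then 5-character serial number fields.
    """
    directed = Counter()
    for line in lines:
        if not line.startswith("CONECT"):
            continue
        # Slice the 5-character fields that follow the record name.
        fields = [line[i:i + 5].strip() for i in range(6, len(line), 5)]
        serials = [int(f) for f in fields if f]
        if len(serials) < 2:
            continue
        source, partners = serials[0], serials[1:]
        # Count repeats across *all* lines for this source atom,
        # not just repeats within a single line.
        for partner in partners:
            directed[(source, partner)] += 1

    orders = {}
    for (a, b), count in directed.items():
        key = (min(a, b), max(a, b))
        # A bond may be listed from both atoms; do not add the two
        # directions together, take the larger count instead.
        orders[key] = max(orders.get(key, 0), count)
    return orders

records = [
    "CONECT 1144 1134 1145 1145 1146",
    "CONECT 1144 1146 1147",
]
orders = conect_bond_orders(records)
print(orders[(1144, 1146)])  # 2: the S=O bond split across the two lines
```

With the two records above, both 1144-1145 and 1144-1146 come out as double 
bonds, while 1144-1134 and 1144-1147 stay single.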

Thanks also for the pybel advice - with 'astex mode' as the default I don't 
need the switch currently, but I will have a look at that OBConversion object 
as I am bound to need it at some point.

cheers - enjoy the summer, the holidays and your conferencing!
Richard

-----Original Message-----
From: Noel O'Boyle [mailto:[email protected]] 
Sent: 10 June 2010 11:16
To: Richard Hall
Subject: Re: pybel and openbabel

On 10 June 2010 09:57, Richard Hall <[email protected]> wrote:

Hi Richard,

How's the goin'?

> hope all is well with you

It is indeed.

> - I have recently been using PyBel for a
> cheminformatics project and have hacked the OpenBabel pdb file reader
> slightly to cope with the 'Astex way' of dealing with conect records (*).  I
> was wondering whether I should submit a patch for this?  Rather than trample
> the default behaviour, my change requires a switch when running OpenBabel
> (-aa)

Great. Patches always very welcome, especially for the PDB parser
(which I may one day actually get around to using). Seeing as this
isn't the first Astex contribution, do you want commit access?
Otherwise, I can sort it out myself if you send me a patch (also a
good idea to cc to [email protected] if you're happy with
this - means we have a record of the patch).

One thing I don't understand is whether you are talking about reading
custom PDB files or PDB files actually from the PDB, because it would
be strange if their own PDB files didn't conform to their standard.

> - how much work would it be to get the Pybel readfile method to
> include these switches?

Someone just asked me about this also so it's on my mind. The reason
I'm reluctant is because I feel this is moving into 'advanced usage'
territory, but I'll think about it. In the meanwhile, you need to use
the underlying OBConversion object yourself as shown at
http://baoilleach.blogspot.com/2008/10/generating-inchis-mini-me-inchikey.html.

> I hope that makes sense?  Are you going to the Sheffield conference?  If so
> I'll see you there!

I'm all conferenced out for this year, except for Goslar in November.
I was planning to go to PyCon this time round, but it clashes with
holidays.

- Noel

> best wishes
>
> Richard
>
>
>
> (*) We use the number of occurrences of a connect record to determine bond
> order and these occurrences can span multiple lines - I was finding that the
> sulfur in a CS(=O)(=O)C motif would have connect records running over two
> lines and the current way of doing things does not cope with this - I would
> end up with a smiles that looked like C[S@@](=O)(O)
>
>
>
> Disclaimer
>
> This communication is confidential and may contain privileged information
> intended solely for the named addressee(s). It may not be used or disclosed
> except for the purpose for which it has been sent. If you are not the
> intended recipient you must not review, use, disclose, copy, distribute or
> take any action in reliance upon it. If you have received this communication
> in error, please notify Astex Therapeutics Ltd by emailing
> [email protected] and destroy all copies of the message and any
> attached documents.
>
> Astex Therapeutics Ltd monitors, controls and protects all its messaging
> traffic in compliance with its corporate email policy. The Company accepts
> no liability or responsibility for any onward transmission or use of emails
> and attachments having left the Astex Therapeutics domain. Unless expressly
> stated, opinions in this message are those of the individual sender and not
> of Astex Therapeutics Ltd. The recipient should check this email and any
> attachments for the presence of computer viruses. Astex Therapeutics Ltd
> accepts no liability for damage caused by any virus transmitted by this
> email. E-mail is susceptible to data corruption, interception, unauthorized
> amendment, and tampering, Astex Therapeutics Ltd only send and receive
> e-mails on the basis that the Company is not liable for any such alteration
> or any consequences thereof.
>
> Astex Therapeutics Ltd., Registered in England at 436 Cambridge Science
> Park, Cambridge CB4 0QA under number 3751674
>
>




Attachment: 1a5v_001.pdb
Attachment: pdbformat.cpp.patch

_______________________________________________
OpenBabel-Devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
