Re: [ccp4bb] [Fwd: Re: [ccp4bb] FW: New ligand 3-letter code (help-7071)]

2015-06-24 Thread Ian Tickle
Hi Ben

From discussions we have had with PDBe they consider tautomers to be
different compounds (just as stereoisomers would be considered to be
different compounds), since they require different restraint dictionaries,
so each tautomer that was observed would require a unique 3-letter code.
Of course you still have to have evidence (e.g. from the H-bonding pattern)
that what you are really seeing are different tautomers, but that's a
different question.

Cheers

-- Ian


On 24 June 2015 at 12:50, Ben Bax benjamin.d@gsk.com wrote:

  Another major problem with the PDB is that it does not seem to believe
 in the existence of different tautomers or protonation states.

 For example the ATP analogue AMPPNP can have the nitrogen between the beta
 and gamma phosphates protonated (-P-NH-P) or unprotonated (P-N=P), and
 there are well documented examples of both tautomers in the PDB (NH being a
 hydrogen bond donor and N a hydrogen bond acceptor).
 If you look in the CSD you can see that the protonation state of the
 nitrogen changes the geometry of the P-N-P bond.

 However, as I understand it, the PDB considers all tautomeric (and
 protonated) forms of AMPPNP the same. When I tried to deposit a specific
 AMPPNP tautomer in 2013, they would not accept it. The PDB also seems to
 believe, as I understand it, that the overall charge on AMPPNP is zero and
 that the phosphates do not carry negative charge.



Re: [ccp4bb] [Fwd: Re: [ccp4bb] FW: New ligand 3-letter code (help-7071)]

2015-06-24 Thread Ben Bax
Another major problem with the PDB is that it does not seem to believe in the 
existence of different tautomers or protonation states.

For example the ATP analogue AMPPNP can have the nitrogen between the beta and 
gamma phosphates protonated (-P-NH-P) or unprotonated (P-N=P), and there are 
well documented examples of both tautomers in the PDB (NH being a hydrogen bond 
donor and N a hydrogen bond acceptor).
If you look in the CSD you can see that the protonation state of the nitrogen
changes the geometry of the P-N-P bond.

However, as I understand it, the PDB considers all tautomeric (and protonated) 
forms of AMPPNP the same. When I tried to deposit a specific AMPPNP tautomer in 
2013, they would not accept it. The PDB also seems to believe, as I understand 
it, that the overall charge on AMPPNP is zero and that the phosphates do not 
carry negative charge.


Ben Bax
Senior Scientific Investigator
BioMolecular Sciences UK
R&D Platform Technology & Science

GSK
Medicines Research Centre, Gunnels Wood Road, Stevenage, SG1 2NY, UK
Email   benjamin.d@gsk.com
Mobile  +44 (0) 7912 600604
Tel     +44 (0) 1438 55 1156


Re: [ccp4bb] [Fwd: Re: [ccp4bb] FW: New ligand 3-letter code (help-7071)]

2015-06-24 Thread Miri Hirshberg
Good afternoon both,

there is also the issue of inconsistency of presentation.

For example, lysine: L-lysine (LYS) is protonated on the side-chain
nitrogen (NZ), whereas D-lysine (DLY) is not,

i.e. you have NZ(HZ1, HZ2) for DLY, and NZ(HZ1, HZ2, HZ3) for LYS.
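
As a small illustration of the inconsistency described above, the two hydrogen lists can be compared directly. This is only a sketch: the atom names are the ones quoted in this message, while the table and the helper function are hypothetical and not part of any PDB tool.

```python
# Hydrogens attached to the side-chain nitrogen NZ, as quoted in the message.
# The dictionary layout and function name are illustrative assumptions.
nz_hydrogens = {
    "LYS": {"HZ1", "HZ2", "HZ3"},  # three hydrogens: protonated NZ
    "DLY": {"HZ1", "HZ2"},         # two hydrogens: unprotonated NZ
}


def hydrogen_difference(comp_a, comp_b, table=nz_hydrogens):
    """Return the hydrogens present in comp_a but absent from comp_b."""
    return sorted(table[comp_a] - table[comp_b])


extra_in_lys = hydrogen_difference("LYS", "DLY")
extra_in_dly = hydrogen_difference("DLY", "LYS")
print("only in LYS:", extra_in_lys)
print("only in DLY:", extra_in_dly)
```

The single extra hydrogen on LYS is the whole difference between the two presentations.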

Miri

On Wed, 2015-06-24 at 13:35 +0100, Ian Tickle wrote:
 Hi Ben
 
 
 From discussions we have had with PDBe they consider tautomers to be
 different compounds (just as stereoisomers would be considered to be
 different compounds), since they require different restraint
 dictionaries, so each tautomer that was observed would require a
 unique 3-letter code.  Of course you still have to have evidence
 (e.g. from the H-bonding pattern) that what you are really seeing are
 different tautomers, but that's a different question.
 
 
 Cheers
 
 
 -- Ian
 
 
 
Re: [ccp4bb] [Fwd: Re: [ccp4bb] FW: New ligand 3-letter code (help-7071)]

2015-06-22 Thread Martyn Symmons
Well the problem is there is a lot more to a ligand than PDB
coordinates - little things like bond orders... In addition people can
publish ligands with atoms for which they have no density - so
zero-occupancy is allowed too. So who should get priority - the group
who publishes a ligand first, or the ones who actually have density
for all the atoms?

These sorts of complications mean we all benefit from peer-review of
the structure - that is why we put things on hold. And authors should
have a chance to change their ligand definition based on reviewers'
comments - just as they are allowed to improve the PDB coordinates. So
it is a worry for them that the PDB might 'publish' the ligand aspect
of their work before they have completed the peer-review process.

Maybe you don't believe in peer review - in reply to which I'd
paraphrase what people say about democracy - it's pretty bad but
better than the alternatives.

But to return to the point I made: what really is the problem with
maintaining and modifying _separate_ definitions with authors'
_separate_ deposited coordinates (and bond orders) while structures
are on hold and being reviewed? Journals manage to keep all those
submitted papers separate in their databases.

cheers
 M.

On Mon, Jun 22, 2015 at 3:12 AM, Edward A. Berry ber...@upstate.edu wrote:
   I can't imagine a journal doing that can you?  When I work on my
 supplementary material in a paper I don't expect that the journal will
 take a bit out and publish it separately to support the work of my
 competitors. Not out of spite that I was beaten - but because I don't
 want to take the responsibility for checking their science for them!


 I don't see the problem here. What about the dozens of authors who
 will benefit from using your ligand in their structure _after_ your
 structure comes out? You don't take responsibility for checking their
 science. Every author gets a copy of his final structure to check
 before it is released and each is responsible for his own.
 The only difference here is whether the competitor got to use it first,
 (which might sting a bit) or only after you had already made it your
 own with the first structure.

 I guess the ligand database is the responsibility of the pdb, but
 they depend on first depositors to help set up each ligand, so
 it is not surprising if the type model has coordinates from the
 first depositor's structure (although it would be convenient if
 they were all moved to c.o.m. at 0,0,0). When another group publishes
 a structure with the ligand, they will not be publishing the first
 depositor's coordinates because the ligand will be moved to its position
 in their structure and refined against their data, probably with
 somewhat different restraints.

 If the ligand is a top secret novel drug lead that your company is
 developing I guess it would come as a shock to find someone else has
 already deposited it, and it might be good to hasten not the
 publication but protecting of the compound with a patent!

 Although Miriam says a new 3-letter code is generated when no match is
 found,
 I believe the depositor's code will be used if it is available,
 at least one of mine was last year, so there is some use for Nigel's
 utility if you want to stamp your new compound with a rememberable name.

 eab



Re: [ccp4bb] [Fwd: Re: [ccp4bb] FW: New ligand 3-letter code (help-7071)]

2015-06-21 Thread Miri Hirshberg
Sun., June 21st 2015

Good evening,

adding several general points to the thread.

(1) Fundamentally, the PDB, unlike other chemical databases,
insists that all identical structures should have the same 3-letter
code and the same atom names - obviously so for amino acids and, say, ATP.

 (1.1) Needless to say, there are endless examples in the PDB of two
ligands that differ by, let's say, one hydroxyl group, where equivalent
atoms in the two ligands have totally different names.

(2) When a structure is deposited with a ligand, the ligand is first
compared against the PDB chem_comp database (CCD) and against the on-hold
chem_comp (CCD) (naturally the latter is not publicly available),
and only if no match can be found is a new three-letter code generated
and assigned.

If this is not what happens, then it is a mistake in annotation and
should not occur.
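
The lookup order in point (2) can be sketched as follows. This is a minimal illustration and not the PDB's actual annotation software: it assumes each ligand has already been reduced to some canonical identifier string (a hypothetical key here), and the function name, the dictionaries and the way a fresh code is chosen are all invented for the example.

```python
import itertools
import string


def assign_code(identifier, released, on_hold, used_codes):
    """Return (code, is_new) for a ligand keyed by a canonical identifier.

    released, on_hold: dicts mapping identifier -> three-character code.
    used_codes: set of codes already taken in either dictionary.
    All names here are hypothetical, for illustration only.
    """
    # Step 1: look in the released dictionary, then in the on-hold one.
    for dictionary in (released, on_hold):
        if identifier in dictionary:
            return dictionary[identifier], False
    # Step 2: no match anywhere, so take the first unused three-character code.
    alphabet = string.ascii_uppercase + string.digits
    for chars in itertools.product(alphabet, repeat=3):
        code = "".join(chars)
        if code not in used_codes:
            on_hold[identifier] = code
            used_codes.add(code)
            return code, True
    raise RuntimeError("no three-character codes left")


# Placeholder identifiers; real ones would come from the chemistry.
demo_released = {"hypothetical-id-phosphoserine": "SEP"}
demo_on_hold = {"hypothetical-id-drug": "ZA3"}
demo_used = {"SEP", "ZA3"}
print(assign_code("hypothetical-id-drug", demo_released, demo_on_hold, demo_used))
```

In this sketch a second depositor of the on-hold compound is handed the existing code rather than a new one, which is the situation discussed elsewhere in the thread.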

(3) Exceptions to the above take several different flavours. These
include:

 (3.1) When the same ligand is described in the PDB both by a single
3-letter code and by a combination of two different 3-letter-code ligands.
An example out of many is phosphoserine. Its 3-letter code
in the PDB CCD is SEP, which is used in 704 PDB entries (RCSB count,
21-June-2015). But in PDB entry 3uw2 the phosphoserine 109A is
described as a combination of SER and the inorganic phosphate PO4 !!!
(a side point: note the inorganic PO4 became organic upon this linkage -
a PDB chemical conundrum!!).

 (3.2) CCDC makes no attempt to standardise atom names, nor to give
identical structures the same atom names: the original author's atom
names are kept, so amino acids may have bizarre atom names, and
symmetry-generated atom names are created where required. This is rare
in the PDB but not unknown, and the PDB is poor at completing atom/ligand
names where symmetry is required; in fact the completion is often not done
in any chemically reasonable sense, as this would require changes in
occupancy.

The simplest case is in racemic PDB entries, where the symmetry-generated
structure for, say, L-ALA should be the D-version DAL,
but the PDB as it stands has not coped with this, as it would require two
sets of coordinates, each at (usually) 1/2 occupancy.

One of several examples in the PDB archive is PDB entry 3e7r, the
X-ray structure of racemic plectasin. The entry consists of one protein
chain, in space group P-1.

In the manuscript
http://onlinelibrary.wiley.com/doi/10.1002/pro.127/pdf

Figure 3a, for example, shows the crystal packing:
(a) Centrosymmetric P-1 unit cell. The
L-plectasin molecule is shown in blue and the
D-plectasin molecule is in gold.

But if you take the PDB entry and use the symmetry operator of P-1
to generate the two symmetry-related mates in the unit cell,
you will get a chain with L- residue names
GLY-PHE-GLY-CYS-ASN-GLY-PRO- etc.
representing D-amino acids.
(GLY is a special case).
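
Why the P-1 mate has the opposite hand can be checked numerically: the inversion operator maps (x, y, z) to (-x, -y, -z), and this reverses the sign of the chiral volume at any stereocentre. The sketch below uses made-up coordinates for a centre and three of its substituents (they are not taken from 3e7r), and the helper names are hypothetical.

```python
def sub(p, q):
    """Vector p - q."""
    return tuple(a - b for a, b in zip(p, q))


def cross(u, v):
    """Cross product u x v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product u . v."""
    return sum(a * b for a, b in zip(u, v))


def chiral_volume(centre, a, b, c):
    """Signed volume (a-centre) . ((b-centre) x (c-centre))."""
    return dot(sub(a, centre), cross(sub(b, centre), sub(c, centre)))


def invert(p):
    """The P-1 inversion operator: (x, y, z) -> (-x, -y, -z)."""
    return tuple(-x for x in p)


# Made-up coordinates standing in for CA and its N, C, CB neighbours.
ca, n, c, cb = (1.0, 2.0, 3.0), (2.4, 2.1, 3.2), (0.6, 3.4, 3.1), (0.5, 1.2, 4.1)

v_orig = chiral_volume(ca, n, c, cb)
v_inv = chiral_volume(invert(ca), invert(n), invert(c), invert(cb))
print("original:", v_orig, "inverted:", v_inv)
```

The two volumes have equal magnitude and opposite sign, so a residue labelled with an L- name in the deposited chain describes a D-residue once the inversion is applied. Glycine has no stereocentre at CA, which is why it is the special case.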

 (3.3) There is also the problem of assigning a 3-letter code where the
submission has obviously assigned the wrong chirality. One example is
where the sugar must be NAG but is assigned NGA in a
glycopeptide where NGA is impossible - the PDB should have assigned NAG
with a CAVEAT that the chirality is incorrect. Note that re-refinement by
other software will require breaking a bond.
NGA is used in 90 entries (RCSB count, 21-June-2015).

regards Miri 




  From: Yong Wang wang_yon...@lilly.com
  Reply-to: Yong Wang wang_yon...@lilly.com
  To: CCP4BB@JISCMAIL.AC.UK
  Subject: Re: [ccp4bb] FW: New ligand 3-letter code (help-7071)
  Date: Sat, 20 Jun 2015 18:36:34 +
  
  Sharing a ligand name should be limited to cases of the same compound,
  i.e. the same 2D structure or connectivity.  Each deposition should have its
  own 3D coordinates.  If a different publication gets your ligand 3D
  coordinates (2Y59 actually embodies the atomic coordinates from
  2Y2I), that looks to me like an oversight by the PDB.  It is hard to believe
  that the PDB intended to use the 3D coordinates from one entry for the other,
  ligand or not.  In fact, the restraints as described by the ligand dictionary
  should also be kept separate, as they reflect how the authors refined their
  ligand.
  
  Yong
  
  -Original Message-
  From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of 
  Martyn Symmons
  Sent: Friday, June 19, 2015 8:39 PM
  To: CCP4BB@JISCMAIL.AC.UK
  Subject: Re: [ccp4bb] FW: New ligand 3-letter code (help-7071)
  
  By oversimplifying the situation here the PDB does not answer my related
  point about competing crystallographers.
  My scenario:

  Group A deposits a structure with a new drug and gets their three-letter
  code, for example ZA3; they then get to check the coordinates and chemical
  definition of this ligand.

  But suppose a little after that a competing group B deposits their
  structure with the same drug, which they think is novel - but no...
  they get assigned the now-described ZA3, which has been checked by the other
  group.
  
   Then it is a race to see who gets to publish and release first. And if 

Re: [ccp4bb] [Fwd: Re: [ccp4bb] FW: New ligand 3-letter code (help-7071)]

2015-06-21 Thread Edward A. Berry

  I can't imagine a journal doing that can you?  When I work on my
supplementary material in a paper I don't expect that the journal will
take a bit out and publish it separately to support the work of my
competitors. Not out of spite that I was beaten - but because I don't
want to take the responsibility for checking their science for them!


I don't see the problem here. What about the dozens of authors who
will benefit from using your ligand in their structure _after_ your
structure comes out? You don't take responsibility for checking their
science. Every author gets a copy of his final structure to check
before it is released and each is responsible for his own.
The only difference here is whether the competitor got to use it first,
(which might sting a bit) or only after you had already made it your
own with the first structure.

I guess the ligand database is the responsibility of the pdb, but
they depend on first depositors to help set up each ligand, so
it is not surprising if the type model has coordinates from the
first depositor's structure (although it would be convenient if
they were all moved to c.o.m. at 0,0,0). When another group publishes
a structure with the ligand, they will not be publishing the first
depositor's coordinates because the ligand will be moved to its position
in their structure and refined against their data, probably with
somewhat different restraints.
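
The "moved to c.o.m. at 0,0,0" idea amounts to subtracting the centre of mass from every atom. A minimal sketch, assuming atoms are given as (element, (x, y, z)) pairs; the function name and the small mass table are illustrative and not part of any PDB or CCP4 tool.

```python
# Standard atomic masses for a few common ligand elements (illustrative subset).
MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "P": 30.974}


def centre_on_origin(atoms):
    """Return a copy of atoms shifted so the centre of mass sits at 0,0,0.

    atoms: list of (element, (x, y, z)) tuples.
    """
    total = sum(MASS[element] for element, _ in atoms)
    com = tuple(
        sum(MASS[element] * xyz[i] for element, xyz in atoms) / total
        for i in range(3)
    )
    return [
        (element, tuple(xyz[i] - com[i] for i in range(3)))
        for element, xyz in atoms
    ]


# Made-up coordinates, not taken from any deposited entry.
demo_atoms = [("P", (1.0, 2.0, 3.0)), ("O", (2.0, 2.0, 3.0)), ("N", (1.0, 3.5, 3.0))]
for element, xyz in centre_on_origin(demo_atoms):
    print(element, xyz)
```

A type model stored this way would carry the ligand's internal geometry without recording where it sat in the first depositor's crystal.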

If the ligand is a top secret novel drug lead that your company is
developing I guess it would come as a shock to find someone else has
already deposited it, and it might be good to hasten not the
publication but protecting of the compound with a patent!

Although Miriam says a new 3-letter code is generated when no match is found,
I believe the depositor's code will be used if it is available,
at least one of mine was last year, so there is some use for Nigel's
utility if you want to stamp your new compound with a rememberable name.

eab

On 06/21/2015 06:33 PM, Martyn Symmons wrote:

Miri raises important points about issues in the PDB Chemical
Component Dictionary - I think part of the problem is that this is
published completely separately from the actual PDB - so for example I
don't think we have an archive of the CCD for comparison alongside the
PDB snapshots? This makes it difficult to follow the convoluted track
of particular ligands through the PDB's many, many changes to small
molecule definitions.

But following discussion with other contributors offline I want to
make it clear what is my understanding of the ZA3 (2Y2I/2Y59) case:

I am clear there was no unethical behaviour by either group in the
course of their work on these structures and the publication of them.

The problem I am highlighting is that the PDB don't understand
publishing ethics - what happened in ZA3 was that they published a
little bit of one group's work to support the work of someone who was
scooping them!

  I can't imagine a journal doing that can you?  When I work on my
supplementary material in a paper I don't expect that the journal will
take a bit out and publish it separately to support the work of my
competitors. Not out of spite that I was beaten - but because I don't
want to take the responsibility for checking their science for them!

All the best
   Martyn

Cambridge

Re: [ccp4bb] [Fwd: Re: [ccp4bb] FW: New ligand 3-letter code (help-7071)]

2015-06-21 Thread Martyn Symmons
Miri raises important points about issues in the PDB Chemical
Component Dictionary - I think part of the problem is that this is
published completely separately from the actual PDB - so for example I
don't think we have an archive of the CCD for comparison alongside the
PDB snapshots? This makes it difficult to follow the convoluted track
of particular ligands through the PDB's many, many changes to small
molecule definitions.

But following discussion with other contributors offline I want to
make it clear what is my understanding of the ZA3 (2Y2I/2Y59) case:

I am clear there was no unethical behaviour by either group in the
course of their work on these structures and the publication of them.

The problem I am highlighting is that the PDB don't understand
publishing ethics - what happened in ZA3 was that they published a
little bit of one group's work to support the work of someone who was
scooping them!

 I can't imagine a journal doing that can you?  When I work on my
supplementary material in a paper I don't expect that the journal will
take a bit out and publish it separately to support the work of my
competitors. Not out of spite that I was beaten - but because I don't
want to take the responsibility for checking their science for them!

All the best
  Martyn

Cambridge
