This isn't specifically a CCP4-related question, but I'm hoping for feedback
on a topic that most of us have had to consider. I'm motivated to ask the
question because I'm currently trying to answer it for myself. I should
make the disclaimer right off that I'm not looking to start a heated debate
about PDB guidelines, but am genuinely looking for constructive suggestions.

My situation involves a two-domain protein in a somewhat well-studied family
of molecules. There is a long-standing history of how these are numbered,
and examples of this can be found in the PDB. The first domain is typically
numbered with a letter descriptor after the number (e.g., 1P, 2P, 3P, ...),
and the numbering then resets to 1, with no letter following, for the second
domain. All numbering is done relative to the original member of this family
of proteins, so if there is a gap based on sequence alignment to that
sequence, the numbering skips. Similarly, if there are inserts, the
numbering becomes 45a, 45b, 45c, etc. Again, there is lots of precedent for
this in the PDB.
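
To make the two options concrete, here is a rough Python sketch of both
schemes. It is purely illustrative and not taken from any PDB or CCP4 tool:
the function names and the toy alignment are made up for this message, '-'
is assumed to mark an alignment gap, and the domain letter (the 'P' above)
is left out to keep it short.

```python
from string import ascii_lowercase


def family_numbering(ref_aligned, target_aligned, start=1):
    """Label target residues relative to an aligned reference sequence.

    Hypothetical helper: both arguments are rows of one pairwise
    alignment, with '-' marking a gap.
    """
    if len(ref_aligned) != len(target_aligned):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    ref_pos = start - 1  # number of the last reference residue seen
    n_inserted = 0       # inserted residues since that reference residue
    for ref_res, tgt_res in zip(ref_aligned, target_aligned):
        if ref_res != "-":
            ref_pos += 1
            n_inserted = 0
            if tgt_res != "-":
                labels.append(str(ref_pos))
            # A gap in the target means this number is simply skipped.
        elif tgt_res != "-":
            # Insert relative to the reference: 45a, 45b, 45c, ...
            # (this sketch assumes at most 26 inserted residues in a row)
            labels.append(f"{ref_pos}{ascii_lowercase[n_inserted]}")
            n_inserted += 1
    return labels


def sequential_numbering(target_aligned, start=1):
    """Number target residues 1..N, ignoring the reference entirely."""
    n_residues = sum(1 for res in target_aligned if res != "-")
    return list(range(start, start + n_residues))


reference = "ABCDE-FG"  # toy reference (original family member)
target = "AB-DEXFG"     # toy target: one deletion, one insert
print(family_numbering(reference, target))
print(sequential_numbering(target))
```

With this toy alignment the family-relative scheme skips 3 and labels the
inserted residue 5a, while the sequential scheme just counts 1 through 7.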

But now there is a push from the databases toward 'simplification' and
standardization of numbering, i.e., start from 1 and go sequentially to the
end. Obviously there are arguments to be made for maintaining biologically
relevant and historically established precedents. But there are arguments
for the other side as well.

How do you handle the numbering of your protein sequence if there are gaps,
inserts, or different biologically relevant domains? Do you use the accepted
precedents set by other related structures that have been solved, or do you
simply start from 1 and push on through to your end point?

Thanks in advance for any input.

-Linda Brinen