This isn't specifically a CCP4-related question, but I'm hoping for feedback on a topic that most of us have had to consider. I'm motivated to ask because I'm currently trying to answer it for myself. I should say up front that I'm not looking to start a heated debate about PDB guidelines; I am genuinely looking for constructive suggestions.
My situation involves a two-domain protein in a somewhat well-studied family of molecules. There is a long-standing history of how these are numbered, and examples of it can be found in the PDB. Residues in the first domain typically carry a letter descriptor after the number (e.g., 1P, 2P, 3P, ...), and the numbering then resets to 1, with no letter, for the second domain. All numbering is done relative to the original member of the family, so if a sequence alignment to that sequence shows a gap, the numbering skips. Similarly, if there are inserts, the numbering becomes 45a, 45b, 45c, etc. Again, there is lots of precedent for this in the PDB.

BUT, now there is a push from databases for more 'simplification' and standardization of numbering, i.e., start from 1 and go sequentially to the end. Obviously there are arguments to be made for maintaining biologically relevant and historically established precedents. But there are arguments for the other side as well.

How do you handle the numbering of your protein sequence if there are gaps, inserts, or different biologically relevant domains? Do you use the accepted precedents set by other related structures that have been solved, or do you simply start from 1 and push on through to your end point?

Thanks in advance for any input.

-Linda Brinen
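
To make the two options concrete, here is a minimal Python sketch of each. It is only an illustration, not the procedure used by any database or program. It assumes a pairwise alignment of the new sequence against the original family member is already in hand as two equal-length strings with '-' marking gaps. The function names and the toy sequences are made up, and the domain letter (the P suffix) is left out for brevity.

```python
from string import ascii_lowercase


def family_numbering(ref_aln, new_aln):
    """Label residues of the new sequence relative to the reference.

    Both arguments are aligned strings of equal length, '-' for a gap.
    Hypothetical helper written for illustration only.
    """
    labels = []
    ref_pos = 0  # number of the most recent reference residue
    n_ins = 0    # inserted residues seen since that reference residue
    for ref_res, new_res in zip(ref_aln, new_aln):
        if ref_res != "-":
            ref_pos += 1
            n_ins = 0
            if new_res != "-":
                labels.append(str(ref_pos))
            # If new_res is a gap, this number is simply skipped.
        elif new_res != "-":
            # Insert relative to the reference: 45a, 45b, 45c, ...
            # (this sketch supports at most 26 inserts in a row)
            labels.append(f"{ref_pos}{ascii_lowercase[n_ins]}")
            n_ins += 1
    return labels


def sequential_numbering(new_aln):
    """Number the residues of the new sequence 1..N, ignoring the reference."""
    n_residues = sum(1 for res in new_aln if res != "-")
    return [str(i) for i in range(1, n_residues + 1)]


# Toy alignment: the new sequence lacks reference residue 3 and has a
# two-residue insert after reference residue 4.
ref_aln = "ACDE--FGH"
new_aln = "AC-EKLFGH"

print(family_numbering(ref_aln, new_aln))
print(sequential_numbering(new_aln))
```

In the toy case the family-based labels skip 3 and pick up 4a and 4b, so a residue keeps the same label as its counterpart in related structures, whereas the sequential labels run 1 to 8 and no longer line up with the reference after the first gap.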