Re: [ccp4bb] B-factor & Space gr questions!
On Tuesday 05 June 2007 12:19, Edward A Berry wrote:
> You have a good point there and I would be interested in hearing
> some other opinions, so I take the liberty of reposting-
>
> My instinctive preference is that each structure should be
> supported solely by the data that is deposited with it -
> (one dataset one structure) but in terms of good science
> we want to produce the best model we can, and that might be
> the rigid-body-located structure from another dataset.

I don't think that is quite the right way to look at it.

In general we refine our model so that it both
 - agrees with the data
 - agrees with a priori knowledge

In maximum likelihood terms: we want to find the model that is the most likely explanation for our observed data. An inherently unlikely model is also an inherently unlikely explanation. Therefore we focus on likely models.

We impose geometric restraints because we believe that we have a better a priori expectation for bond lengths and angles than can be determined de novo from the data in this one experiment. Similarly we impose the known sequence of our protein on the model, even if the maps are not sufficiently good to identify each amino acid directly from the electron density.

If we have an a priori expectation for the conformation of the whole protein, or large pieces of it, then we should account for this in the model, even if the data is not sufficiently good to reproduce this expectation de novo.

Therefore if you have a high-resolution structure available, the best treatment of low-resolution data may well be to place the known structure as a rigid body. If you suspect hinge motions or other large scale inter-domain shifts, you might want to refine the hinge angle explicitly, but unfortunately our usual refinement programs are not really set up for this.

These are important issues, and are close to the heart of the Maximum Likelihood approach to model refinement.

Ethan
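
The argument in this message (agree with the data and with a priori knowledge) can be written down with Bayes' rule. The notation below is an illustrative assumption and does not come from the thread; strictly speaking, including the prior term makes this a maximum a posteriori rather than a pure maximum likelihood target, and the rigid-body parameterisation is only a sketch of the idea of placing a known structure.

```latex
% Probability of a model given the data: likelihood times prior.
\[
  P(\text{model} \mid \text{data}) \;\propto\;
  P(\text{data} \mid \text{model}) \, P(\text{model})
\]
% Taking negative logarithms gives a data term plus a restraint (prior) term.
\[
  -\log P(\text{model} \mid \text{data}) =
  -\log P(\text{data} \mid \text{model}) - \log P(\text{model}) + \text{const}
\]
% Rigid-body placement of a known structure with coordinates x_i^0:
% only a rotation R and a translation t are refined.
\[
  x_i = R\,x_i^{0} + t, \qquad R \in SO(3),\; t \in \mathbb{R}^{3}
\]
```

In this picture geometric restraints, the known sequence, and a trusted high-resolution conformation all enter through the prior term, and rigid-body placement is the limiting case in which the prior on the internal coordinates is treated as exact.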
Re: [ccp4bb] B-factor & Space gr questions!
I think the relevant point in this discussion is that the original paper discussed the apo and substrate complexes of the protein. For the structure with lower resolution data you may indeed get a better model by taking the high resolution model and just applying rigid body refinement to it.

After that step you would like to find and model the differences between the two structures. This includes the bound substrate (or the lack thereof) and any significant structural changes that accompany substrate binding. Here "significant" means those changes that can be reliably determined at the lower resolution. For most of the structure that may mean you are best off simply taking the rigid-body-refined coordinates of the higher resolution structure without further refinement.

I see no problem in doing so, and as long as interesting differences between the structures can be clearly defined and the procedure is explicitly described in publications, this should be perfectly reasonable.

Bart

--
Bart Hazes (Assistant Professor)
Dept. of Medical Microbiology & Immunology
Re: [ccp4bb] B-factor & Space gr questions!
Wouldn't the desirability of this depend on the extent to which the molecule has moved between the high-resolution and low-resolution datasets?

I would have thought that there was an effective information transfer between R-work and R-free once the rigid body movements became too large, which might provide one with an over-optimistic idea of what the R-free would be with the high-resolution model with the low-resolution data.

Phil
Princeton NJ

Edward A Berry wrote:
> Even if the free-R set is not preserved for the new crystal, R and R-free tend to diverge rapidly once any kind of fitting with a low data/parameter ratio is performed, so I think the new structure must not have been refined much beyond rigid body (and overall B, which is included in any kind of refinement). And that choice may be well justified.
>
> Ed
Re: [ccp4bb] B-factor & Space gr questions!
You have a good point there and I would be interested in hearing some other opinions, so I take the liberty of reposting.

My instinctive preference is that each structure should be supported solely by the data that is deposited with it (one dataset, one structure), but in terms of good science we want to produce the best model we can, and that might be the rigid-body-located structure from another dataset. In particular the density for the ligand might be clearer before overfitting with the low resolution data.

Even if the free-R set is not preserved for the new crystal, R and R-free tend to diverge rapidly once any kind of fitting with a low data/parameter ratio is performed, so I think the new structure must not have been refined much beyond rigid body (and overall B, which is included in any kind of refinement). And that choice may be well justified.

Ed

cdekker wrote:
> Hi,
>
> Your reply to the ccp4bb has confused me a bit. I am currently refining a low res structure and realise that I don't know what to expect for final R and Rfree - it is definitely not what most people would publish. So the absolute values of R and Rfree are not telling me much; the only gauge I have is that as long as both R and Rfree are decreasing I am improving the model (and yes, at the moment that is only rigid body refinement).
>
> In your email reply you suggest that a refinement to convergence that leads to an increased Rfree (and lower R? - a classic case of overfitting!) would give a better model than the rigid-body-refined only model. This is what confuses me.
>
> I can see your reasoning that starting with an atomic model to solve low-res data can lead to this behaviour, but then should the solution not be a modification of the starting model (maybe high B-factors?) to compensate for the difference in resolution of model and data?
>
> Carien
Re: [ccp4bb] B-factor & Space gr questions!
Hi All,

Thanks a lot for all the replies and valuable input.

In my original post I meant a = b "does not equal" c. I used # for "does not equal".

Many asked where that paper is published. Actually the paper is under revision. When reading, I assumed that the unit cell dimensions (or the space group) contain a typo, as others thought.

The low B value for the low resolution structure makes me suspicious that something is wrong. In my limited experience, and as others mentioned, the B-factor is expected to be around 70-80 for a 2.8 A structure and very likely higher for a 3.0 A structure. David Briggs suggested that they reported the Wilson B-factor; however, it is clearly reported as the B-factor of the refined structure.

Also, Rwork = Rfree indicated that something is not right with the refinement protocol, but I was not sure what that could be! The suspicion that they did not transfer the FreeR flags sounds like a reasonable explanation.

thanks a lot,
Ibrahim

--
Ibrahim M. Moustafa, Ph.D.
Biochemistry and Molecular Biology Dept.
201 Althouse Lab., University Park
Pennsylvania State University, PA16802
Tel. (814)863-8703
Fax. (814)865-7927
Re: [ccp4bb] B-factor & Space gr questions!
Yes; a==b for P6i - prob. a typo.

B factors at 3.2A are hard to fix - it will depend on scaling convention to some extent. Can you download the data and re-run refinement for your own satisfaction?

If R == Rfree for the complex then I suspect they did not transfer the FreeR flags from the apo-protein data to the complex. Again, if the data is available you may be able to check this.

Eleanor

Ibrahim M. Moustafa wrote:
> Hi all,
>
> While reading a crystallographic paper describing the structure of an apo-protein and its complex I noticed that the authors described the space group as P6122 for the unit cell: a=141.9, b=143.9, c=380.4! Could this be considered a typo or am I missing something here? The requirement for hexagonal is a = b # c, right?
>
> Another observation in that paper too: the B-factors for the 2.4 A and 3.2 A structures are 39 and 40?? Does this make sense to anyone??
>
> The last question: in the same paper, for the complex structure R and Rfree are equal (30%). Is that an indication of improper refinement in these published structures? I'd love to hear your comments on that too.
>
> thanks,
> Ibrahim
>
> --
> Ibrahim M. Moustafa, Ph.D.
> Biochemistry and Molecular Biology Dept.
> 201 Althouse Lab., University Park
> Pennsylvania State University, PA16802
> Tel. (814)863-8703
> Fax. (814)865-7927
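
The point about transferring the FreeR flags from the apo data to the complex can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `transfer_free_flags`, the dictionary layout, and the 5% free fraction are hypothetical and not taken from the thread or from any particular program, and both datasets are assumed to be indexed in the same asymmetric unit convention.

```python
import random


def transfer_free_flags(reference_flags, target_hkls, free_fraction=0.05, seed=0):
    """Copy free-set membership from a reference dataset onto a new dataset.

    reference_flags: dict mapping (h, k, l) -> bool, True meaning "free set"
    target_hkls:     iterable of (h, k, l) present in the new dataset
    Reflections that the reference never measured get a fresh random flag.
    """
    rng = random.Random(seed)
    flags = {}
    for hkl in target_hkls:
        if hkl in reference_flags:
            # Keep the reflection in the same set it belonged to before,
            # so the free set has never been used in refinement.
            flags[hkl] = reference_flags[hkl]
        else:
            flags[hkl] = rng.random() < free_fraction
    return flags


# Hypothetical toy data: four apo reflections, four complex reflections.
apo_flags = {(1, 0, 0): False, (1, 1, 0): True, (2, 0, 1): False, (2, 1, 3): True}
complex_hkls = [(1, 0, 0), (1, 1, 0), (2, 1, 3), (3, 0, 2)]
complex_flags = transfer_free_flags(apo_flags, complex_hkls)
```

If the flags are instead regenerated at random for the complex, many reflections that were in the working set for the apo refinement end up in the new free set, so a model refined against the apo data has already "seen" them and Rfree is no longer an independent check.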
Re: [ccp4bb] B-factor & Space gr questions!
Hi Ibrahim,

On 04/06/07, Ibrahim M. Moustafa <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> While reading a crystallographic paper describing the structure of an apo-protein and its complex I noticed that the authors described the space group as P6122 for the unit cell: a=141.9, b=143.9, c=380.4. Could this be considered a typo or am I missing something here? The requirement for hexagonal is a = b # c, right?

You are correct, for hexagonal, a=b - so it's got to be a typo - most data processing software wouldn't let you do this.

> Another observation in that paper too: the B-factors for the 2.4 A and 3.2 A structures are 39 and 40?? Does this make sense to anyone?

They're quoting Wilson B-factors, I imagine. A small but rather important difference - where was this published?

> The last question: in the same paper, for the complex structure R and Rfree are equal (30%). Is that an indication of improper refinement in these published structures? I'd love to hear your comments on that too.

Well, it certainly is a little suspicious looking - I've had similar experiences to Ed, regarding similar R & Rfrees from rigid body refinement prior to positional refinement. Have the authors deposited the structure factors? I would use EDS to check the maps out: eds.bmc.uu.se/eds/

> thanks,
> Ibrahim

HTH,
Dave

--
David Briggs, PhD.
Father & Crystallographer
www.dbriggs.talktalk.net
iChat AIM ID: DBassophile
---
Anyone who is capable of getting themselves made President should on no account be allowed to do the job. - Douglas Adams
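
The metric constraint discussed here (a = b, with c free) is easy to check mechanically. The sketch below is a minimal illustration, not what any processing program actually does; the function name `is_hexagonal_cell` and the tolerance are hypothetical, and since the paper's cell angles are not quoted in the thread the usual hexagonal values of 90, 90 and 120 degrees are assumed.

```python
import math


def is_hexagonal_cell(a, b, c, alpha=90.0, beta=90.0, gamma=120.0, tol=0.01):
    """Return True if the cell satisfies a = b, alpha = beta = 90, gamma = 120.

    c is accepted for completeness but is unconstrained in the hexagonal system.
    Angles default to the hexagonal values (an assumption; they were not reported).
    """
    return (
        math.isclose(a, b, abs_tol=tol)
        and math.isclose(alpha, 90.0, abs_tol=tol)
        and math.isclose(beta, 90.0, abs_tol=tol)
        and math.isclose(gamma, 120.0, abs_tol=tol)
    )


# Cell as printed in the paper, then with b set equal to a.
print(is_hexagonal_cell(141.9, 143.9, 380.4))  # False
print(is_hexagonal_cell(141.9, 141.9, 380.4))  # True
```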
Re: [ccp4bb] B-factor & Space gr questions!
Ibrahim M. Moustafa wrote:
> The last question: in the same paper, for the complex structure R and Rfree are equal (30%). Is that an indication of improper refinement in these published structures? I'd love to hear your comments on that too.

Several times I solved low resolution structures using high resolution models, and noticed that R-free increased during atomic positional refinement. This could be expected from the assertion that after refinement to convergence, the final values should not depend on the starting point: if I had started with a crude model and refined against low resolution data, Rfree would not have gone as low as with the high-resolution model, so if I start with the high resolution model and refine, Rfree should worsen to the same value as the structure converges to the same point.

Thinking about the main purpose of the Rfree statistic, in a very real way this tells me that the model was better before this step of refinement, and it would be better to omit the minimization step. Perhaps this is what the authors did.

On the other hand it does not seem quite right to submit a model that has simply been rigid-body-refined against the data. I would prefer to refine to convergence and submit the best model that can be supported by the data alone, rather than a better model which is really the model from a better dataset repositioned in the new crystal.

Ed
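
For readers less familiar with the statistics being compared, here is a minimal sketch of how R and R-free are computed from observed and calculated amplitudes using the standard definition R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|). The function names and the toy amplitudes are hypothetical, and the calculated amplitudes are assumed to be already on the same scale as the observed ones; real programs also apply scaling and bulk-solvent corrections that are omitted here.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), with Fcalc already scaled."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, is_free):
    """Split reflections by free-set flag and return (R-work, R-free)."""
    work = [(fo, fc) for fo, fc, flag in zip(f_obs, f_calc, is_free) if not flag]
    free = [(fo, fc) for fo, fc, flag in zip(f_obs, f_calc, is_free) if flag]
    # zip(*pairs) turns a list of (Fobs, Fcalc) pairs back into two sequences.
    return r_factor(*zip(*work)), r_factor(*zip(*free))


# Hypothetical toy amplitudes: two working reflections, two free reflections.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 50.0, 44.0]
is_free = [False, False, True, True]

r_work, r_free = r_work_and_free(f_obs, f_calc, is_free)
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}")  # R-work = 0.083, R-free = 0.140
```

Because only the working reflections enter the refinement target, positional refinement against sparse low-resolution data can lower R-work while R-free rises, which is the behaviour described in the message above.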