Hi,

Thank you very much. I apologise for the very late reply; the main reason is that my summer holiday with kids/family started at the beginning of July. I believe it is good for me to decouple from the job once in a while.
See comments further down:

On 2 Jul 2022, at 10:06, Edward d'Auvergne <[email protected]> wrote:

> On Thu, 30 Jun 2022 at 08:19, Johan Wallerstein <[email protected]> wrote:
>
>> Hi,
>>
>> I perform CPMG-RD cluster fitting using relax, cluster refer to grouping several residues (between 3 to 14 residues) for data from a 45 kDa protein. The software is a good tool for doing this analysis. I marginally adjust the core protocol with the header """Script for performing a full relaxation dispersion analysis using CPMG-type data.""" I use only the CR72-model and I have a PRE_RUN_DIR from a run with individual residues. I use duplicates for error estimation, on both the 800 MHz and 900 MHz data set, and AIC for model selection.
>>
>> When I analyse the clustered data I’m curious to get R2eff_(back_calc) for each data point. I clarify my main question by attaching some of my data.
>>
>> ###########
>>
>> For residue 530, when I do individual fit I get this output. From the log-file:
>>
>> ---
>> The spin cluster [':530@N'].
>>
>> # Data pipe          Num_params_(k)  Num_data_sets_(n)  Chi2       Criterion
>> No Rex - relax_disp  2               25                 21.11216   25.11216
>> CR72 - relax_disp    5               25                 13.93686   23.93686
>>
>> The model from the data pipe 'CR72 - relax_disp' has been selected.
>> ---
>>
>> The file ‘disp_530_N.out’ in /final gives the following data table:
>>
>> # Experiment_name  Field_strength_(MHz)  Disp_point_(Hz)  R2eff_(measured)    R2eff_(back_calc)   R2eff_errors
>> 'SQ CPMG'          799.870000000         25.000000        17.523783179912268  16.953711340740483  0.831932502443187
>> 'SQ CPMG'          799.870000000         50.000000        16.513029763549930  16.914478241596726  0.805586049587058
>> 'SQ CPMG'          799.870000000         75.000000        16.920353186819355  16.875245142453196  0.816049323427317
>> 'SQ CPMG'          799.870000000         100.000000       16.667402888129434  16.836012043882192  0.809527349094067
>> 'SQ CPMG'          799.870000000         150.000000       16.454146002323920  16.757546676539960  0.804090431533660
>> 'SQ CPMG'          799.870000000         200.000000       16.359623786385509  16.679111600438773  0.801698521274394
>> 'SQ CPMG'          799.870000000         300.000000       15.525257427659495  16.523477804972345  0.781054888748662
>> 'SQ CPMG'          799.870000000         350.000000       16.609858567997016  16.447662190184474  0.808054742944598
>> 'SQ CPMG'          799.870000000         400.000000       16.844330710216166  16.374401478154368  0.814080812205130
>> 'SQ CPMG'          799.870000000         500.000000       17.414128601521103  16.238705811895670  0.829011905615397
>> 'SQ CPMG'          799.870000000         600.000000       16.093980388806685  16.120475644003818  0.795034804815920
>> 'SQ CPMG'          799.870000000         800.000000       15.988036247232372  15.937187687218284  0.792401090807446
>> 'SQ CPMG'          799.870000000         1000.000000      15.732649459437805  15.811741022120714  0.786107934589661
>> 'SQ CPMG'          900.130000000         57.000000        19.386713898811351  20.163621643615215  0.801212497068354
>> 'SQ CPMG'          900.130000000         114.000000       21.873502893081564  20.050473540803750  0.859660006101508
>> 'SQ CPMG'          900.130000000         171.000000       19.133628964210569  19.937331394199191  0.795598311646227
>> 'SQ CPMG'          900.130000000         228.000000       20.497316023709256  19.824330722189416  0.826567798566107
>> 'SQ CPMG'          900.130000000         285.000000       20.091262254550443  19.712140304427066  0.817160298225920
>> 'SQ CPMG'          900.130000000         400.000000       19.177817248045365  19.494278567900892  0.796574222459005
>> 'SQ CPMG'          900.130000000         514.000000       19.111643299707755  19.300194513689348  0.795113430566997
>> 'SQ CPMG'          900.130000000         628.000000       18.432363807026835  19.135138271300775  0.780352695478047
>> 'SQ CPMG'          900.130000000         742.000000       19.383070346051138  18.999531230125285  0.801131245976946
>> 'SQ CPMG'          900.130000000         857.000000       18.560791856291317  18.889165645990943  0.783110910382522
>> 'SQ CPMG'          900.130000000         971.000000       18.810639108776328  18.801416118812085  0.788520121686263
>> 'SQ CPMG'          900.130000000         1085.000000      18.943973311789268  18.730884832131551  0.791430360141496
>>
>> ###########
>>
>> For a cluster fit (including residue 530) I get this output from the log-file:
>>
>> ---
>> The spin cluster [':530@N', ':536@N', ':537@N', ':538@N', ':550@N', ':551@N', ':552@N'].
>>
>> # Data pipe          Num_params_(k)  Num_data_sets_(n)  Chi2       Criterion
>> No Rex - relax_disp  14              175                458.66116  486.66116
>> CR72 - relax_disp    23              175                117.29418  163.29418
>>
>> The model from the data pipe 'CR72 - relax_disp' has been selected.
>> ---
>
> This looks reasonable. This is 7 spins, so on average, 117.29/7 = 16.76, which is a little more than the single spin value of 13.94.
>
>> But there is no corresponding data table.
>
> Do you mean that there is no ‘disp_530_N.out’ file for the clustered analysis?

The file is there, I did not realise that the file disp_???.out in a clustered analysis contains the new data. I think I should have understood this…

>> ###########
>>
>> QUESTION 1: Is it possible to get, or easily create a table with, in my case, 175 R2eff_(back_calc) for the cluster, so that I can get better resolution on the Chi2 = 117.29418 above ? And possibly study how a single residue affect the cluster fitting.
>
> Try the value.write() user function:
>
> https://www.nmr-relax.com/manual/value_write.html
>
> Make sure to set the 'bc' flag to True.

Thanks!

>> QUESTION 2: Are there any reference to methods used for doing efficient selection of residues included in the cluster? There is obviously an immense number of combinations of residues to make clusters in a normal size protein. I consider making a program/script for this process and would be curious to get some inspiration.
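On Question 1: the disp_*.out files already list the measured and back-calculated R2eff together with the errors, so a per-point breakdown of the Chi2 can be sketched outside relax. This is a minimal sketch, assuming the files from the clustered run keep the six-column layout shown above and that Chi2 is the sum of squared, error-weighted residuals. The names SAMPLE and chi2_per_point are illustrative only and are not part of relax. Summed over all 25 rows of the residue 530 table above, these terms come to approximately 13.94, in line with the Chi2 reported in the log.

```python
import shlex

# Three rows copied from the 'disp_530_N.out' table above (individual fit).
SAMPLE = """\
# Experiment_name Field_strength_(MHz) Disp_point_(Hz) R2eff_(measured) R2eff_(back_calc) R2eff_errors
'SQ CPMG' 799.870000000 25.000000 17.523783179912268 16.953711340740483 0.831932502443187
'SQ CPMG' 799.870000000 300.000000 15.525257427659495 16.523477804972345 0.781054888748662
'SQ CPMG' 900.130000000 114.000000 21.873502893081564 20.050473540803750 0.859660006101508
"""


def chi2_per_point(text):
    """Return a list of (field_MHz, disp_Hz, chi2_term), one per data row."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header line.
        if not line or line.startswith("#"):
            continue
        # shlex keeps the quoted experiment name 'SQ CPMG' as a single token.
        _name, field, disp, measured, back_calc, error = shlex.split(line)
        residual = (float(measured) - float(back_calc)) / float(error)
        rows.append((float(field), float(disp), residual ** 2))
    return rows


terms = chi2_per_point(SAMPLE)

# List the data points with the largest contribution first.
for field, disp, term in sorted(terms, key=lambda row: row[2], reverse=True):
    print(f"{field:8.2f} MHz  {disp:8.1f} Hz  chi2 term = {term:.3f}")
print(f"sum over these rows = {sum(term for _, _, term in terms):.3f}")
```

Running the same function over each disp_*.out file of a cluster and adding up the terms per residue would show which spins dominate the cluster Chi2.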
> As far as I am aware, human logic is used for this process. You identify a rigid moving unit in your system yourself with similar dispersion results and then use clustering on that. I would assume that an automated system to find clusters would be computationally very expensive, despite being able to run on a computer cluster via MPI. And that such a project would take up half or more of a PhD student's time. Then again, I wouldn't be surprised if there is now a publication exploring this concept. If you do find one, I'd be interested in hearing about it.

I appreciate these comments and thoughts on the subject. I will soon be back in working mode and will then go further into the topic. If I find something of interest I'll post it.

Before the holiday I “somehow” managed to create a semi-automatic approach to run all possible combinations of a set of residues in relax, i.e. the number of combinations of N things taken K at a time. Let's say the cluster I want to investigate consists of 8 residues (N), but I’m not confident in that selection. The 8 residues are the core cluster, and I let relax run every combination of 6 (K) of these 8 core residues. This turns out to be 28 relax runs, from N!/(K!(N-K)!) = 8!/(6! 2!) = 28. I’m not sure this is a good approach; I need to evaluate it. A problem with the data is the low signal-to-noise ratio, since it is a large protein. This of course has many consequences for my CPMG-RD analysis with relax.

Best regards
Johan
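The sub-cluster enumeration described above (all combinations of K = 6 out of N = 8 core residues) can be sketched with the Python standard library. This is only an illustration of the counting: the spin IDs below are hypothetical placeholders, and how each subset is passed on to a relax run is left out.

```python
from itertools import combinations
from math import comb

# Hypothetical core cluster of N = 8 spin IDs (placeholders, not real residues).
core = [f":{number}@N" for number in range(1, 9)]
K = 6

# Every sub-cluster of K residues drawn from the core cluster.
subsets = list(combinations(core, K))

# The count agrees with N! / (K! (N-K)!).
assert len(subsets) == comb(len(core), K)
print(len(subsets))  # 28

# Show the first few sub-clusters as lists of spin IDs.
for subset in subsets[:3]:
    print(list(subset))
```

With 6 of 8 residues per run, each residue is left out of 7 of the 28 subsets, so comparing the runs with and without a given residue gives one way to look at its influence on the cluster fit.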
_______________________________________________ nmr-relax-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/nmr-relax-users
