Re: [relax-users] Possible to get R2eff_(back_calc) when doing clustered CPMG-RD analysis?

Johan Wallerstein Thu, 28 Jul 2022 12:41:03 -0700

Hi,

Thank you very much.
I apologise for the very late reply. The main reason for the delay is that my 
summer holiday with the kids and family started at the beginning of July, and 
I believe it is good for me to disconnect from the job once in a while.


See comments further down:

On 2 Jul 2022, at 10:06, Edward d'Auvergne wrote:

On Thu, 30 Jun 2022 at 08:19, Johan Wallerstein wrote:

Hi,

I perform CPMG-RD cluster fitting using relax, where a cluster refers to a 
group of several residues (between 3 and 14), for data from a 45 kDa protein. 
The software is a good tool for this analysis. I only marginally adjust the 
core protocol script with the header



"""Script for performing a full relaxation dispersion analysis using CPMG-type 
data."""

I use only the CR72 model and I have a PRE_RUN_DIR from a run with individual 
residues. I use duplicates for error estimation, for both the 800 MHz and 900 
MHz data sets, and AIC for model selection.
When I analyse the clustered data, I would like to get R2eff_(back_calc) for 
each data point. To clarify my main question, I include some of my data below.

###########

For residue 530, when I do an individual fit, I get this output.

From the log-file:

———

The spin cluster [':530@N'].
# Data pipe            Num_params_(k)    Num_data_sets_(n)    Chi2        Criterion
No Rex - relax_disp    2                 25                   21.11216    25.11216
CR72 - relax_disp      5                 25                   13.93686    23.93686
The model from the data pipe 'CR72 - relax_disp' has been selected.

———

The file ‘disp_530_N.out’ in /final gives the following data table:

# Experiment_name    Field_strength_(MHz)    Disp_point_(Hz)    R2eff_(measured)     R2eff_(back_calc)    R2eff_errors
'SQ CPMG'                   799.870000000          25.000000    17.523783179912268   16.953711340740483    0.831932502443187
'SQ CPMG'                   799.870000000          50.000000    16.513029763549930   16.914478241596726    0.805586049587058
'SQ CPMG'                   799.870000000          75.000000    16.920353186819355   16.875245142453196    0.816049323427317
'SQ CPMG'                   799.870000000         100.000000    16.667402888129434   16.836012043882192    0.809527349094067
'SQ CPMG'                   799.870000000         150.000000    16.454146002323920   16.757546676539960    0.804090431533660
'SQ CPMG'                   799.870000000         200.000000    16.359623786385509   16.679111600438773    0.801698521274394
'SQ CPMG'                   799.870000000         300.000000    15.525257427659495   16.523477804972345    0.781054888748662
'SQ CPMG'                   799.870000000         350.000000    16.609858567997016   16.447662190184474    0.808054742944598
'SQ CPMG'                   799.870000000         400.000000    16.844330710216166   16.374401478154368    0.814080812205130
'SQ CPMG'                   799.870000000         500.000000    17.414128601521103   16.238705811895670    0.829011905615397
'SQ CPMG'                   799.870000000         600.000000    16.093980388806685   16.120475644003818    0.795034804815920
'SQ CPMG'                   799.870000000         800.000000    15.988036247232372   15.937187687218284    0.792401090807446
'SQ CPMG'                   799.870000000        1000.000000    15.732649459437805   15.811741022120714    0.786107934589661
'SQ CPMG'                   900.130000000          57.000000    19.386713898811351   20.163621643615215    0.801212497068354
'SQ CPMG'                   900.130000000         114.000000    21.873502893081564   20.050473540803750    0.859660006101508
'SQ CPMG'                   900.130000000         171.000000    19.133628964210569   19.937331394199191    0.795598311646227
'SQ CPMG'                   900.130000000         228.000000    20.497316023709256   19.824330722189416    0.826567798566107
'SQ CPMG'                   900.130000000         285.000000    20.091262254550443   19.712140304427066    0.817160298225920
'SQ CPMG'                   900.130000000         400.000000    19.177817248045365   19.494278567900892    0.796574222459005
'SQ CPMG'                   900.130000000         514.000000    19.111643299707755   19.300194513689348    0.795113430566997
'SQ CPMG'                   900.130000000         628.000000    18.432363807026835   19.135138271300775    0.780352695478047
'SQ CPMG'                   900.130000000         742.000000    19.383070346051138   18.999531230125285    0.801131245976946
'SQ CPMG'                   900.130000000         857.000000    18.560791856291317   18.889165645990943    0.783110910382522
'SQ CPMG'                   900.130000000         971.000000    18.810639108776328   18.801416118812085    0.788520121686263
'SQ CPMG'                   900.130000000        1085.000000    18.943973311789268   18.730884832131551    0.791430360141496

###########

For a cluster fit (including residue 530) I get this output from the log-file:

———

The spin cluster [':530@N', ':536@N', ':537@N', ':538@N', ':550@N', ':551@N', ':552@N'].
# Data pipe            Num_params_(k)    Num_data_sets_(n)    Chi2         Criterion
No Rex - relax_disp    14                175                  458.66116    486.66116
CR72 - relax_disp      23                175                  117.29418    163.29418
The model from the data pipe 'CR72 - relax_disp' has been selected.
———

This looks reasonable.  This is 7 spins, so on average, 117.29/7 =
16.76, which is a little more than the single spin value of 13.94.
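
The numbers in the two log excerpts can be checked by hand. The Criterion 
column is consistent with the AIC written as Chi2 + 2k, and the per-spin 
average is simply Chi2 divided by the number of spins. The sketch below is 
plain Python and not part of relax; the function names are made up for 
illustration, and the formula Chi2 + 2k is inferred from the logged values 
rather than taken from the relax source.

```python
def aic(chi2, k):
    """AIC in its chi-squared form: chi2 + 2k (k = number of parameters)."""
    return chi2 + 2 * k


def chi2_per_spin(chi2, n_spins):
    """Average chi-squared contribution per spin in a cluster."""
    return chi2 / n_spins


if __name__ == "__main__":
    # (label, k, chi2) copied from the log excerpts in this thread.
    fits = [
        ("530 alone, No Rex", 2, 21.11216),
        ("530 alone, CR72", 5, 13.93686),
        ("cluster, No Rex", 14, 458.66116),
        ("cluster, CR72", 23, 117.29418),
    ]
    for label, k, chi2 in fits:
        print("%-20s AIC = %.5f" % (label, aic(chi2, k)))

    # 7 spins in the cluster.
    print("chi2 per spin: %.2f" % chi2_per_spin(117.29418, 7))
```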


But there is no corresponding data table.

Do you mean that there is no ‘disp_530_N.out’ file for the clustered analysis?

The file is there. I did not realise that the disp_???.out files from a 
clustered analysis contain the new data.
I should have understood this…
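
Because the per-spin disp_*.out files hold the measured and back-calculated 
R2eff values together with the errors, the per-point chi2 contributions can 
be tabulated outside of relax. The following is a minimal sketch in plain 
Python, not a relax user function. The helper names `read_disp_file` and 
`chi2_terms` are hypothetical, and it assumes that each data row sits on a 
single line in the file with the six columns shown in the table header, and 
that the chi2 is the usual sum of ((measured - back_calc) / error)^2.

```python
import shlex


def read_disp_file(lines):
    """Parse the rows of a disp_*.out style table (hypothetical helper).

    Returns a list of tuples
    (field_MHz, disp_point_Hz, measured, back_calc, error).
    """
    rows = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' header line.
        if not line or line.startswith('#'):
            continue
        # shlex keeps the quoted experiment name 'SQ CPMG' as one token.
        tokens = shlex.split(line)
        rows.append(tuple(float(t) for t in tokens[1:6]))
    return rows


def chi2_terms(rows):
    """Return the per-point contributions ((meas - bc) / err)**2."""
    return [((meas - bc) / err) ** 2 for _, _, meas, bc, err in rows]


if __name__ == "__main__":
    # Two rows copied from the individual fit of residue 530 in this thread.
    example = [
        "# Experiment_name Field_strength_(MHz) Disp_point_(Hz) "
        "R2eff_(measured) R2eff_(back_calc) R2eff_errors",
        "'SQ CPMG' 799.870000000 25.000000 "
        "17.523783179912268 16.953711340740483 0.831932502443187",
        "'SQ CPMG' 799.870000000 300.000000 "
        "15.525257427659495 16.523477804972345 0.781054888748662",
    ]
    rows = read_disp_file(example)
    for row, term in zip(rows, chi2_terms(rows)):
        print("%.2f MHz  %8.1f Hz  chi2 term = %.3f" % (row[0], row[1], term))
```

Reading every disp_*.out file of the clustered run this way and sorting the 
terms would show which points, and which residues, contribute most to the 
total. Whether the sum matches the logged Chi2 exactly depends on relax using 
the same errors as written to the file, which is an assumption here.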



###########

QUESTION 1:
Is it possible to get, or easily create, a table with (in my case) the 175 
R2eff_(back_calc) values for the cluster, so that I can get a better breakdown 
of the Chi2 = 117.29418 above?
This might also let me study how a single residue affects the cluster fitting.

Try the value.write() user function:

   https://www.nmr-relax.com/manual/value_write.html

Make sure to set the 'bc' flag to True.

Thanks!


QUESTION 2:
Are there any references to methods for efficiently selecting the residues to 
include in a cluster? There is obviously an immense number of ways to combine 
residues into clusters in a normal-sized protein. I am considering writing a 
program/script for this process and would be glad of some inspiration.

As far as I am aware, human logic is used for this process.  You
identify a rigid moving unit in your system yourself with similar
dispersion results and then use clustering on that.  I would assume
that an automated system to find clusters would be computationally
very expensive, despite being able to run on a computer cluster via
MPI.  And that such a project would take up half or more of a PhD
student's time.  Then again, I wouldn't be surprised if there is now a
publication exploring this concept.  If you do find one, I'd be
interested in hearing about it.

I appreciate these comments and thoughts on the subject. I will soon be back 
in working mode and will then go deeper into the topic.
If I find something of interest, I'll post it.
Before the holiday I "somehow" managed to create a semi-automatic approach to 
run all possible combinations of a set of residues in relax, i.e. all 
combinations of N things taken K at a time. Let's say the cluster I want to 
investigate consists of 8 residues (N), but I'm not confident in that 
selection. The 8 residues are the core cluster, and I let relax run every 
combination of 6 (K) of these 8 core residues, which turns out to be 28 relax 
runs. I'm not sure this is a good approach; I need to evaluate it.

A problem with the data is the low signal-to-noise ratio, as it is a large 
protein. This of course has many consequences for my CPMG-RD analysis with 
relax.

(28 comes from N!/(K!(N-K)!) with N = 8 and K = 6.)
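
The enumeration of sub-clusters described above can be sketched with the 
Python standard library. This is not the semi-automatic script itself, which 
is not shown in this thread. The function name `sub_clusters` and the spin 
IDs ':1@N' to ':8@N' are placeholders; only the ':<residue>@N' naming pattern 
is taken from the log output. `math.comb` needs Python 3.8 or later.

```python
from itertools import combinations
from math import comb


def sub_clusters(spin_ids, k):
    """Return every k-membered subset of spin_ids as a list of lists."""
    return [list(subset) for subset in combinations(spin_ids, k)]


if __name__ == "__main__":
    # Placeholder IDs for an 8-residue core cluster (N = 8).
    core = [':%d@N' % i for i in range(1, 9)]

    # All combinations of K = 6 residues out of the 8 core residues.
    subsets = sub_clusters(core, 6)

    # The count agrees with N!/(K!(N-K)!).
    assert len(subsets) == comb(len(core), 6)
    print(len(subsets))  # 28

    # Each subset would define the spin cluster for one relax run.
    for subset in subsets[:3]:
        print(subset)
```

The number of runs grows quickly with N, so this brute-force approach is 
probably only practical for small core clusters like the one described here.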

Best regards

Johan




_______________________________________________
nmr-relax-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nmr-relax-users
