Re: [ccp4bb] Problems with phasing a protein (1300aa)

Anastassis Perrakis Tue, 24 Mar 2009 02:27:24 -0700

Hi -

I must agree that if one RTFMs it's clear. Now, back to the real world.


My experience is that most SCALA users tend to look at I/sigmaI.

I admit that I had been using that for deciding data cutoff, many times, but that's another discussion.

The vast majority of "Table 1" I see report I/sigmaI, without ever clarifying whether that's <I>/sigma(<I>) or <I/sigma(I)>.


Anyway ...

SCALA indeed clearly says

         SIGMA   :- rms scatter of observations

         sd      :- average standard deviation derived from experimental SDs, after


XDS clearly says as well:

 I/SIGMA  = mean of intensity/Sigma(I) of unique reflections
            (after merging symmetry-related observations)

so, if I understand it right, SCALA and XDS use the same shorthand label for different metrics, even if they do clarify what it is in each case. A user would tend to look in both cases at I/sigma (since that's the "jargon" that prevailed).

d*TREK could confuse people further as to which one should be used for reporting and decision making.

Scalepack conveniently ignores the problem by not reporting the I/sigma, but still lets you wonder if you should divide the I by the average error or the Average 'stat.'.

(I am still confused btw about what one should do).

The Big Question Again:

Which is the infamous "I/sigma" that "should be above 2.0" and should be in "Table 1" ...?

If we agree we want <I/sigma(I)> (after sigmas are corrected, I presume) here is a solution:

Since Phil is generous enough to offer to rename the Mn(I/sd) column to I/SIGMA, and maybe rename I/sigma to I/rms-scatter or so, if Jim could also rename one of his columns, and Wladek can add one, we can have a standard.

Or even better, if everybody would use <I/sigma(I)> then we would even start getting the tables right in the papers?

(... I could see this train of thought going off its rails while I was typing ...)


Tassos


On Mar 23, 2009, at 17:08, Phil Evans wrote:

I'm happy to change the column titles if it makes it clearer. Actually
the "I/sigma" column in the Scala output is not very useful:
it is  <I> / RMSscatter, ie the mean intensity/mean error, for
individual observations, not taking into account multiple
measurements. Because it is ratio of means (rather than a mean of
ratios), it can behave oddly depending on the distribution of
intensities, for instance giving an overall value which is outside the
range of values in resolution bins. It is the ratio of the previous
two columns.

On the other hand the column labelled "Mn(I)/sd" is the mean of ratios
for each reflection, ie < <I>/σ(<I>) >, and does take into account the
multiplicity of measurements, so is much more relevant as an indicator
of data quality
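
To make the "ratio of means" versus "mean of ratios" distinction concrete, here is a minimal Python sketch. The intensities and sigmas are invented for illustration, the function names are hypothetical, and a plain mean of sigmas stands in for Scala's RMS scatter, so this only shows the arithmetic and is not how Scala computes its columns.

```python
from statistics import mean

# Hypothetical reflections in one resolution bin: (intensity, sigma).
# These numbers are invented purely for illustration.
reflections = [(1000.0, 20.0), (500.0, 15.0), (50.0, 10.0), (5.0, 10.0)]


def ratio_of_means(refl):
    """<I> / <sigma>: mean intensity divided by mean error."""
    return mean(i for i, _ in refl) / mean(s for _, s in refl)


def mean_of_ratios(refl):
    """<I/sigma>: average of the per-reflection ratios."""
    return mean(i / s for i, s in refl)


print(f"ratio of means: {ratio_of_means(reflections):.2f}")
print(f"mean of ratios: {mean_of_ratios(reflections):.2f}")
```

With these made-up values the two numbers differ noticeably (about 28.3 versus 22.2), which is the point: the same label can hide two different statistics.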

see
http://www.ccp4wiki.org/~ccp4wiki/wiki/index.php?title=Scaling_experimental_intensities_with_Scala

Scala also outputs a convenient "Table 1" summary

On 23 Mar 2009, at 15:50, James Holton wrote:

I guess when I talk about signal-to-noise I assume the one that is
most relevant to the task at hand.  So, to me, I/sigma(I) at the
phasing step would be the average intensity (I) divided by the sigma
(standard deviation) assigned to it AFTER scaling/merging.  I admit
that the "I/sigma" column from SCALA is potentially confusing, but
if you are dealing with spot intensities, this is the first I/sigma
you think about, so I guess this is what Phil was thinking.

Personally, I think the descriptions of the columns in this table
are clear if you read the caption before the table in the SCALA
output, but Tassos is right that an alarming number of people have
never done this.  RTFM?

When in doubt, use mtzdmp to see what the average values of the data
columns really are.

-James Holton
MAD Scientist

Anastassis Perrakis wrote:

I like to think of things in terms of signal-to-noise, and one can
use a rearrangement of the Crick-Magdoff equation to tell you what the
I/sigma of your data set needs to be for delta-F to be greater than
sigma(delta-F):

I/sigma(I) > 1.3*sqrt(Daltons/sites)/f"

where:
I/sigma(I) is the signal-to-noise ratio of the data set required to
solve it by MAD/SAD
Daltons   is the molecular weight of the protein in amu
sites         is the number of Se sites
f"            is the f" of those sites (in "electrons")


let me see .... we recently solved a 200-residue protein, 4 mol per
AU, with 2 Se per mol, total 8 Se.
Since 160 residues were ordered, I will give you a discount:
18,000 Da/monomer, 70,000 Da in the AU.
I truncated data to 4.2 Å for Se search.

1.3*sqrt(70000/8)/6.5= 19
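
The arithmetic above is easy to redo for other cases. A minimal Python sketch of the rule of thumb as quoted in this thread follows; the function name `required_i_over_sigma` is hypothetical, and the formula is simply the one given above, not taken from any program's documentation.

```python
import math


def required_i_over_sigma(daltons, sites, f_double_prime):
    """Rule of thumb quoted above: I/sigma(I) > 1.3*sqrt(Daltons/sites)/f"."""
    return 1.3 * math.sqrt(daltons / sites) / f_double_prime


# Numbers from the example above: ~70,000 Da in the AU, 8 Se sites, f" = 6.5 e
estimate = required_i_over_sigma(70000, 8, 6.5)
print(round(estimate, 1))  # about 18.7, i.e. the ~19 quoted above
```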

Statistics from Scala:

  N  1/d^2 Dmin(A)  Rmrg Rfull  Rcum Ranom Nanom  Av_I SIGMA I/sigma   sd Mn(I/sd)
  1 0.0098   10.11 0.048 0.049 0.048 0.036   349  4967   419    11.9  342     25.8
  2 0.0196    7.15 0.050 0.044 0.049 0.031   707  5360   462    11.6  372     28.8
  3 0.0293    5.84 0.089 0.062 0.057 0.047   975  1634   224     7.3  177     19.4
  4 0.0391    5.06 0.065 0.048 0.059 0.039  1140  2107   207    10.2  218     21.0
  5 0.0489    4.52 0.061 0.043 0.060 0.034  1315  2523   227    11.1  253     21.9
  6 0.0587    4.13 0.072 0.051 0.062 0.035  1470  2142   223     9.6  242     20.2
  7 0.0685    3.82 0.091 0.061 0.066 0.042  1605  1566   203     7.7  219     16.1
  8 0.0782    3.57 0.128 0.086 0.071 0.052  1737  1034   186     5.6  199     12.3
  9 0.0880    3.37 0.189 0.137 0.077 0.074  1859   667   181     3.7  187      8.9
 10 0.0978    3.20 0.314 0.224 0.085 0.129  1940   374   170     2.2  178      5.5

So, that would support your argument.

HOWEVER that would mean looking at the Mn(I/sd) in the table!!!

"Mn(I/sd)" is not the same as "I/sigma" in Scala notation!!!! Most
people take I/sigma(I) in your notation
to be the I/sigma in the Scala output, or the I divided by sd in
the Denzo output. These are (very) different.
I am not sure which you meant since I/sigma(I) is not the full
notation (place the <> in your favorite place first ...), but it
seems correct if you meant Mn(I/sd), which most people do not quote
or use much ;-)
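
For anyone who wants to check this against the numbers, here is a small Python sketch using the per-bin values from the Scala table above. The 18.7 threshold is the estimate from the formula earlier in this message; the variable names are hypothetical and the comparison is only a rough illustration.

```python
# Per-bin values copied from the Scala table quoted above:
# (Dmin in Å, Av_I, SIGMA, I/sigma, Mn(I/sd))
bins = [
    (10.11, 4967, 419, 11.9, 25.8),
    (7.15, 5360, 462, 11.6, 28.8),
    (5.84, 1634, 224, 7.3, 19.4),
    (5.06, 2107, 207, 10.2, 21.0),
    (4.52, 2523, 227, 11.1, 21.9),
    (4.13, 2142, 223, 9.6, 20.2),
    (3.82, 1566, 203, 7.7, 16.1),
    (3.57, 1034, 186, 5.6, 12.3),
    (3.37, 667, 181, 3.7, 8.9),
    (3.20, 374, 170, 2.2, 5.5),
]

# Estimate worked out earlier: 1.3*sqrt(70000/8)/6.5
THRESHOLD = 18.7

# The "I/sigma" column should equal Av_I / SIGMA, up to rounding in the printout.
max_mismatch = max(abs(av_i / sigma - i_sig) for _, av_i, sigma, i_sig, _ in bins)

# Which resolution bins clear the threshold under each definition?
ok_i_sigma = [dmin for dmin, _, _, i_sig, _ in bins if i_sig >= THRESHOLD]
ok_mn_i_sd = [dmin for dmin, _, _, _, mn in bins if mn >= THRESHOLD]

print(f"largest |Av_I/SIGMA - I/sigma| = {max_mismatch:.2f}")
print("bins passing on I/sigma :", ok_i_sigma)
print("bins passing on Mn(I/sd):", ok_mn_i_sd)
```

No bin passes on the I/sigma column, while Mn(I/sd) passes in every bin down to Dmin 4.13 Å, which is roughly where the data were cut for the Se search.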


Greetings,
Tassos


A.


Please don't print this e-mail unless you really need to
Anastassis (Tassos) Perrakis, Principal Investigator / Staff Member
Department of Biochemistry (B8)
Netherlands Cancer Institute,
Dept. B8, 1066 CX Amsterdam, The Netherlands
Tel: +31 20 512 1951 Fax: +31 20 512 1954 Mobile / SMS: +31 6 28 597791
