Re: [ccp4bb] off-topic: negative thermal shift upon ligand binding

2017-07-08 Thread Nicholas Larsen
In theory, what you say is quite sensible.  But there is one interesting
counterexample I am aware of. The fragment tool compound that
eventually gave rise to the clinical compound indeglitazar
(http://www.pnas.org/content/106/1/262.full.pdf) gives a negative shift by
DSF (in our hands):
[image: Inline image 1]

Therefore, this experience taught us to bin compounds into negative and
positive shifters and follow up on both, prioritizing depending on this and any
other data we might have. In the end, we don't overthink it and just put
them into other assays and crystallography if appropriate.

Waving my hands around, you might imagine a scenario where the dye itself
binds and stabilizes a folded form of the protein.  If the fragment also
binds AND displaces the dye AND the fragment stabilizes the protein LESS
effectively than the dye, THEN I believe you could have a true binder that
gives a negative shift.

Nick





Re: [ccp4bb] off-topic: negative thermal shift upon ligand binding

2017-07-08 Thread Bonsor, Daniel
What are you using to dissolve the compounds? If DMSO or other organics, you
may be witnessing unfolding due to the organics. Have you done a control with
just the solvent and the protein?

Some compounds may drastically change the pH of the buffer your protein is in.
You may be observing a change in the stability of your protein due to the pH change.

You could also try the same compounds against BSA, lysozyme and other control 
proteins to see if you observe a similar effect.

Dan


[ccp4bb] off-topic: negative thermal shift upon ligand binding

2017-07-08 Thread megha abbey
Hello,

I am working on DSF to verify whether some compounds bind to my protein. I see a
negative shift of about 3-4 degrees upon ligand addition (dose-response) in
comparison to the protein alone. I assume that this might be due to binding
of the compound to the unfolded state rather than to the folded protein.

In such a situation, where compounds are being screened with the aim of drug
discovery, are these negative-thermal-shift compounds relevant, and how can
they be followed up on, or should they simply be discarded?

Thank you.


[ccp4bb] Arcimboldo, ShelxE, Windows

2017-07-08 Thread Bonsor, Daniel
Dear All,


Does Arcimboldo actually run on Windows? I can see that it is installed, and so is
ShelxE, but I get a warning in the GUI saying "ShelxE is not found" and the option
to run on 'this machine' is disabled. Is this a bug, do I need to alter something
to get it to recognize ShelxE, or does it not run because I am using Windows and
I should get a Mac or Linux machine?


Thanks for any input on solving this problem.


(This problem is seen in arcimboldo_lite, arcimboldo_borges and 
arcimboldo_shredder).


Dan


Daniel A. Bonsor PhD
Institute of Human Virology,
University of Maryland at Baltimore
725 W. Lombard Street N571
Baltimore
MD 21201




Re: [ccp4bb] Rmergicide Through Programming

2017-07-08 Thread Edward A. Berry

But R-merge is not really narrower as a fraction of the mean value - it just
gets smaller proportionately as all the numbers get smaller:
an RMSD of 0.0043 for R-meas, multiplied by the factor 0.022/0.027, gives 0.0035, which
is the RMSD for R-merge. The same was true in the previous example. You could
multiply R-meas by 0.5 or 0.2 and get an even sharper distribution! And that factor
would be constant, whereas this one only applies at super-low redundancy.
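
Spelling the scaling check out (assuming 0.022 and 0.027 are the mean R-merge
and R-meas of the I/sigma = 30 example quoted above):

  \mathrm{RMSD}(R_{\mathrm{merge}}) \approx \mathrm{RMSD}(R_{\mathrm{meas}}) \times
  \frac{\langle R_{\mathrm{merge}}\rangle}{\langle R_{\mathrm{meas}}\rangle}
  = 0.0043 \times \frac{0.022}{0.027} \approx 0.0035 ,

i.e. RMSD/mean comes out near 0.16 for both statistics; only the absolute
numbers shrink.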


Re: [ccp4bb] Rmergicide Through Programming

2017-07-08 Thread James Holton


The expected distribution of Rmeas values is still wider than that of 
Rmerge for data with I/sigma=30 and average multiplicity=2.0. Graph 
attached.


I expect that anytime you incorporate more than one source of
information you run the risk of a noisier statistic, because every source
of information can contain noise.  That is, Rmeas combines information
about multiplicity with the absolute deviates in the data to form a
statistic that is more accurate than Rmerge, but also (potentially) less
precise.
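
For reference, the two statistics as they are usually defined (e.g. Diederichs &
Karplus, 1997), with n_hkl the number of observations of reflection hkl and
<I(hkl)> their mean:

  R_{\mathrm{merge}} = \frac{\sum_{hkl}\sum_{i} \left| I_i(hkl) - \langle I(hkl)\rangle \right|}{\sum_{hkl}\sum_{i} I_i(hkl)}

  R_{\mathrm{meas}} = \frac{\sum_{hkl}\sqrt{\tfrac{n_{hkl}}{n_{hkl}-1}}\,\sum_{i} \left| I_i(hkl) - \langle I(hkl)\rangle \right|}{\sum_{hkl}\sum_{i} I_i(hkl)}

The only difference is the per-reflection \sqrt{n_{hkl}/(n_{hkl}-1)} factor,
the "multiplicity correction" discussed further down the thread.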


Perhaps that is what we are debating here?  Which is better? accuracy or 
precision?  Personally, I prefer to know both.


-James Holton
MAD Scientist


Re: [ccp4bb] Rmergicide Through Programming

2017-07-08 Thread Frank von Delft
It is quite easy to end up with low multiplicities in the low resolution 
shell, especially for low symmetry and fast-decaying crystals.


It is in this scenario that Rmerge (low-res) is more misleading than Rmeas.

phx



Re: [ccp4bb] Rmergicide Through Programming

2017-07-08 Thread James Holton
What does Rmeas tell us that Rmerge doesn't?  Given that we know the 
multiplicity?


-James Holton
MAD Scientist


Re: [ccp4bb] Rmergicide Through Programming

2017-07-08 Thread Frank von Delft
Anyway, back to reality:  does anybody still use R statistics to 
evaluate anything other than /strong/ data?  Certainly I never look at 
it except for the low-resolution bin (or strongest reflections).  
Specifically, a "2%-dataset" in that bin is probably healthy, while a 
"9%-dataset" probably Has Issues.


In which case, back to Jacob's question: what does Rmerge tell us that
Rmeas doesn't?


phx





[ccp4bb] Postdoc position at SGC Univ of Oxford, on structural biology of meganucleases

2017-07-08 Thread Wyatt W. Yue
Dear all

We are recruiting a Postdoctoral Scientist, with experience in structural
biology of protein-nucleic acid complexes, to join the research group led
by Associate Professor Wyatt Yue at the Structural Genomics Consortium
(SGC), University of Oxford.

In collaboration with Precision Biosciences Inc, the group is launching a
project, funded by a Wellcome Trust Pathfinder award, on structure-based
engineering of meganucleases for therapeutic genome editing in
trinucleotide repeat disorders.

You will hold a PhD (or be near completion) in protein biochemistry or
structural biology, have experience in purification, crystallization and
structure determination of protein-nucleic acid complexes, and be able to
work as part of a team interacting with industrial scientists. This is a
full-time post offered for a fixed-term of 18 months in the first instance.

Further particulars, including details on how to apply and links to further
information, can be accessed online:

https://www.ndm.ox.ac.uk/current-job-vacancies/vacancy/129660-Postdoctoral-Research-Scientist-%E2%80%93-Metabolism-and-Organelle-Biogenesis-(Wellcome-Trust-Pathfinder)

Closing date: *12:00 noon (UK) on 13 July 2017 (Thursday next week)*

Informal enquiries can be made to wyatt@sgc.ox.ac.uk

Visit Wyatt Yue’s SGC home page: http://www.thesgc.org/wyatt


Re: [ccp4bb] Rmergicide Through Programming

2017-07-08 Thread James Holton

Sorry for the confusion.  I was going for brevity!  And failed.

I know that the multiplicity correction is applied on a per-hkl basis in 
the calculation of Rmeas.  However, the average multiplicity over the 
whole calculation is most likely not an integer. Some hkls may be 
observed twice while others only once, or perhaps 3-4 times in the same 
scaling run.


Allow me to do the error propagation properly.  Consider the scenario:

Your outer resolution bin has a true I/sigma = 1.00 and average 
multiplicity of 2.0. Let's say there are 100 hkl indices in this bin.  I 
choose the "true" intensities of each hkl from an exponential (aka 
Wilson) distribution.  Further assume the background is high, so the 
error in each observation after background subtraction may be taken from 
a Gaussian distribution. Let's further choose the per-hkl multiplicity 
from a Poisson distribution with expectation value 2.0, so 0 is 
possible, but the long-term average multiplicity is 2.0. For R 
calculation, when multiplicity of any given hkl is less than 2 it is 
skipped. What I end up with after 120,000 trials is a distribution of 
values for each R factor.  See attached graph.
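
(A minimal numpy sketch of the kind of simulation described above, for anyone
who wants to reproduce the spread. This is not the script behind the attached
graph; the noise model, with a single Gaussian sigma per bin set from the mean
true intensity, and the reduced trial count of 10,000 are assumptions.)

import numpy as np

rng = np.random.default_rng(0)

def one_trial(n_hkl=100, mean_mult=2.0, i_over_sigma=1.0):
    """Simulate one resolution bin; return (Rmerge, Rmeas)."""
    # "True" intensities from an exponential (Wilson) distribution with mean 1.
    i_true = rng.exponential(scale=1.0, size=n_hkl)
    # High background: Gaussian errors, with sigma chosen so that the
    # population-mean true I/sigma equals i_over_sigma.
    sigma = 1.0 / i_over_sigma
    # Per-hkl multiplicity from a Poisson distribution (0 is possible).
    mult = rng.poisson(lam=mean_mult, size=n_hkl)
    num_merge = num_meas = denom = 0.0
    for itrue, n in zip(i_true, mult):
        if n < 2:          # unobserved and singly-observed hkls are skipped
            continue
        obs = itrue + rng.normal(0.0, sigma, size=n)
        dev = np.abs(obs - obs.mean()).sum()
        num_merge += dev
        num_meas += np.sqrt(n / (n - 1.0)) * dev
        denom += obs.sum()
    return num_merge / denom, num_meas / denom

trials = np.array([one_trial() for _ in range(10000)])   # 120,000 in the original
for name, vals in (("Rmerge", trials[:, 0]), ("Rmeas", trials[:, 1])):
    q1, med, q3 = np.percentile(vals, [25, 50, 75])
    print(f"{name}: median {med:.3f}, middle half {q1:.3f}-{q3:.3f}")

Since sqrt(n/(n-1)) >= 1, every trial necessarily gives Rmeas >= Rmerge; the
question in this thread is how the widths of the two distributions compare.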


What I hope is readily apparent is that the distribution of Rmerge 
values is taller and sharper than that of the Rmeas values.  The most 
likely Rmeas is 80% and that of Rmerge is 64.6%.  This is expected, of 
course.  But what I hope to impress upon you is that the most likely 
value is not generally the one that you will get! The distribution has a 
width.  Specifically, Rmeas could be as low as 40%, or as high as 209%, 
depending on the trial.  Half of the trial results fall between 71.4% 
and 90.3%, a range of 19 percentage points.  Rmerge has a middle-half 
range from 57.6% to 72.9% (15.3 percentage points).  This range of 
possible values of Rmerge or Rmeas from data with the same intrinsic 
quality is what I mean when I say "numerical instability".  Each and 
every trial had the same true I/sigma and multiplicity, and yet the R 
factors I get vary depending on the trial.  Unfortunately for most of us 
with real data, you only ever get one trial, and you can't predict which 
Rmeas or Rmerge you'll get.


My point here is that R statistics in general are not comparable from 
experiment to experiment when you are looking at data with low average 
intensity and low multiplicity, and it appears that Rmeas is less stable 
than Rmerge.  Not by much, mind you, but still jumps around more.


Hope that is clearer?

Note that in no way am I suggesting that low-multiplicity is the right 
way to collect data.  Far from it.  Especially with modern detectors 
that have negligible read-out noise.  But when micro crystals only give 
off a handful of photons each before they die, low multiplicity might be 
all you have.


-James Holton
MAD Scientist



On 7/7/2017 2:33 PM, Edward A. Berry wrote:
I think the confusion here is that the "multiplicity correction" is 
applied
on each reflection, where it will be an integer 2 or greater (can't 
estimate
variance with only one measurement). You can only correct in an 
approximate
way using the average multiplicity of the dataset, since it 
would depend

on the distribution of multiplicity over the reflections.

And the correction is for r-merge. You don't need to apply a correction
to R-meas.
R-meas is a redundancy-independent best estimate of the variance.
Whatever you would have used R-merge for (hopefully taking allowance
for the multiplicity) you can use R-meas and not worry about 
multiplicity.

Again, what information does R-merge provide that R-meas does not provide
in a more accurate way?

According to the Denzo manual, one way to artificially reduce
R-merge is to include reflections with only one measurement (averaging
in a lot of zeros always helps bring an average down), and they say
there were actually some programs that did that. However, I'm
quite sure none of the ones we rely on today do that.

On 07/07/2017 03:12 PM, Kay Diederichs wrote:

James,

I cannot follow you. "n approaches 1" can only mean n = 2, because n 
is an integer. And for n=2 the sqrt(n/(n-1)) factor is well-defined. For 
n=1, contributions to neither Rmeas nor Rmerge nor any other 
precision indicator can be calculated anyway, because there's nothing 
this measurement can be compared against.


just my 2 cents,

Kay
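
To put a number on Kay's point: at the smallest usable multiplicity,

  \sqrt{\tfrac{n}{n-1}}\Big|_{n=2} = \sqrt{2} \approx 1.41 ,

so a twice-measured reflection contributes about 41% more to Rmeas than to
Rmerge, and the factor falls toward 1 as n grows.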

On Fri, 7 Jul 2017 10:57:17 -0700, James Holton 
 wrote:



I happen to be one of those people who think Rmerge is a very useful
statistic.  Not as a method of evaluating the resolution limit, 
which is

mathematically ridiculous, but for a host of other important things,
like evaluating the performance of data collection equipment, and
evaluating the isomorphism of different crystals, to name a few.

I like Rmerge because it is a simple statistic that has a simple 
formula

and has not undergone any "corrections".  Corrections increase
complexity, and complexity opens the door to 

[ccp4bb] Research Scientist Positions at Vertex Pharmaceuticals UK site

2017-07-08 Thread Jay Bertrand
Dear All,
I wanted to send a reminder message out about the following positions.

Vertex Pharmaceuticals (Europe) Ltd has two open Research Scientist positions 
in the Structural Biology and Biophysics group at our UK site (Abingdon, 
Oxfordshire).

One position is for a structural biologist with experience in cryo-EM and/or
membrane protein crystallography; the major role will be to
produce and study human membrane proteins to support small-molecule drug
discovery (Reference code 9354BR).

The other position is for a membrane protein biochemist; the major role will be
to support small-molecule drug discovery by
producing proteins for structural and biophysical studies (Reference code
9355BR).

Please note that applications for these vacancies are to be made online.

To apply for these roles and for further details, please click on the 
appropriate link below:

https://sjobs.brassring.com/TGWebHost/jobdetails.aspx?partnerid=25119=5134=9354BR

https://sjobs.brassring.com/TGWebHost/jobdetails.aspx?partnerid=25119=5134=9355BR

--
Jay Bertrand
Vertex Pharmaceuticals (Europe) Ltd
86-88 Jubilee Ave, Milton Park
Abingdon, Oxfordshire
OX14 4RW
United Kingdom


