Dear Colleagues,

  The main problem with lossy compression that suppresses weak
spots is that those spots may be a tip-off to a misidentified
symmetry, so you may wish to keep a faithful copy of the original
diffraction images until you are very certain of having the
symmetry right.

  That being said, such a huge compression sounds very useful,
and I would be happy to add it as an option in CBFlib for people
to play with once the code is reasonably stable and available,
provided it is not tied up in patents or licenses that conflict
with the LGPL.

  Regards,
    Herbert

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 y...@dowling.edu
=====================================================

On Sun, 9 May 2010, James Holton wrote:

Frank von Delft wrote:
Just looked at the algorithm, in particular how it stores the average "non-spot" level through all the images.

What happens with datasets where the "non-spot" (e.g. background) changes systematically through the dataset, i.e. anisotropic datasets or thin crystals lying flat in a thin loop? How much worse is the compression in that case?
Cheers
phx
Well, what will happen in that case (with the current "algorithm") is that once a background pixel deviates from the median level by more than 4 "sigmas", it will start to get stored losslessly. Essentially, such pixels will be treated as "spots", and the overall compression ratio will start to approach that of bzip2.
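(To make that concrete, here is a minimal sketch of such a thresholding step in Python/NumPy. The 4-"sigma" cutoff and the median background image come from the description above; the function name, the per-pixel sigma array, and everything else are assumptions for illustration, not the actual code behind the linked page.)

    import numpy as np

    def split_pixels(image, median_bg, sigma_bg, cutoff=4.0):
        # Pixels deviating from the per-pixel median background by more
        # than `cutoff` "sigmas" are flagged as "spots" and kept
        # losslessly; the rest can be reconstructed from the stored
        # median background on decompression.
        deviation = np.abs(image.astype(np.float64) - median_bg)
        spot_mask = deviation > cutoff * sigma_bg
        return spot_mask, ~spot_mask  # (lossless pixels, background pixels)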

A "workaround" for this is simply to store the data set in "chunks" where the background level is similar, but I suppose a more intelligent thing to do would be to "scale" each image to the median background image and store the scale factors (a list of 100 numbers for a 100-image data set) along with the other ancillary data. I haven't done that yet. Didn't want to spend too much time on this in case I incited some kind of revolt.
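(A hedged sketch of that scaling idea, assuming a simple least-squares scale factor of each image against the median background image; again, this is just an illustration, not code that exists yet.)

    import numpy as np

    def background_scale_factors(images, median_bg):
        # For each image, find the scale k that best maps the median
        # background image onto it in a least-squares sense, so the
        # residuals stay small even as the background drifts.
        flat_bg = median_bg.astype(np.float64).ravel()
        scales = []
        for img in images:
            flat = img.astype(np.float64).ravel()
            scales.append(float(np.dot(flat, flat_bg) / np.dot(flat_bg, flat_bg)))
        return scales  # e.g. a list of 100 numbers for a 100-image data set

Storing these scale factors along with the other ancillary data would let the decompressor rescale the median background per image, instead of pushing slowly drifting background pixels into the lossless "spot" category.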

-James Holton
MAD Scientist

On 07/05/2010 06:07, James Holton wrote:
Ian Tickle wrote:
I found an old e-mail from James Holton where he suggested lossy
compression for diffraction images (as long as it didn't change the
F's significantly!) - I'm not sure whether anything came of that!

Well, yes, something did come of this.... But I don't think Gerard Bricogne is going to like it.

Details are here:
http://bl831.als.lbl.gov/~jamesh/lossy_compression/

Short version is that I found a way to compress a test lysozyme dataset by a factor of ~33 with no apparent ill effects on the data. In fact, anomalous differences were completely unaffected, and Rfree dropped from 0.287 for the original data to 0.275 when refined against Fs from the compressed images. This is no doubt a fluke of the excess noise added by compression, but I think it highlights how the errors in crystallography are dominated by the inadequacies of the electron density models we use, and not by the quality of our data.

The page above lists two data sets, "A" and "B", and I am interested to know whether and how anyone can "tell" which one of them was compressed. The first image of each data set can be found here:
http://bl831.als.lbl.gov/~jamesh/lossy_compression/firstimage.tar.bz2

-James Holton
MAD Scientist
