So far I have received several "votes" based on the lossless compression ratio of the images. Before I reveal the "answer" to the CCP4BB, though, I should remind everyone that the LOSSY compression ratio of the compressed images is 34-fold! Next to that, bzip2 and gzip are incredibly inefficient ways to store the "compressed data set".
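
For anyone who wants to repeat the lossless comparison behind those "votes", here is a minimal sketch in Python. The helper names and the synthetic 16-bit "frames" are assumptions made up for illustration and are not the actual test images; the point is only that a buffer with less pixel-to-pixel variation compresses better under gzip and bzip2.

```python
import bz2
import gzip
import random


def lossless_ratios(data: bytes) -> dict:
    """Return original_size / compressed_size for gzip and bzip2."""
    return {
        "gzip": len(data) / len(gzip.compress(data, 9)),
        "bzip2": len(data) / len(bz2.compress(data, 9)),
    }


def synthetic_frame(noise: int, n_pixels: int = 20000, seed: int = 0) -> bytes:
    """Hypothetical stand-in for a detector frame: 16-bit pixels near 100 counts."""
    rng = random.Random(seed)
    pixels = (100 + rng.randint(-noise, noise) for _ in range(n_pixels))
    return b"".join(p.to_bytes(2, "little") for p in pixels)


# A "noisy" frame versus a "smoother" one (assumed noise levels, not measured ones).
noisy = lossless_ratios(synthetic_frame(noise=20))
smooth = lossless_ratios(synthetic_frame(noise=2))
print("noisy :", noisy)
print("smooth:", smooth)
```

To apply this to real frames, read each image file as bytes and pass it to `lossless_ratios`. A difference in ratio only suggests which image carries less information; it says nothing about whether the processed data got worse.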

I am mainly curious whether anyone can find a significant change in data quality upon processing these images. At compression ratios higher than this, the background does start to look quite "jpegy", but the cool thing about video compression is that it is very good at preserving the "local average value" of a group of pixels. The plane fit to the background around a spot that is done during data reduction therefore still works, even at VERY high compression ratios (200 or more). You do eventually end up sacrificing faint spots, though. This is the "judgment call" I'd like opinions on. Personally, I don't think the faint spots are all that important, but others might have some religion about them...
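
The plane-fit argument can be illustrated with a small sketch. Everything here is an assumption for illustration: the 16x16 background box, the plane coefficients, the noise level, and especially the 2x2 block averaging, which is only a crude stand-in for a codec that preserves local averages and is not the video compression actually used. The fit itself is an ordinary least-squares plane, I = c + a*x + b*y.

```python
import random


def fit_plane(img):
    """Least-squares fit of I = c + a*x + b*y over a square box.

    Coordinates are centred on the box, so sum(x) = sum(y) = sum(x*y) = 0
    and the normal equations decouple into three independent ratios.
    """
    n = len(img)
    coords = [i - (n - 1) / 2 for i in range(n)]
    sxx = n * sum(x * x for x in coords)  # sum of x^2 over every pixel
    c = sum(sum(row) for row in img) / (n * n)
    a = sum(coords[j] * img[i][j] for i in range(n) for j in range(n)) / sxx
    b = sum(coords[i] * img[i][j] for i in range(n) for j in range(n)) / sxx
    return c, a, b


def block_average(img, k=2):
    """Replace each k x k block by its mean (assumed stand-in for lossy coding)."""
    n = len(img)
    out = [row[:] for row in img]
    for i0 in range(0, n, k):
        for j0 in range(0, n, k):
            block = [img[i][j] for i in range(i0, i0 + k) for j in range(j0, j0 + k)]
            mean = sum(block) / len(block)
            for i in range(i0, i0 + k):
                for j in range(j0, j0 + k):
                    out[i][j] = mean
    return out


# Synthetic background: a tilted plane near 100 counts plus Gaussian noise.
rng = random.Random(1)
n = 16
half = (n - 1) / 2
original = [
    [100 + 0.5 * (j - half) - 0.3 * (i - half) + rng.gauss(0, 10) for j in range(n)]
    for i in range(n)
]
degraded = block_average(original)

c0, a0, b0 = fit_plane(original)
c1, a1, b1 = fit_plane(degraded)
max_pixel_change = max(
    abs(original[i][j] - degraded[i][j]) for i in range(n) for j in range(n)
)
print("original fit:", c0, a0, b0)
print("degraded fit:", c1, a1, b1)
print("largest single-pixel change:", max_pixel_change)
```

In this toy case individual pixels move by several counts, yet the fitted background level is unchanged and the slopes shift only slightly, because block averaging conserves the sum of the pixels in the box. A faint spot is a different matter, since its signal lives in exactly the pixel-level detail that gets smoothed away.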

Thanks for the input!

-James Holton
MAD Scientist


H. Raaijmakers wrote:
James,

caseB was lossy compressed.
It is 10% smaller when compressed (gzip, bzip2), so it contains
significantly less information.

cheers,

Hans

James Holton wrote:
Ian Tickle wrote:
I found an old e-mail from James Holton where he suggested lossy
compression for diffraction images (as long as it didn't change the
F's significantly!) - I'm not sure whether anything came of that!

Well, yes, something did come of this....  But I don't think Gerard
Bricogne is going to like it.

Details are here:
http://bl831.als.lbl.gov/~jamesh/lossy_compression/

Short version is that I found a way to compress a test lysozyme dataset
by a factor of ~33 with no apparent ill effects on the data.  In fact,
anomalous differences were completely unaffected, and Rfree dropped from
0.287 for the original data to 0.275 when refined against Fs from the
compressed images.  This is no doubt a fluke of the excess noise added
by compression, but I think it highlights how the errors in
crystallography are dominated by the inadequacies of the electron
density models we use, and not the quality of our data.

The page above lists two data sets: "A" and "B", and I am interested to
know if and how anyone can "tell" which one of these data sets was
compressed.  The first image of each data set can be found here:
http://bl831.als.lbl.gov/~jamesh/lossy_compression/firstimage.tar.bz2

-James Holton
MAD Scientist


