Re: [ccp4bb] Off topic: 'Difficult' Datasets for Processing Practice

James Holton Sat, 22 Sep 2018 08:47:22 -0700

It was brought to my attention that the link to the preprint I provided below doesn't work, but this one does:


https://www.biorxiv.org/content/early/2018/08/18/394965


Thanks to Folmer Fredslund for pointing this out to me!

-James Holton
MAD Scientist

On 9/21/2018 3:50 PM, James Holton wrote:

For teaching purposes I have found that controlled pairs of data sets are most instructive. You are right that an easy one-button-push processing run tells you nothing, but so does a bang-it-crashed-now-what data set. Most useful are two data sets that are identical in every respect but one, and that one thing is the point you are trying to get across. It's hard to collect such perfectly paired data sets, so I ended up just simulating them. I deliberately chose a high-symmetry space group to keep the download size small. You can download them from here:
http://bl831.als.lbl.gov/~jamesh/workshop/
These five datasets represent the four biggest problems I see users have when trying to solve structures: 1) poor anomalous signal, 2) overlaps from a bad crystal orientation, 3) hidden radiation damage to sites, and 4) ice rings. The 5th "goodsignal" dataset is the positive control.
The web page contains everything from images to processed MTZ files, maps and the "right answer" in pdb and mtz format. A slightly more "realistic" version with a bigger download size is here:
http://bl831.als.lbl.gov/~jamesh/workshop2/
This is the one I used for my "weak anomalous challenge" a few years back. The teaching advantage is that you can use the image-mixer script to modulate the severity of problems like ice rings and anomalous signal. If you make a competition of it, people tend to get more interested.
When it comes to beam centers, it is not all that hard to take a data set with a "correct" beam center and just edit the headers. How you do this depends on the file format, but I have some instructions for editing images in general here:
http://bl831.als.lbl.gov/~jamesh/bin_stuff/
In general, you can usually separate the header from the data with the unix command "head" or "dd", edit the header with your favorite text editor, and then put the two parts back together with "cat". As for which beam center is "correct", it is important to tell your students that that depends on which software you are using. I wrote all this down in the last paragraph on page 7 of this doc:
https://submit.biorxiv.org/submission/pdf?msid=BIORXIV/2018/394965
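[Editorial sketch, not part of the original message.] The split/edit/rejoin recipe above can also be scripted. The Python below is a minimal sketch under stated assumptions: the image is in an SMV-style format whose text header declares its own size in a HEADER_BYTES field, stores values as KEY=value; lines, and is padded with trailing spaces. The helper names split_header and set_header_value, and the BEAM_CENTER_X key used in the usage note, are illustrative only and are not taken from the instructions linked above. Other formats need a different approach.

```python
import re


def split_header(blob: bytes) -> tuple[bytes, bytes]:
    """Split an image into (header, data) using its HEADER_BYTES field.

    Assumes an SMV-style text header that declares its own length.
    """
    match = re.search(rb"HEADER_BYTES=\s*(\d+);", blob[:512])
    if match is None:
        raise ValueError("no HEADER_BYTES field found")
    size = int(match.group(1))
    return blob[:size], blob[size:]


def set_header_value(header: bytes, key: str, value: str) -> bytes:
    """Replace one KEY=value; field and keep the header length unchanged."""
    pattern = re.compile(rb"(?m)^" + re.escape(key.encode()) + rb"=[^;]*;")
    new_field = f"{key}={value};".encode()
    # A callable replacement avoids any backslash-escape processing.
    edited, count = pattern.subn(lambda _m: new_field, header, count=1)
    if count == 0:
        raise KeyError(key)

    # The pixel data must stay at the same offset, so the header has to
    # keep its original size: trim or extend the trailing space padding.
    diff = len(edited) - len(header)
    if diff > 0:
        if not edited.endswith(b" " * diff):
            raise ValueError("not enough padding to absorb the longer value")
        edited = edited[:-diff]
    else:
        edited += b" " * (-diff)
    return edited
```

A possible use, with hypothetical file names: read the file with open("image_001.img", "rb"), call split_header on the bytes, call set_header_value(header, "BEAM_CENTER_X", "160.25"), and write header plus data to a new file. Writing to a copy rather than overwriting keeps the "correct" image available as the control for comparison.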
This doc also describes another simulated data set that demonstrates the challenges of combining lots of short wedges together. May or may not be too advanced a topic for your students? Or maybe not. As you can guess I'm experimenting with biorxiv. So far, no comments.
Good luck with your class!

-James Holton
MAD Scientist


On 9/19/2018 5:15 PM, Whitley, Matthew J wrote:
Dear colleagues,

For teaching purposes, I am looking for a small number (< 5) of
macromolecular diffraction datasets (raw images) that might be
considered 'difficult' for a beginning crystallography student to
process.  By 'difficult' I generally mean not able to be processed
automatically by a common processing package (XDS, Mosflm, DIALS, etc)
using default settings, i.e., no black box "click and done" processing.
The datasets I am looking for would have some stumbling block such as
incorrect experimental parameters recorded in the image headers,
multiple lattices that cause indexing to fail, datasets for which
determining the correct space group is tricky, datasets for experiments
in which the crystal slipped or moved in the beam, or anything else you
can think of.  The idea is for these beginning students to examine
several datasets that highlight various phenomena that can lead one
astray during processing.

A good candidate dataset would also ideally comprise a modest number of
images so as to keep integration time to a minimum.  Factors that are
mostly irrelevant for my purpose: resolution (as long as better than
~3.5 Å), source (home vs synchrotron), presence/absence of anomalous
scattering,  presence/absence of ligands, monomeric vs oligomeric
structures, etc.  Also, to be clear, I am not looking for datasets that
have so many pathologies that they would require many long hours of work
for an expert to process correctly.

I have checked public repositories such as proteindiffraction.org and
SBGrid databank, but all of the datasets I acquired from these sources
process satisfactorily with little effort, and in any event I know of no
way to search for 'challenging' datasets.  (I also wonder whether
anybody is in the habit of depositing, shall we say, less-than-pristine
images to public repositories?)

If you know of such a dataset that is already publicly available, or if
you have such a dataset that you are willing to share for solely
educational purposes, I would appreciate hearing from you, either on- or
off-list.

Thank you in advance for your suggestions.

Matthew
########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1

