For teaching purposes I have found that controlled pairs of data sets
are most instructive. You are right that an easy one-button-push
processing run tells you nothing, but so does a bang-it-crashed-now-what
data set. Most useful are two data sets that are identical in every
respect but one, and that one thing is the point you are trying to get
across. It's hard to collect such perfectly paired data sets, so I
ended up just simulating them. I deliberately chose a high-symmetry
space group to keep the download size small. You can download them from
here:
http://bl831.als.lbl.gov/~jamesh/workshop/
These five datasets represent the four biggest problems I see users have
when trying to solve structures: 1) poor anomalous signal, 2) overlaps
from a bad crystal orientation, 3) hidden radiation damage to sites, and
4) ice rings. The 5th "goodsignal" dataset is the positive control.
The web page contains everything from images to processed MTZ files,
maps and the "right answer" in pdb and mtz format. A slightly more
"realistic" version with a bigger download size is here:
http://bl831.als.lbl.gov/~jamesh/workshop2/
This is the one I used for my "weak anomalous challenge" a few years
back. The teaching advantage is that you can use the image-mixer script
to modulate the severity of problems like ice rings and anomalous
signal. If you make a competition of it, people tend to get more
interested.
When it comes to beam centers, it is not all that hard to take a data
set with a "correct" beam center and just edit the headers. How you do
this depends on the file format, but I have some instructions for
editing images in general here:
http://bl831.als.lbl.gov/~jamesh/bin_stuff/
In general, you can usually separate the header from the data with the
unix command "head" or "dd", edit the header with your favorite text
editor, and then put the two parts back together with "cat". As for
which beam center is "correct", it is important to tell your students
that that depends on which software you are using. I wrote all this
down in the last paragraph on page 7 of this doc:
https://submit.biorxiv.org/submission/pdf?msid=BIORXIV/2018/394965
This doc also describes another simulated data set that demonstrates the
challenges of combining lots of short wedges together. That may be too
advanced a topic for your students, or maybe not. As you can
guess I'm experimenting with biorxiv. So far, no comments.
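For anyone who would rather script the header edit than do it by hand, the
head/dd, edit, cat recipe described above could look roughly like the sketch
below. It is only a sketch under some assumptions: that the image has a
fixed-length ASCII header of "KEY=value;" entries padded with spaces (as in
SMV-style images), that you already know the header length in bytes, and
that the key you want is called something like BEAM_CENTER_X. The function
names are made up for illustration, and other file formats will need a
different approach.

```python
import re


def split_image(raw: bytes, header_bytes: int):
    """Split raw file contents into (header, data), like `head -c` plus `dd skip=`."""
    return raw[:header_bytes], raw[header_bytes:]


def set_header_value(header: bytes, key: str, value: str) -> bytes:
    """Replace `KEY=old;` with `KEY=value;`, keeping the header length unchanged.

    Assumes an ASCII header of `KEY=value;` entries followed by space padding.
    """
    pattern = re.compile(
        rb"(?<![A-Za-z0-9_])(" + re.escape(key.encode("ascii")) + rb"=)[^;]*;"
    )
    new_header, count = pattern.subn(
        lambda m: m.group(1) + value.encode("ascii") + b";", header, count=1
    )
    if count == 0:
        raise KeyError(key)

    # The pixel data must stay at the same byte offset, so pad with spaces
    # or give back some of the trailing padding.
    diff = len(header) - len(new_header)
    if diff >= 0:
        return new_header + b" " * diff
    padding = len(new_header) - len(new_header.rstrip(b" "))
    if padding < -diff:
        raise ValueError("not enough trailing padding to absorb the longer value")
    return new_header[: len(header)]


def join_image(header: bytes, data: bytes) -> bytes:
    """Put the two parts back together, like `cat header data`."""
    return header + data
```

To use it, read the file in binary mode, call split_image with the header
length, change the value with set_header_value, and write the result of
join_image to a new file so the original image is left untouched.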
Good luck with your class!
-James Holton
MAD Scientist
On 9/19/2018 5:15 PM, Whitley, Matthew J wrote:
Dear colleagues,
For teaching purposes, I am looking for a small number (< 5) of
macromolecular diffraction datasets (raw images) that might be
considered 'difficult' for a beginning crystallography student to
process. By 'difficult' I generally mean not able to be processed
automatically by a common processing package (XDS, Mosflm, DIALS, etc)
using default settings, i.e., no black box "click and done" processing.
The datasets I am looking for would have some stumbling block such as
incorrect experimental parameters recorded in the image headers,
multiple lattices that cause indexing to fail, datasets for which
determining the correct space group is tricky, datasets for experiments
in which the crystal slipped or moved in the beam, or anything else you
can think of. The idea is for these beginning students to examine
several datasets that highlight various phenomena that can lead one
astray during processing.
A good candidate dataset would also ideally comprise a modest number of
images so as to keep integration time to a minimum. Factors that are
mostly irrelevant for my purpose: resolution (as long as better than
~3.5 Å), source (home vs synchrotron), presence/absence of anomalous
scattering, presence/absence of ligands, monomeric vs oligomeric
structures, etc. Also, to be clear, I am not looking for datasets that
have so many pathologies that they would require many long hours of work
for an expert to process correctly.
I have checked public repositories such as proteindiffraction.org and
SBGrid databank, but all of the datasets I acquired from these sources
process satisfactorily with little effort, and in any event I know of no
way to search for 'challenging' datasets. (I also wonder whether
anybody is in the habit of depositing, shall we say, less-than-pristine
images to public repositories?)
If you know of such a dataset that is already publicly available, or if
you have such a dataset that you are willing to share for solely
educational purposes, I would appreciate hearing from you, either on- or
off-list.
Thank you in advance for your suggestions.
Matthew