For teaching purposes I have found that controlled pairs of data sets are most instructive.  You are right that an easy one-button-push processing run tells you nothing, but so does a bang-it-crashed-now-what data set.  Most useful are two data sets that are identical in every respect but one, and that one thing is the point you are trying to get across.  It's hard to collect such perfectly paired data sets, so I ended up just simulating them. I deliberately chose a high-symmetry space group to keep the download size small. You can download them from here:

http://bl831.als.lbl.gov/~jamesh/workshop/

These five datasets cover the four biggest problems I see users have when trying to solve structures: 1) poor anomalous signal, 2) overlaps from a bad crystal orientation, 3) hidden radiation damage to sites, and 4) ice rings.  The fifth dataset, "goodsignal", is the positive control.

The web page contains everything from images to processed MTZ files, maps and the "right answer" in pdb and mtz format.  A slightly more "realistic" version with a bigger download size is here:

http://bl831.als.lbl.gov/~jamesh/workshop2/

This is the one I used for my "weak anomalous challenge" a few years back. The teaching advantage is that you can use the image-mixer script to modulate the severity of problems like ice rings and anomalous signal.  If you make a competition of it, people tend to get more interested.

When it comes to beam centers, it is not all that hard to take a data set with a "correct" beam center and just edit the headers. How you do this depends on the file format, but I have some instructions for editing images in general here:

http://bl831.als.lbl.gov/~jamesh/bin_stuff/

In general, you can usually separate the header from the data with the unix command "head" or "dd", edit the header with your favorite text editor, and then put the two parts back together with "cat". As for which beam center is "correct", it is important to tell your students that that depends on which software you are using.  I wrote all this down in the last paragraph on page 7 of this doc:

https://submit.biorxiv.org/submission/pdf?msid=BIORXIV/2018/394965
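
To make the split/edit/rejoin recipe above a bit more concrete, here is a minimal Python sketch of the same idea. It assumes an SMV-style image (the ADSC kind), where a plain-text header states its own length in a HEADER_BYTES field and is padded out to that length. The function name and the BEAM_CENTER_X key in the usage note are only illustrations; other formats need different handling, so check what your own headers actually contain.

```python
import re


def set_smv_header_value(image: bytes, key: str, value: str) -> bytes:
    """Return a copy of an SMV-style image with one header value replaced.

    Assumption: the file starts with a text header that declares its size
    as "HEADER_BYTES=  512;" and holds entries of the form "KEY=value;".
    """
    # Read the declared header size from the start of the file.
    size_match = re.search(rb"HEADER_BYTES=\s*(\d+);", image[:128])
    if size_match is None:
        raise ValueError("no HEADER_BYTES field; probably not an SMV-style image")
    header_size = int(size_match.group(1))

    # Separate the header from the pixel data, as "head -c" or "dd" would.
    header, pixels = image[:header_size], image[header_size:]

    # Edit the header: swap "KEY=old;" for "KEY=new;".
    field = re.compile(rb"(?m)^" + re.escape(key.encode()) + rb"=[^;]*;")
    replacement = key.encode() + b"=" + value.encode() + b";"
    header, count = field.subn(lambda m: replacement, header)
    if count == 0:
        raise KeyError(key)

    # Re-pad with spaces so the header keeps its declared size and the
    # pixel data stays at the same byte offset.
    header = header.rstrip(b" \x00")
    if len(header) > header_size:
        raise ValueError("edited header no longer fits in HEADER_BYTES")
    header = header.ljust(header_size, b" ")

    # Put the two parts back together, as "cat" would.
    return header + pixels
```

To use it, read the image in binary mode, call something like set_smv_header_value(image, "BEAM_CENTER_X", "157.25"), and write the result to a new file so the original stays untouched. The one thing that matters is that the header length does not change, otherwise every pixel gets shifted.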

This doc also describes another simulated data set that demonstrates the challenges of combining lots of short wedges together.  That may be too advanced a topic for your students, or maybe not. As you can guess, I'm experimenting with biorxiv.  So far, no comments.

Good luck with your class!

-James Holton
MAD Scientist


On 9/19/2018 5:15 PM, Whitley, Matthew J wrote:
Dear colleagues,

For teaching purposes, I am looking for a small number (< 5) of
macromolecular diffraction datasets (raw images) that might be
considered 'difficult' for a beginning crystallography student to
process.  By 'difficult' I generally mean not able to be processed
automatically by a common processing package (XDS, Mosflm, DIALS, etc)
using default settings, i.e., no black box "click and done" processing.
The datasets I am looking for would have some stumbling block such as
incorrect experimental parameters recorded in the image headers,
multiple lattices that cause indexing to fail, datasets for which
determining the correct space group is tricky, datasets for experiments
in which the crystal slipped or moved in the beam, or anything else you
can think of.  The idea is for these beginning students to examine
several datasets that highlight various phenomena that can lead one
astray during processing.

A good candidate dataset would also ideally comprise a modest number of
images so as to keep integration time to a minimum.  Factors that are
mostly irrelevant for my purpose: resolution (as long as better than
~3.5 Å), source (home vs synchrotron), presence/absence of anomalous
scattering,  presence/absence of ligands, monomeric vs oligomeric
structures, etc.  Also, to be clear, I am not looking for datasets that
have so many pathologies that they would require many long hours of work
for an expert to process correctly.

I have checked public repositories such as proteindiffraction.org and
SBGrid databank, but all of the datasets I acquired from these sources
process satisfactorily with little effort, and in any event I know of no
way to search for 'challenging' datasets.  (I also wonder whether
anybody is in the habit of depositing, shall we say, less-than-pristine
images to public repositories?)

If you know of such a dataset that is already publicly available, or if
you have such a dataset that you are willing to share for solely
educational purposes, I would appreciate hearing from you, either on- or
off-list.

Thank you in advance for your suggestions.

Matthew

