Dear Colleagues,

  Clearly, no system will be able to perfectly preserve every pixel of
every dataset collected at a cost that can be afforded.  Resources are
finite and we must set priorities.  I would suggest that, in order
of declining priority, we try our best to retain:

  1.  raw data that might tend to refute published results
  2.  raw data that might tend to support published results
  3.  raw data that may be of significant use in currently
ongoing studies either in refutation or support
  4.  raw data that may be of significant use in future
studies

While no archiving system can be perfect, we should not let the
search for a perfect solution prevent us from working with
currently available good solutions, and even in this era of tight
budgets, there are good solutions.

  Regards,
    Herbert

On 4/5/12 7:16 AM, John R Helliwell wrote:
Dear 'aales...@burnham.org',

Re the pixel detector: yes, this is an acknowledged raw data archiving
challenge. Possible technical solutions include summing to make
coarser images, i.e. in angular range; lossless compression (nicely
described on this CCP4bb by James Holton); or preserving a sufficient
sample of data (but NB this debate is certainly not yet concluded).
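
As a rough illustration of the first two options, here is a minimal
sketch. It is not taken from the CCP4bb posts referred to above: the
tiny 8-pixel frames, the 32-bit count format and the function names
(sum_frames, compress_frame, decompress_frame) are all assumptions
made up for the example, and zlib merely stands in for whatever
lossless scheme one would really use.

```python
import struct
import zlib


def sum_frames(frames, n):
    """Sum every n consecutive frames pixel by pixel (coarser angular slicing)."""
    summed = []
    for start in range(0, len(frames), n):
        group = frames[start:start + n]
        summed.append([sum(pixels) for pixels in zip(*group)])
    return summed


def compress_frame(pixels):
    """Losslessly compress one frame stored as unsigned 32-bit counts."""
    raw = struct.pack(f"<{len(pixels)}I", *pixels)
    return zlib.compress(raw, 9)


def decompress_frame(blob, n_pixels):
    """Recover the original counts from a compressed frame."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{n_pixels}I", raw))


# Four hypothetical fine-sliced frames of 8 pixels each (flattened).
frames = [
    [0, 0, 1, 0, 0, 2, 0, 0],
    [0, 1, 3, 0, 0, 1, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
coarse = sum_frames(frames, 2)       # two frames, each twice the angular width
blob = compress_frame(coarse[0])     # exactly reversible
restored = decompress_frame(blob, len(coarse[0]))
```

Note the difference between the two steps: summing discards the fine
slicing for good, whereas the compression step returns exactly the
counts that went in.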

Re "And all this hassle is for the only real purpose of preventing data fraud?"

Well.....Why publish data?
Please let me offer some reasons:
• To enhance the reproducibility of a scientific experiment
• To verify or support the validity of deductions from an experiment
• To safeguard against error
• To allow other scholars to conduct further research based on
experiments already conducted
• To allow reanalysis at a later date, especially to extract 'new'
science as new techniques are developed
• To provide example materials for teaching and learning
• To provide long-term preservation of experimental results and future
access to them
• To permit systematic collection for comparative studies
• And, yes, to better safeguard against fraud than is apparently the
case at present

Also to (probably) comply with your funding agency's grant conditions:-
Increasingly, funding agencies are requesting or requiring data
management policies (including provision for retention and access) to
be taken into account when awarding grants. See e.g. the Research
Councils UK Common Principles on Data Policy
(http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx) and the Digital
Curation Centre overview of funding policies in the UK
(http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies).
See also http://forums.iucr.org/viewtopic.php?f=21&t=58 for discussion
on policies relevant to crystallography in other countries. NB these
policies extend over derived, processed and raw data, i.e. without
really adequate clarity of policy from one stage of the 'data pyramid'
to another (see
http://www.stm-assoc.org/integration-of-data-and-publications).


And just to mention the IUCr Journals Notes for Authors for biological
macromolecular structures, where we have our, i.e. macromolecular
crystallography's, version of the 'data pyramid':

(1) Derived data
• Atomic coordinates, anisotropic or isotropic displacement
parameters, space group information, secondary structure and
information about biological functionality must be deposited with the
Protein Data Bank before or in concert with article publication; the
article will link to the PDB deposition using the PDB reference code.
• Relevant experimental parameters and unit-cell dimensions are required
as an integral part of article submission and are published within the
article.

(2) Processed experimental data
• Structure factors must be deposited with the Protein Data Bank
before or in concert with article publication; the article will link
to the PDB deposition using the PDB reference code.

(3) Primary experimental data (here I give small and macromolecule
Notes for Authors details):-
For small-unit-cell crystal/molecular structures and macromolecular
structures IUCr journals have no current binding policy regarding
publication of diffraction images or similar raw data entities.
However, the journals welcome efforts made to preserve and provide
primary experimental data sets. Authors are encouraged to make
arrangements for the diffraction data images for their structure to be
archived and available on request.
For articles that present the results of powder diffraction profile
fitting or refinement (Rietveld) methods, the primary diffraction
data, i.e. the numerical intensity of each measured point on the
profile as a function of scattering angle, should be deposited.
Fibre data should contain appropriate information such as a photograph
of the data. As primary diffraction data cannot be satisfactorily
extracted from such figures, the basic digital diffraction data should
be deposited.


Finally to mention that many IUCr Commissions are interested in the
possibility of establishing community practices for the orderly
retention and referencing of raw data sets, and the IUCr would like to
see such data sets become part of the routine record of scientific
research in the future, to the extent that this proves feasible and
cost-effective.
I draw your attention therefore to the IUCr Forum on such matters at:-
http://forums.iucr.org/
Within this Forum you can find, for example, the fairly recent report of
the ICSU-convened Strategic Coordinating Committee on Information and
Data; within this we learn of data archiving efforts in many other areas
of science and, e.g., that the radio astronomy Square Kilometre Array
will pose the biggest raw data archiving challenge on the planet. [Our
needs are thereby relatively modest.]

The IUCr Diffraction Data Deposition Working Group is actively
addressing all these various issues.
We welcome your input at the IUCr Forum, which will thereby be most
timely. Thank you.

Best wishes,
Yours sincerely,
John
Professor John R Helliwell DSc


On Thu, Apr 5, 2012 at 1:24 AM, aaleshin<aales...@burnham.org>  wrote:
People who raise their voices for prolonged storage of raw images miss the
simple fact that the volume of collected data increases in proportion to, if
not faster than, the drop in the cost of storage space. I just had an
opportunity to collect data with the PILATUS detector at SSRL, and I can tell
you that monster allows slicing the data 4-5 times thinner than other
detectors do. Some people also like collecting very redundant data sets. Even
now, transferring and storing raw data from a synchrotron is a pain in the
neck, but in a few years it may become simply impractical. And all this
hassle is for the only real purpose of preventing data fraud? Aren't there
cheaper and more adequate solutions to the problem?

I also wonder why, after the first occurrence of data fraud several years
ago, the PDB did not take any action to prevent its recurrence in the future.
Or are administrative actions simply impossible nowadays without a
mega-dollar grant?


