Re: [ccp4bb] To archive or not to archive, that's the question!

2011-10-29 Thread Jrh
Dear Gerard K,
Many thanks indeed for this.
Like Gerard Bricogne, you also indicate that the decentralised location option 
is 'quite simple and very cheap in terms of centralised cost'. The SR 
facilities worldwide can, I hope, follow the lead taken by Diamond Light Source 
and PaN, the European consortium of SR and neutron facilities, and keep their 
data archives and also assist authors with the DOI registration process for 
those datasets that result in publication. Linking to these DOIs from the PDB, 
for example, is, as you confirm, straightforward. 

Gerard B's pressing of this approach via the 'Pilot project' within the various 
IUCr DDD WG discussions, with a nicely detailed plan, brought home to me its 
merit for the even greater challenge of raw data archiving for chemical 
crystallography, both because of the number of datasets and because the role of 
the SR facilities is much smaller there. IUCr Journals also note the challenge 
of moving large quantities of data around, i.e. if the Journals were to try to 
host everything for chemical crystallography and thus become 'the centre' for 
these datasets.

So: universities are now establishing their own institutional repositories, 
driven largely by the Open Access demands of funders. Hosting the raw datasets 
that underpin publications is a reasonable role for these in my view, and 
indeed the University of Manchester eScholar system, for example, already has 
this category. I am set to explore locally whether it would accommodate all of 
our lab's raw X-ray image datasets per annum that underpin our published 
crystal structures. 

It would be helpful if readers of CCP4bb could kindly also explore with their 
own universities whether they have such an institutional repository and whether 
raw datasets could be accommodated. Please do email me off-list with this 
information if you prefer, but replying on CCP4bb is also good. 

Such an approach involving institutional repositories would also work, of 
course, for the 25% of MX structures that come from non-SR datasets.

All the best for a splendid PDB40 Event.

Greetings,
John
Prof John R Helliwell DSc 
 
 

On 28 Oct 2011, at 22:02, Gerard DVD Kleywegt ger...@xray.bmc.uu.se wrote:

 Hi all,
 
 It appears that during my time here at Cold Spring Harbor, I have missed a 
 small debate on CCP4BB (in which my name has been used in vain to boot).
 
 I have not yet had time to read all the contributions, but would like to make 
 a few points that hopefully contribute to the discussion and keep it with two 
 feet on Earth (as opposed to La La Land where the people live who think that 
 image archiving can be done on a shoestring budget... more about this in a 
 bit).
 
 Note: all of this is on personal title, i.e. not official wwPDB gospel. Oh, 
 and sorry for the new subject line, but this way I can track the replies more 
 easily.
 
 It seems to me that there are a number of issues that need to be separated:
 
 (1) the case for/against storing raw data
 (2) implementation and resources
 (3) funding
 (4) location
 
 I will say a few things about each of these issues in turn:
 
 ---
 
 (1) Arguments in favour and against the concept of storing raw image data, as 
 well as possible alternative solutions that could address some of the issues 
 at lower cost or complexity.
 
 I realise that my views carry a weight=1.0 just like everybody else's, and 
 many of the arguments and counter-arguments have already been made, so I will 
 not add to these at this stage.
 
 ---
 
 (2) Implementation details and required resources.
 
 If the community should decide that archiving raw data would be 
 scientifically useful, then it has to decide how best to do it. This will 
 determine the level of resources required to do it. Questions include:
 
 - what should be archived? (See Jim H's list from (a) to (z) or so.) An 
 initial plan would perhaps aim for the images associated with the data used 
 in the final refinement of deposited structures.
 
 - how much data are we talking about per dataset/structure/year?
 
 - should it be stored close to the source (i.e., responsibility and costs for 
 depositors or synchrotrons) or centrally (i.e., costs for some central 
 resource)? If it is going to be stored centrally, the cost will be 
 substantial. For example, at the EBI (the European Bioinformatics Institute) 
 we have 15 PB of storage. We pay about 1500 GBP (~2300 USD) per TB of storage 
 (not the kind you buy at Dixons or Radio Shack, obviously). For stored data, 
 we have a data-duplication factor of ~8, i.e. every file is stored 8 times 
 (at three data centres, plus back-ups, plus a data-duplication centre, plus 
 unreleased versus public versions of the archive). (Note - this is only for 
 the EBI/PDBe! RCSB and PDBj will have to acquire storage as well.) Moreover, 
 disks have to be housed in a building (not free!), with cooling, security 
 measures, security staff, 
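The storage figures quoted above can be combined into a rough back-of-the-envelope sketch in Python. It uses only the two numbers given in the post (about 1500 GBP per TB and a duplication factor of ~8) and assumes, which the post does not state, that the per-TB price applies to every stored copy. The function name and the archive sizes passed to it are purely illustrative, and housing, cooling and staff costs are not included.

```python
# Figures quoted in the post.
GBP_PER_TB = 1500          # approximate price per TB of storage
DUPLICATION_FACTOR = 8     # every file is stored ~8 times

def central_storage_cost_gbp(unique_tb, gbp_per_tb=GBP_PER_TB,
                             duplication=DUPLICATION_FACTOR):
    """Disk cost of holding `unique_tb` terabytes of distinct data.

    Assumption (not from the post): the per-TB price is paid for each
    of the duplicated copies.
    """
    if unique_tb < 0:
        raise ValueError("unique_tb must be non-negative")
    return unique_tb * duplication * gbp_per_tb

if __name__ == "__main__":
    # Hypothetical archive sizes, for illustration only.
    for size_tb in (1, 100):
        print(size_tb, "TB ->", central_storage_cost_gbp(size_tb), "GBP")
```

Under that assumption, each terabyte of distinct data costs about eight times the quoted per-TB price in disk alone.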

Re: [ccp4bb] To archive or not to archive, that's the question!

2011-10-29 Thread Herbert J. Bernstein

One important issue to address is how to deal with the perceived
reliability issues of the federated model, and how to start to
approach the higher reliability of the centralized model described by
Gerard K without incurring what seem at present to be unacceptable
costs.  One answer comes from the approach followed in communications
systems.  If the probability of data loss in each communication
subsystem is, say, 1/1000, then the probability of losing the data in
both of two independent copies of the same lossy system is only
1/1,000,000.  We could apply that lesson to the federated image
archive model by asking each institution to partner with a second
independent, and hopefully geographically distant, institution, with
an agreement for each to host copies of the other's images.  If we
restrict that duplication protocol, at least at first, to those images
strongly related to an actual publication/PDB deposition, the
incremental cost of greatly improved reliability would be very low,
with no disruption of the basic federated approach being suggested.

Please note that I am not suggesting that institutional repositories
will have 1/1000 data loss rates, but they will certainly have some
data loss rate, and this modest change in the proposal would help to
greatly lower the impact of that data loss rate and allow us to go
forward with greater confidence.
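
The arithmetic behind the partnering argument can be written out as a minimal Python sketch. It assumes the copies fail independently, which is the premise of the argument and the reason for preferring geographically distant partners. The 1/1000 rate is the illustrative figure used above, not a measured loss rate, and the function name is made up for this example.

```python
from fractions import Fraction

def probability_all_copies_lost(per_copy_loss, copies):
    """Probability that every one of `copies` independent copies is lost.

    Assumes failures are independent, so the per-copy probabilities
    simply multiply.
    """
    if copies < 1:
        raise ValueError("need at least one copy")
    return Fraction(per_copy_loss) ** copies

if __name__ == "__main__":
    illustrative_rate = Fraction(1, 1000)   # the "say, 1/1000" figure
    for n in (1, 2):
        print(n, "copies:", probability_all_copies_lost(illustrative_rate, n))
```

If the two sites share a failure mode (same building, same operator), the independence assumption no longer holds and the real gain is smaller than the product suggests.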

Regards,
  Herbert



Re: [ccp4bb] To archive or not to archive, that's the question!

2011-10-29 Thread Jrh
Dear Herbert,
I imagine it likely that, e.g., the University of Manchester eScholar system 
will have duplicate storage in place for the reasons you outline below. For 
that storage to be geographically distant is, to my reckoning, less likely, but 
still possible. I will add that further query to my first query to eScholar 
user support about dataset sizes and DOI registration.
Greetings,
John
Prof John R Helliwell DSc 
 
 

On 29 Oct 2011, at 15:49, Herbert J. Bernstein y...@bernstein-plus-sons.com 
wrote:

 One important issue to address is how to deal with the perceived
 reliability issues of the federated model and how to start to
 approach the higher reliability of the centralized model described by
 Gerard K, but without incurring what seems to be at present
 unacceptable costs.  One answer comes from the approach followed in
 communications systems.  If the probability of data loss in each
 communication subsystem is, say, 1/1000, then the probability of data
 loss in two independent copies of the same lossy system is only
 1/1,000,000.  We could apply that lesson to the
 federated data image archive model by asking each institution
 to partner with a second independent, and hopefully geographically
 distant, institution, with an agreement for each to host copies
 of the other's images.  If we restrict that duplication protocol, at least at
 first, to those images strongly related to an actual publication/PDB
 deposition, the incremental cost of greatly improved reliability
 would be very low, with no disruption of the basic federated
 approach being suggested.
 
 Please note that I am not suggesting that institutional repositories
 will have 1/1000 data loss rates, but they will certainly have some
 data loss rate, and this modest change in the proposal would help to
 greatly lower the impact of that data loss rate and allow us to go
 forward with greater confidence.
 
 Regards,
  Herbert
 
 

Re: [ccp4bb] To archive or not to archive, that's the question!

2011-10-29 Thread Herbert J. Bernstein

Dear John,

  Most sound institutional data repositories use some form of
off-site backup.  However, not all of them do, and the
standards of reliability vary.  The advantages of an explicit
partnering system are both practical and psychological.  The
practical part is the major improvement in reliability --
even if we start at 6 nines, 12 nines is better.  The
psychological part is that members of the community can
feel reassured that reliability has been improved to
levels at which they can focus on other, more scientific
issues, instead of the question of reliability.
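
For readers less used to the "nines" shorthand, here is a small Python sketch of the conversion. It reads "6 nines" as a reliability of 0.999999, i.e. a loss probability of 10^-6, and assumes the two partnered copies fail independently. The function names are made up for this example.

```python
import math

def loss_probability_from_nines(nines):
    """'n nines' of reliability means a loss probability of 10**-n."""
    return 10.0 ** (-nines)

def nines_from_loss_probability(p):
    """Inverse conversion: a loss probability p corresponds to -log10(p) nines."""
    return -math.log10(p)

def nines_with_independent_copies(nines, copies=2):
    """Nines achieved when `copies` independent copies must all be lost."""
    combined = loss_probability_from_nines(nines) ** copies
    return nines_from_loss_probability(combined)

if __name__ == "__main__":
    print(round(nines_with_independent_copies(6), 6))
```

With independent copies the nines add, so one partner copy takes a 6-nines repository to roughly 12 nines.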

  Regards,
Herbert

=
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

 +1-631-244-3035
 y...@dowling.edu
=


Re: [ccp4bb] Seeded rescreening with robot?

2011-10-29 Thread Pius Padayatti
I have a few personal remarks about the reverse matrix seeding protocol
suggested here, just as an addition to Artem's protocol.
Harvesting the whole drop of interest is guaranteed to invite salt
crystals in the second round, especially if one is using the same
screen again.
If you find crystals all over your second trial, go back to your first
seed drop, try to pick out only the microcrystals, and do the seeding
again; you will see the difference. If in that case you find a smaller
but significant improvement in the same condition, with better crystals
in other conditions, I would say keep going for several rounds: you are
definitely on the right track. (But set up control drops to check which
of those conditions can form salt crystals when the initial drop is
mixed with the current condition.) Otherwise, forget about the seed
stock you are using in that particular experiment.

In short, if you find microcrystals (15 to 20 µm), I have found that
harvesting them is more useful and more successful than harvesting
whole drops.

Adding glutaraldehyde to the seeds is a good idea, although the effect
you see may not essentially be due to seeding. Combining approaches as
mentioned is definitely novel, and thanks for sharing.

Has anybody here tried this in mesophase and in detergent-solubilized
vapor diffusion, and had success?

Thanks
Padayatti PS
Polgenix Inc

On Sat, Oct 29, 2011 at 12:14 AM, Artem Evdokimov
artem.evdoki...@gmail.com wrote:
 By popular request here's my favorite version of the in-screen seeding. We
 use a Mosquito but it doesn't have to be a specific robot as long as it can
 dispense relatively tiny volumes of seed stock.

 Caveats: (1) if I am desperate enough to do this, then the situation is
 pretty bad indeed and I don't mind wasting some protein (2) my success rate
 is not hugely favorable but this does work on occasion when other things
 have failed

 (1) identify a few likely conditions. Ideally they have microcrystals but
 desperation has made me try 'lovable precipitates' in the past, with a
 modest degree of success.
 (2) harvest entire drop using a few ul of mother liquor as diluent
 (3) break the existing crystals using your favorite method (seed bead, etc.);
 mine involves swirling a pipette tip in the mixture, running it along the
 walls, with rapid pipetting up and down. Dilute seed stock to useful volume
 (enough for screening).
 (4) I do not normally centrifuge the resulting seed stock, but some people
 do
 (5) dispense your screen as always with the usual protein/reservoir ratio.
 Let's say you like drops of the 0.2ul+0.2ul variety - add 25 nanoliters of
 the seed stock *last*. Optional mixing of the condition is a fun thing to
 try but it seems not to matter very much. Note that I typically use the same
 tips to dispense seed stock, fully aware that this causes
 cross-contamination of conditions. I don't mind :)
   (5a) variation - add seed stock to protein, then dispense ASAP.
 Surprisingly not a bad option, practically speaking.
   (5b) variation - crosslink seed stock very gently in solution (with trace
 of glutaraldehyde) before use. Buffers/additives with primary or secondary
 amine groups do interfere, of course.
   (5c) variation - mix seeds from SEVERAL initial hit conditions, then use
 as one seed stock. Be ready for fireworks as they may not be compatible!
 (6) endure nail-biting wait for results :)
 As noted earlier, it's not a sure-fire way to get new hit conditions but it
 does seem to work and it's a fun way to put to use a remainder of otherwise
 useless protein (when you've tried all other tricks you like to try).
 Comments and suggestions are always welcome!
 Artem
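
 To judge what a "useful volume" of seed stock means in step (3), the drop volumes in step (5) can be put into a small Python sketch. The 0.2 ul + 0.2 ul drop and the 25 nl seed addition come from the protocol above. The 96-condition screen and the dead-volume allowance are assumptions added for illustration, and the function names are made up.

```python
def seed_stock_needed_ul(conditions, seed_nl=25.0, dead_volume_ul=0.0):
    """Total seed stock (in ul) to add `seed_nl` nanoliters to each condition.

    `dead_volume_ul` is a hypothetical allowance for robot dead volume;
    it is not part of the protocol as posted.
    """
    if conditions < 0:
        raise ValueError("conditions must be non-negative")
    return conditions * seed_nl / 1000.0 + dead_volume_ul

def seed_fraction_of_drop(protein_nl=200.0, reservoir_nl=200.0, seed_nl=25.0):
    """Fraction of the final drop volume contributed by the seed stock."""
    return seed_nl / (protein_nl + reservoir_nl + seed_nl)

if __name__ == "__main__":
    # Assumed 96-condition screen; adjust to the plate actually used.
    print("seed stock for 96 drops (ul):", seed_stock_needed_ul(96))
    print("seed fraction of each drop:", round(seed_fraction_of_drop(), 3))
```

 The harvested drop therefore only needs to be diluted to a few microliters plus whatever the dispenser requires as dead volume, and the seed stock makes up a small fraction of each drop.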


 On Fri, Oct 28, 2011 at 1:55 PM, Artem Evdokimov artem.evdoki...@gmail.com
 wrote:

 I would be glad to share ours.

 Artem

 On Oct 28, 2011 1:29 PM, Watson, Randy ewats...@uthsc.edu wrote:

 Hi all,

 I am trying to optimize crystals and have heard of a technique where one
 can prepare a seedstock from existing crystals and use it to broadly
 re-screen from scratch for hits in new conditions.

 I have access to a mosquito robot and was wondering if anyone has a
 protocol or recommendations on how to go about doing this using a robot for
 screening.

 Thank you!
 Randy Watson





-- 
Pius S Padayatti,PhD,
Phone: 216-658-4528


Re: [ccp4bb] Seeded rescreening with robot?

2011-10-29 Thread Bosch, Juergen
Hm,
I wouldn't call these microcrystals anymore; I mount those things and get 
datasets from them (sometimes). In my world 50 µm is defined as big.

Jürgen

On Oct 29, 2011, at 2:51 PM, Pius Padayatti wrote:

In short, if you find microcrystals (15 to 20 µm), I have found that
harvesting them is more useful and more successful than harvesting
whole drops.

..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab:  +1-410-614-4894
Fax:  +1-410-955-2926
http://web.mac.com/bosch_lab/