Re: [ccp4bb] To archive or not to archive, that's the question!
Dear Gerard K,

Many thanks indeed for this.

Like Gerard Bricogne, you also indicate that the decentralised location option is 'quite simple and very cheap in terms of centralised cost'. I hope the SR facilities worldwide can follow the lead taken by Diamond Light Source and PaN, the European consortium of SR and neutron facilities, and keep their data archives and also assist authors with the DOI registration process for those datasets that result in publication. Linking to these DOIs from the PDB, for example, is, as you confirm, straightforward.

Gerard B's pressing of the above approach via the 'Pilot project' in the various IUCr DDD WG discussions, with a nicely detailed plan, brought home to me the merit of this approach for the even greater challenge of raw data archiving for chemical crystallography, both in terms of the number of datasets and because the SR facilities' role is much smaller there. IUCr Journals also note the challenge of moving large quantities of data around, i.e. if the Journals were to try to host everything for chemical crystallography and thus become 'the centre' for these datasets.

So: universities are now establishing their own institutional repositories, driven largely by the Open Access demands of funders. For these to host raw datasets that underpin publications is a reasonable role in my view, and indeed the University of Manchester eScholar system, for example, already has this category. I am set to explore locally here whether they would accommodate all of our lab's raw X-ray image datasets per annum that underpin our published crystal structures.

It would be helpful if readers of this CCP4bb could kindly also explore with their own universities whether they have such an institutional repository and whether raw datasets could be accommodated. Please do email me off list with this information if you prefer, but within the CCP4bb is also good.
Such an approach involving institutional repositories would also work of course for the 25% of MX structures that are for non-SR datasets.

All the best for a splendid PDB40 Event.

Greetings,
John
Prof John R Helliwell DSc

On 28 Oct 2011, at 22:02, Gerard DVD Kleywegt ger...@xray.bmc.uu.se wrote:

Hi all,

It appears that during my time here at Cold Spring Harbor, I have missed a small debate on CCP4BB (in which my name has been used in vain to boot). I have not yet had time to read all the contributions, but would like to make a few points that hopefully contribute to the discussion and keep it with two feet on Earth (as opposed to La La Land where the people live who think that image archiving can be done on a shoestring budget... more about this in a bit).

Note: all of this is on personal title, i.e. not official wwPDB gospel. Oh, and sorry for the new subject line, but this way I can track the replies more easily.

It seems to me that there are a number of issues that need to be separated:

(1) the case for/against storing raw data
(2) implementation and resources
(3) funding
(4) location

I will say a few things about each of these issues in turn:

---

(1) Arguments in favour and against the concept of storing raw image data, as well as possible alternative solutions that could address some of the issues at lower cost or complexity.

I realise that my views carry a weight=1.0 just like everybody else's, and many of the arguments and counter-arguments have already been made, so I will not add to these at this stage.

---

(2) Implementation details and required resources.

If the community should decide that archiving raw data would be scientifically useful, then it has to decide how best to do it. This will determine the level of resources required to do it. Questions include:

- what should be archived? (See Jim H's list from (a) to (z) or so.) An initial plan would perhaps aim for the images associated with the data used in the final refinement of deposited structures.
- how much data are we talking about per dataset/structure/year?
- should it be stored close to the source (i.e., responsibility and costs for depositors or synchrotrons) or centrally (i.e., costs for some central resource)?

If it is going to be stored centrally, the cost will be substantial. For example, at the EBI (the European Bioinformatics Institute) we have 15 PB of storage. We pay about 1500 GBP (~2300 USD) per TB of storage (not the kind you buy at Dixons or Radio Shack, obviously). For stored data, we have a data-duplication factor of ~8, i.e. every file is stored 8 times (at three data centres, plus back-ups, plus a data-duplication centre, plus unreleased versus public versions of the archive). (Note - this is only for the EBI/PDBe! RCSB and PDBj will have to acquire storage as well.) Moreover, disks have to be housed in a building (not free!), with cooling, security measures, security staff,
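To get a feel for the scale of the central-storage figures quoted above, the arithmetic can be sketched as follows. The price per TB (about 1500 GBP) and the duplication factor (about 8) come from the message; the archive size used in the example is a purely hypothetical input, not a figure from this thread, and the function name is only illustrative.

```python
# Rough hardware-only cost of centrally storing raw images.
# Figures taken from the message: ~1500 GBP per TB, each file stored ~8 times.
GBP_PER_TB = 1500
DUPLICATION_FACTOR = 8


def central_storage_cost_gbp(raw_tb, gbp_per_tb=GBP_PER_TB,
                             copies=DUPLICATION_FACTOR):
    """Return the disk purchase cost in GBP for `raw_tb` terabytes of images.

    Every file is assumed to be stored `copies` times, so the disk space
    that must be bought is raw_tb * copies.
    """
    if raw_tb < 0:
        raise ValueError("raw_tb must be non-negative")
    return raw_tb * copies * gbp_per_tb


if __name__ == "__main__":
    # Hypothetical archive size, chosen only to illustrate the calculation.
    hypothetical_raw_tb = 100
    print(central_storage_cost_gbp(hypothetical_raw_tb))
```

With these numbers each TB of raw images effectively costs 8 x 1500 = 12000 GBP in disk alone. The sketch deliberately leaves out the building, cooling, security and staff costs that the message also mentions.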
Re: [ccp4bb] To archive or not to archive, that's the question!
One important issue to address is how to deal with the perceived reliability issues of the federated model and how to start to approach the higher reliability of the centralized model described by Gerard K, but without incurring what seem at present to be unacceptable costs.

One answer comes from the approach followed in communications systems. If the probability of data loss in each communication subsystem is, say, 1/1000, then the probability of data loss in two independent copies of the same lossy system is only 1/1,000,000. We could apply that lesson to the federated data image archive model by asking each institution to partner with a second independent, and hopefully geographically distant, institution, with an agreement for each to host copies of the other's images. If we restrict that duplication protocol, at least at first, to those images strongly related to an actual publication/PDB deposition, the incremental cost of greatly improved reliability would be very low, with no disruption of the basic federated approach being suggested.

Please note that I am not suggesting that institutional repositories will have 1/1000 data loss rates, but they will certainly have some data loss rate, and this modest change in the proposal would help to greatly lower the impact of that data loss rate and allow us to go forward with greater confidence.

Regards,
Herbert

At 7:53 AM +0100 10/29/11, Jrh wrote:

Dear Gerard K,

Many thanks indeed for this.

Like Gerard Bricogne, you also indicate that the decentralised location option is 'quite simple and very cheap in terms of centralised cost'. I hope the SR facilities worldwide can follow the lead taken by Diamond Light Source and PaN, the European consortium of SR and neutron facilities, and keep their data archives and also assist authors with the DOI registration process for those datasets that result in publication. Linking to these DOIs from the PDB, for example, is, as you confirm, straightforward.
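The partnering argument in Herbert's message rests on multiplying independent loss probabilities, which can be sketched as below. The 1/1000 figure is the illustrative rate from the message, not a measured one; the function name is hypothetical, and the key assumption is that the copies fail independently (which is why a geographically distant partner is suggested).

```python
from fractions import Fraction


def combined_loss_probability(per_copy_loss, copies=2):
    """Probability that all `copies` independent copies of a dataset are lost.

    Assumes every copy has the same loss probability and that losses are
    statistically independent, so the probabilities simply multiply.
    """
    if copies < 1:
        raise ValueError("need at least one copy")
    p = Fraction(per_copy_loss)
    if not 0 <= p <= 1:
        raise ValueError("per_copy_loss must be a probability in [0, 1]")
    return p ** copies


if __name__ == "__main__":
    # The illustrative figures from the message: 1/1000 per site, two sites.
    print(combined_loss_probability(Fraction(1, 1000), copies=2))  # 1/1000000
```

If the two sites share a failure mode (same building, same operator, same software fault), the losses are correlated and the real combined figure will be worse than this product.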
Re: [ccp4bb] To archive or not to archive, that's the question!
Dear Herbert,

I imagine it likely that, e.g., the University of Manchester eScholar system will have duplicate storage in place for the reasons you outline below. However, for it to be geographically distant is, to my reckoning, less likely, but still possible. I will add that further query to my first query to my eScholar user support regarding dataset sizes and DOI registration.

Greetings,
John
Prof John R Helliwell DSc

On 29 Oct 2011, at 15:49, Herbert J. Bernstein y...@bernstein-plus-sons.com wrote:

One important issue to address is how to deal with the perceived reliability issues of the federated model and how to start to approach the higher reliability of the centralized model described by Gerard K, but without incurring what seem at present to be unacceptable costs.

One answer comes from the approach followed in communications systems. If the probability of data loss in each communication subsystem is, say, 1/1000, then the probability of data loss in two independent copies of the same lossy system is only 1/1,000,000. We could apply that lesson to the federated data image archive model by asking each institution to partner with a second independent, and hopefully geographically distant, institution, with an agreement for each to host copies of the other's images. If we restrict that duplication protocol, at least at first, to those images strongly related to an actual publication/PDB deposition, the incremental cost of greatly improved reliability would be very low, with no disruption of the basic federated approach being suggested.

Please note that I am not suggesting that institutional repositories will have 1/1000 data loss rates, but they will certainly have some data loss rate, and this modest change in the proposal would help to greatly lower the impact of that data loss rate and allow us to go forward with greater confidence.

Regards,
Herbert

At 7:53 AM +0100 10/29/11, Jrh wrote:

Dear Gerard K,

Many thanks indeed for this.
Re: [ccp4bb] To archive or not to archive, that's the question!
Dear John,

Most sound institutional data repositories use some form of off-site backup. However, not all of them do, and the standards of reliability vary. The advantages of an explicit partnering system are both practical and psychological. The practical part is the major improvement in reliability -- even if we start at 6 nines, 12 nines is better. The psychological part is that members of the community can feel reassured that reliability has been improved to levels at which they can focus on other, more scientific issues, instead of the question of reliability.

Regards,
Herbert

=
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
y...@dowling.edu
=

On Sat, 29 Oct 2011, Jrh wrote:

Dear Herbert,

I imagine it likely that, e.g., the University of Manchester eScholar system will have duplicate storage in place for the reasons you outline below. However, for it to be geographically distant is, to my reckoning, less likely, but still possible. I will add that further query to my first query to my eScholar user support regarding dataset sizes and DOI registration.

Greetings,
John
Prof John R Helliwell DSc

On 29 Oct 2011, at 15:49, Herbert J. Bernstein y...@bernstein-plus-sons.com wrote:

One important issue to address is how to deal with the perceived reliability issues of the federated model and how to start to approach the higher reliability of the centralized model described by Gerard K, but without incurring what seem at present to be unacceptable costs.

One answer comes from the approach followed in communications systems. If the probability of data loss in each communication subsystem is, say, 1/1000, then the probability of data loss in two independent copies of the same lossy system is only 1/1,000,000.
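The "6 nines to 12 nines" remark can be read as follows, assuming that "n nines" means a loss probability of 10^-n and that the two partnered sites fail independently. Both are assumptions made for this sketch, and the function names are hypothetical.

```python
import math


def nines(loss_probability):
    """Convert a loss probability into a count of 'nines' of reliability.

    Under the convention assumed here, a loss probability of 1e-6
    (reliability 0.999999) corresponds to 6 nines.
    """
    if not 0 < loss_probability < 1:
        raise ValueError("loss_probability must be strictly between 0 and 1")
    return -math.log10(loss_probability)


def nines_with_partners(single_site_nines, sites=2):
    """Nines achieved by `sites` independent sites of equal reliability.

    Independent loss probabilities multiply, so their exponents (the
    nines) add: k sites at n nines each give k * n nines.
    """
    if sites < 1:
        raise ValueError("need at least one site")
    return single_site_nines * sites


if __name__ == "__main__":
    single = nines(1e-6)
    print(round(single), nines_with_partners(round(single), sites=2))
```

The doubling from 6 to 12 holds only under independence. Any shared failure mode between the partners reduces the gain.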
Re: [ccp4bb] Seeded rescreening with robot?
I have a few personal remarks about the reverse matrix seeding protocol suggested here, just as an addition to Artem's protocol.

Harvesting the whole drop of interest practically guarantees salt crystals in the second round, especially if one screens against the same screen again. If you find crystals all over your second trial, go back to your first seed drop, try to harvest only the microcrystals, and do the seeding again; you will see the difference.

If you then find a smaller but significant improvement in the same condition, together with better crystals in other conditions, I would say keep going for several rounds: you are definitely on the right track. (But set up control drops to check which of those conditions can form salt crystals when the initial drop is mixed with the current condition.) Otherwise, forget about the seeds from that particular experiment.

In short, if you find microcrystals (15 to 20 µm), I have found that harvesting them is more useful and more successful than harvesting whole drops.

Adding glutaraldehyde to the seeds is a good idea, although the effect you see may not be essentially due to seeding. Combining approaches as mentioned is definitely novel; thanks for sharing.

Has anybody here tried this in mesophase and in detergent-solubilized vapor diffusion and had success?

Thanks,
Padayatti PS
Polgenix Inc

On Sat, Oct 29, 2011 at 12:14 AM, Artem Evdokimov artem.evdoki...@gmail.com wrote:

By popular request here's my favorite version of the in-screen seeding. We use a Mosquito but it doesn't have to be a specific robot as long as it can dispense relatively tiny volumes of seed stock.

Caveats:

(1) if I am desperate enough to do this, then the situation is pretty bad indeed and I don't mind wasting some protein
(2) my success rate is not hugely favorable but this does work on occasion when other things have failed

Protocol:

(1) identify a few likely conditions. Ideally they have microcrystals but desperation has made me try 'lovable precipitates' in the past, with a modest degree of success.
(2) harvest entire drop using a few ul of mother liquor as diluent
(3) break the existing crystals using your favorite method (seed bead, etc.); mine involves swirling a pipette tip in the mixture, running it along the walls, with rapid pipetting up and down. Dilute seed stock to useful volume (enough for screening).
(4) I do not normally centrifuge the resulting seed stock, but some people do
(5) dispense your screen as always with the usual protein/reservoir ratio. Let's say you like drops of the 0.2ul+0.2ul variety - add 25 nanoliters of the seed stock *last*. Optional mixing of the condition is a fun thing to try but it seems not to matter very much. Note that I typically use the same tips to dispense seed stock, fully aware that this causes cross-contamination of conditions. I don't mind :)
(5a) variation - add seed stock to protein, then dispense ASAP. Surprisingly not a bad option, practically speaking.
(5b) variation - crosslink seed stock very gently in solution (with trace of glutaraldehyde) before use. Buffers/additives with primary or secondary amine groups do interfere, of course.
(5c) variation - mix seeds from SEVERAL initial hit conditions, then use as one seed stock. Be ready for fireworks as they may not be compatible!
(6) endure nail-biting wait for results :)

As noted earlier, it's not a sure-fire way to get new hit conditions but it does seem to work and it's a fun way to put to use a remainder of otherwise useless protein (when you've tried all other tricks you like to try).

Comments and suggestions are always welcome!

Artem

On Fri, Oct 28, 2011 at 1:55 PM, Artem Evdokimov artem.evdoki...@gmail.com wrote:

I would be glad to share ours.

Artem

On Oct 28, 2011 1:29 PM, Watson, Randy ewats...@uthsc.edu wrote:

Hi all,

I am trying to optimize crystals and have heard of a technique where one can prepare a seed stock from existing crystals and use it to broadly re-screen from scratch for hits in new conditions. I have access to a Mosquito robot and was wondering if anyone has a protocol or recommendations on how to go about doing this using a robot for screening.

Thank you!
Randy Watson

--
Pius S Padayatti, PhD
Phone: 216-658-4528
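For planning step (3) of Artem's protocol ("dilute seed stock to useful volume"), the volumes can be worked out with a small sketch like the one below. The drop volumes (0.2 ul protein + 0.2 ul reservoir + 25 nl seed stock) are from the protocol; the 96-condition screen size and the dead-volume parameter are hypothetical inputs chosen for illustration, and the function names are not from any robot software.

```python
from fractions import Fraction

# Volumes from the protocol, in nanoliters (integers avoid rounding issues).
PROTEIN_NL = 200
RESERVOIR_NL = 200
SEED_NL = 25


def seed_stock_needed_nl(n_conditions, seed_nl=SEED_NL, dead_volume_nl=0):
    """Total seed stock (nl) to prepare for `n_conditions` drops.

    `dead_volume_nl` is a hypothetical allowance for liquid the robot
    cannot dispense; set it from your own instrument's behaviour.
    """
    if n_conditions < 0 or dead_volume_nl < 0:
        raise ValueError("inputs must be non-negative")
    return n_conditions * seed_nl + dead_volume_nl


def seed_fraction_of_drop(protein_nl=PROTEIN_NL, reservoir_nl=RESERVOIR_NL,
                          seed_nl=SEED_NL):
    """Fraction of the final drop volume contributed by the seed stock."""
    return Fraction(seed_nl, protein_nl + reservoir_nl + seed_nl)


if __name__ == "__main__":
    # Hypothetical 96-condition screen, no dead-volume allowance.
    print(seed_stock_needed_nl(96))   # 2400 (nl, i.e. 2.4 ul)
    print(seed_fraction_of_drop())    # 1/17
```

Since the seed stock makes up only about 6% of each drop, the mother liquor carried over with the seeds is diluted considerably, though Pius's point about salt crystals suggests control drops are still worthwhile.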
Re: [ccp4bb] Seeded rescreening with robot?
Hm, I wouldn't call these microcrystals anymore; I mount those things and get datasets from them (sometimes). In my world 50 µm is defined as big.

Jürgen

On Oct 29, 2011, at 2:51 PM, Pius Padayatti wrote:

In short, if you find microcrystals (15 to 20 µm), I have found that harvesting them is more useful and more successful than harvesting whole drops.

..
Jürgen Bosch
Johns Hopkins University
Bloomberg School of Public Health
Department of Biochemistry Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Office: +1-410-614-4742
Lab: +1-410-614-4894
Fax: +1-410-955-2926
http://web.mac.com/bosch_lab/