Mike McCarty wrote:

> This message is written, not as a criticism, nor chastisement,
> but in the hopes that I can contribute to the future success
> of the GIMPS project. Having been once caught in a similar
> circumstance in about 1984 or so, and having learned some
> hard won knowledge of these matters in this way, I hope to pass along
> some of what I learned, and possibly contribute toward devising
> some of the recommendations I urge upon the project.

Fine.

> The backup in question is one of DATA, not HARDWARE.

I apologize for assuming that the context was sufficient to indicate that 
that's exactly what I meant by "backup" -- data, not hardware.

When Torben Schluentz wrote: "And I don't understand why it should be so hard 
to get a backup up rolling for the old version", I thought that he meant, "... 
to restore ("get rolling") old-version backup data to the old-version server 
hardware."  So I replied with an explanation of why that was not feasible.  When 
I wrote, "It's a hardware delay, not a matter of getting a backup running", I 
again was assuming everyone would understand that I meant "... getting backup 
data restored so that the system can resume running."

> I asked the same question, more politely,

I apologize for not having posted a reply sooner to your Wed, 29 Oct 2008 
22:59:33 -0600 posting to this mailing list.  I sometimes am not as organized 
as I need to be, and though I hadn't forgotten your posting, I had not yet 
scanned back over recent Prime Digest text to see whether I had made all the 
replies I intended to make.

> and was completely ignored.

No, you weren't ignored.  You haven't gotten a response until now, but you were 
not _ignored_.

If you will go to mersenneforum.org and look at the "PrimeNet 5.0 Upgrade" 
thread (http://mersenneforum.org/showthread.php?t=10832), you will see that in 
the last part of post #162 (30 Oct 08) I (using the nym "cheesehead") posted an 
excerpt from your 29 Oct posting and then wrote, "As I said above, may I have 
your help in revising my description of the situation, to correct inaccuracies 
and omissions? In particular, is the last post's description of an alternative 
("If there is a server ...") correct and feasible?"

Posts #164 & 165 of that thread are the only responses to my post #162, and 
neither of them actually says anything about my last question ("In particular, 
is the last post's description of an alternative ("If there is a server ...") 
correct and feasible?") about what you wrote.  I hoped to get a more relevant 
response, so I put that on my "wait a while" list, and went on to other matters.

Obviously (now it is!), the thing for me to have done was to post to this 
mailing list a short response to you saying that I was seeking help from more 
authoritative sources.  (Even Microsoft Support did that for me recently, but I 
failed to make the logical connection to you.)

Had I, on, say, Tuesday, reviewed my "wait" list or done my Prime Digest 
review, I could have noted that no one had posted any further response to my 
inquiry at mersenneforum.org, and I could have come back here to report that I 
still had no authoritative answer.

> Whatever hardware is running the current server software could
> be loaded with a data backup of the older version and be running
> it NOW.

Oh?  You know that to be a fact?  (I don't.)

> It could have been running in a few hours, probably less
> than one day, if a proper data backup existed.

... and (a) v5 server development would have had to stop until the replacement 
disk was obtained, (b) all users who were already working on v5 assignments with 
v25 Prime95/mprime would have had to either stop also or downgrade to v24, and 
(c) the v5 assignment and completion data would have had to be back-merged into 
the v4 database where possible (not universally, because v5 assignment types 
differ from v4 assignment types).
 
> As for me, I don't demand anything from a bunch of people who contribute
> their own time and money to a project with no remuneration. I do,
> however, maintain backups of my own machines, and keep them off
> site. If my house burned down, I could have my own personal
> computer back up and running, with a loss of just a few days'
> data, within a few hours. My latest backups I keep at home for
> convenience, so I'd lose a few days, but on a regular basis
> I move my older backups off site. Also, since I don't back
> up every day, I'd lose a few days with any complete hard disc
> failure, though I could remedy that simply by doing backups
> every day, should I so choose.

Do you have any actual evidence that the PrimeNet administrators do not perform 
any of those steps, or their equivalent?

Or are you just assuming that, because their decisions about bringing the 
PrimeNet server back up do not match what you would have done, they must be 
incompetent regarding data backups?

Can you imagine _any_ possibility in which they do indeed have all the data 
backups you expect of them, but because of the actual factors they face, not 
all of which may be apparent to you, they could reasonably make different 
decisions about how to bring PrimeNet back online?

> I _expect_ (not _demand_) that anyone who expends significant amounts
> of time, effort, and money, and involving the participation and
> contribution of time, efforts, and monies of other people would, for
> his own peace of mind, maintain a regular backup schedule with off site
> storage,

Do you have any actual evidence that PrimeNet administrators do not meet your 
expectation in that regard?

> and have tested his ability to recover from total hardware
> loss. I have done so with my own personal computer, which does not
> involve any other persons, their time, efforts, or monies.
> 
> I am _shocked_ and _dismayed_ (not _angry_) that this was, apparently,
> not done.

Do you have any actual evidence that what you allege was not done was, in 
reality, not done?  If so, what is that evidence?

> I _hope_ (not _demand_) that a periodic backup regimen is being
> initiated, if, as it appears, no such backup schedule exists.

What, _exactly_, causes it to appear to you that no such backup schedule 
exists?  I have not yet seen you describe even one such cause.

> I believe it is completely appropriate that those who, understandably,
> expect, as do I, that such a backup schedule would exist in these
> circumstances, would be shocked, dismayed, and discouraged from
> continuing with the project, given the apparent lack of foresight
> on the part of those responsible, especially since the
> response seems to indicate a belief that the current emergency
> situation is one which was not under their control, when in
> actuality, a proper backup would mean that there was, in fact,
> no emergency at all, but rather only a momentary lapse of
> service.

So, too, do I think that after publicly making such derogatory remarks about 
PrimeNet administration without having presented any evidence that such remarks 
are justified, it is entirely appropriate to expect you to promptly present or 
describe the actual evidence you have to justify those remarks.

> I believe it is perfectly reasonable for the participants in this
> project to expect, and receive, an explanation for why a simple
> disc failure resulted in an emergency. No such explanation has
> been forthcoming. I believe it is perfectly reasonable for the
> participants in this project to expect, and receive, a description
> of what plan of action has been implemented to prevent a similar
> future occurrence from being an emergency. No such description
> has been forthcoming.

... on this Prime mailing list, perhaps, but this is not the only medium of 
communication within GIMPS.

Can you make the same accusation ("No such explanation ...") about information 
presented at mersenneforum.org?  You wouldn't be this assertive if you weren't 
confident about having all the relevant information, now, would you?

> I do not believe that it is reasonable for the participants
> in this project to _demand_ anything as a result of the current
> emergency. I do think it is reasonable for the participants to
> re evaluate their willingness to continue with the project,
> given the unexpected current situation.

... after ensuring that they had _all_ relevant information, including that 
available on mersenneforum.org, right?

> While the exact moment in time at which a hard disc will fail is
> not, even in principle, predictable, it is a certainty that
> every hard disc is eventually going to fail, and it is possible to
> have a plan of action for when that eventuality takes place.
> That's why we all have spare tires and jacks in our automobiles.
> They are our tire failure plan of action. Regular backups,
> stored off site, with appropriate access, are an appropriate
> disc failure plan of action.

Do you have any evidence that PrimeNet administrators have not made regular 
backups?

Have you _asked_ any PrimeNet administrator directly?  (I am not a PrimeNet 
administrator, just an interested user.)

> Not having a plan of action, or having an inadequate plan of
> action, is a plan for a failure not being a momentary nuisance,
> but an emergency resulting in a hasty and not well thought
> out response. The only way to ascertain whether a plan of action
> is adequate is actually to test it, by simulating a complete
> loss of whatever the plan is supposed to protect.

So, when have you directly asked a PrimeNet administrator whether any of the 
items in the preceding sentences applied to them?

> Plans of action need regular review, and modification.
> If, as seems likely, the current plan of action is "Rely upon
> RAID for data recovery, and obtain a replacement disc", then
> it seems that the plan has perhaps been reviewed insufficiently
> often.

Oh? Exactly why does it seem likely to you that the current PrimeNet plan of 
action is as you accuse?  What evidence do you have?

> Part of an extended plan of action is a review of
> current practices, and modification of same, as necessary,
> with retest.

Have you reviewed the current practices of PrimeNet in a manner which goes 
beyond simply making negative assumptions?

> Using RAID is not a substitute for backups.

No one said it was.  Did you assume that someone thought that?

> One
> needs to evaluate just how disastrous an event is being protected
> against, and then plan accordingly.

Have you reviewed the PrimeNet plan with any PrimeNet administrator?

> Part of plan design and review is careful assessment of cost
> versus benefit, choosing what is being protected against,
> and planning accordingly.

... not to mention the explicit consideration of relevant assumptions, whenever 
necessary.  I once saw a lot of effort having to be put into rescuing a project 
for which a particular little assumption had never been written and explicitly 
checked during development.

> I would try to enlist the
> aid of some who may be more knowledgeable in such planning
> and review and test design.

When did you consult with PrimeNet administrators?

> I'm sure there are some in the
> project who would be willing to help on that score.

... and have you sought their help?

> One needs to consider what data to back up, what media to
> use, how often to back up, and carefully plan off site
> storage and access, based on how long one is willing to be
> "down" and how much data one is willing to lose
> in an event.

What does the PrimeNet plan have to say about those factors, according to your 
knowledge of that plan?

> One needs to plan recovery procedures with these in mind,
> as well.
> 
> Sometimes, staged recovery is beneficial, providing some
> measure of recovery immediately, with perhaps some data
> loss, which can later be more fully recovered and merged
> with more data at a later time, perhaps days later. There
> might be some person(s) willing to "mirror" the site,
> providing instant recovery of service, and easing the
> burden of the one maintaining backups of the primary copy.
> This could perhaps result in practical impossibility of
> complete data loss, due to geographic redundancy.

Have you discussed that with any PrimeNet administrator?

> If there exists a collection of undamaged backups available,
> then I don't understand why they are not being used.

How did you determine that such backups were not being used, so that the 
question of "why not" is even applicable?

> If there is no backup, then I strongly encourage the persons
> responsible to educate themselves in what constitutes an adequate
> backup regimen for their needs, and institute same instanter. In
> either case, I strongly urge the responsible persons to give the
> project participants the expected responses explaining why the disc
> failure precipitated an emergency, and what steps are being taken
> to prevent that from happening in future.

I hope you will forward to them all the data you have that has justified your 
statements/assertions/questions in this posting.  Can you share a bit of it 
with us on this mailing list?

Richard Woods,
who is very curious to see your factual information as queried above.


      
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
