So I'm not clear as to where things stand now. Are there rolling
snapshots or not?

On Jul 14, 9:04 am, Nickolas Toursky <[email protected]> wrote:
> Hi guys,
>
> We have developed a new staging environment after this happened.
> It gives us the ability to test new features more accurately before
> deploying them live.
>
> Nick
>
> 2009/7/14 Esé <[email protected]>:
>
> > hey folks,
>
> > would love to get an update on this as well. it's a little terrifying
> > to hear about rogue dev Scalr processes killing production farms. are
> > there safeguards in place now to prevent this kind of thing from
> > happening?
>
> > hoping for a speedy response, thanks!
>
> > E.
>
> > On Jul 6, 11:39 am, rainier2 <[email protected]> wrote:
> >> Hey, just looking for a little closure here.
>
> >> Was this a newly deployed production poller, or was it the dev poller
> >> that broke out of the dev sandbox?
>
> >> Has Scalr.net taken any action to prevent a similar problem in the
> >> future?
>
> >> Thanks!
>
> >> On May 7, 12:08 pm, Cole <[email protected]> wrote:
>
> >> > Whoa, this is kind of a deal-breaker here! Did this really happen?
> >> > RightScale's looking quite cost-effective now if this is the case!
>
> >> > On May 7, 10:30 am, Niv <[email protected]> wrote:
>
> >> > > And I have to add that the cause of the major data loss is your
> >> > > no-good way of doing snapshots: once a snapshot creation starts,
> >> > > the older snapshot is immediately corrupt.
> >> > > Your human error caused my instances to crash mid-snapshot
> >> > > creation, and when restarted, the servers failed to download the
> >> > > snapshot and kept terminating.
> >> > > This bug was submitted more than six months ago and you've done
> >> > > absolutely nothing to fix it.
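
For what it's worth, the safe pattern here is to never overwrite the old
snapshot in place: build the new one somewhere else and promote it only
once it's complete, so a crash mid-creation can't corrupt anything. A
rough sketch of that pattern in Python (hypothetical helper names, not
Scalr's actual code):

    import os
    import shutil

    def rotate_snapshot(snapshot_dir, build_snapshot):
        """Build a new snapshot without ever touching the previous one.

        build_snapshot(path) is assumed to write a complete snapshot to
        `path` and raise on failure; the old snapshot stays valid until
        the new one is fully written.
        """
        current = os.path.join(snapshot_dir, "current")
        pending = os.path.join(snapshot_dir, "pending")
        previous = os.path.join(snapshot_dir, "previous")

        # 1. Build the new snapshot in a separate location.
        if os.path.exists(pending):
            shutil.rmtree(pending)      # leftover from an earlier crash
        build_snapshot(pending)         # raises if the build fails

        # 2. Promote it only after the build succeeded. rename() is
        #    atomic on the same filesystem, so a restarting server sees
        #    either the old snapshot or the new one, never a half-
        #    written one.
        if os.path.exists(previous):
            shutil.rmtree(previous)
        if os.path.exists(current):
            os.rename(current, previous)
        os.rename(pending, current)

With that layout, instances that crashed mid-creation would have found
the previous snapshot still intact on restart.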
>
> >> > > On May 7, 5:22 pm, Niv <[email protected]> wrote:
>
> >> > > > ruined my day & upcoming weekend + major data loss + ~20 extra
> >> > > > instances running for several hours doing nothing. yay.
>
> >> > > > On May 7, 5:11 pm, Alex Kovalyov <[email protected]> wrote:
>
> >> > > > > Martin, it was a user error on Scalr.net's side. A dev version
> >> > > > > of the poller went nuts and selectively terminated instances on
> >> > > > > a few farms before it was killed.
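
The straightforward safeguard against a repeat is a hard environment
check inside the poller itself, so a dev build physically can't act on
production farms no matter how it's misconfigured. A minimal sketch
(hypothetical names, not Scalr's actual code; the connection object is
assumed to expose terminate_instances() as boto's EC2 connection does):

    import os

    IS_PRODUCTION = os.environ.get("SCALR_ENV") == "production"

    def terminate_instance(ec2_conn, instance_id, farm_is_production):
        # A non-production poller must never touch a production farm.
        if farm_is_production and not IS_PRODUCTION:
            raise RuntimeError(
                "dev poller refused to terminate production instance %s"
                % instance_id)
        ec2_conn.terminate_instances(instance_ids=[instance_id])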
>
> >> > > > > On May 7, 11:10, Martin Sweeney <[email protected]> wrote:
>
> >> > > > > > So my farm decided to crash this morning, all backups and
> >> > > > > > database bundles worked fine and another set of instances are
> >> > > > > > in its place. Hurrah!
>
> >> > > > > > What concerns me is why all four instances decided to crash
> >> > > > > > within 3 minutes of each other. They're not connected by
> >> > > > > > anything other than connections to databases and memcache
> >> > > > > > servers etc., but they all went at once.
>
> >> > > > > > Instance 'i-46f94bxx' found in database but not found on EC2. Crashed.
> >> > > > > > Instance 'i-9a9014xx' found in database but not found on EC2. Crashed.
> >> > > > > > Instance 'i-29009axx' found in database but not found on EC2. Crashed.
> >> > > > > > Instance 'i-27a2ccxx' found in database but not found on EC2. Crashed.
>
> >> > > > > > Is there anywhere I can find more info on this other than my
> >> > > > > > Event log?
>
> >> > > > > > M.
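
Those "found in database but not found on EC2" events presumably come
from a reconciliation pass: the poller compares the instances recorded
for a farm against what EC2 actually reports running, and anything
missing on the EC2 side is marked crashed and replaced. Roughly this
shape (hypothetical names, not Scalr's actual source):

    def poll_farm(recorded_ids, ec2_running_ids, mark_crashed,
                  launch_replacement):
        """recorded_ids: instance IDs the database says belong to the farm.
        ec2_running_ids: set of instance IDs EC2 currently reports running.
        """
        for instance_id in recorded_ids:
            if instance_id not in ec2_running_ids:
                # Gone from EC2. Whether it died on its own or something
                # else terminated it out-of-band (say, a rogue dev
                # poller), it looks identical from here.
                mark_crashed(instance_id)
                launch_replacement(instance_id)

Which is why four out-of-band terminations show up as four
near-simultaneous "Crashed." events followed by replacement instances.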
