Hi guys,

We have built a new staging environment since this happened. It gives us the ability to test new features more thoroughly before deploying them live.
Nick

2009/7/14 Esé <[email protected]>:
>
> hey folks,
>
> would love to get an update on this as well. it's a little terrifying
> to hear about rogue dev scalr processes killing production farms. are
> there safeguards in place now to prevent this kind of thing from happening?
>
> hoping for a speedy response, thanks!
>
> E.
>
> On Jul 6, 11:39 am, rainier2 <[email protected]> wrote:
>> Hey, just looking for a little closure here.
>>
>> Was this a newly deployed production poller, or was it the dev poller
>> that broke out of the dev sandbox?
>>
>> Has Scalr.net taken any actions to prevent a similar problem in the
>> future?
>>
>> Thanks!
>>
>> On May 7, 12:08 pm, Cole <[email protected]> wrote:
>>
>> > Whoa, this is kind of a deal breaker here! Did this really happen?
>> > RightScale's seeming quite cost-effective now if this is the case!
>>
>> > On May 7, 10:30 am, Niv <[email protected]> wrote:
>>
>> > > and i have to add that the cause of the major data loss is your
>> > > no-good way of doing the snapshots. once a snapshot creation starts,
>> > > the older snapshot is immediately corrupt.
>> > > your human error caused my instances to crash mid-snapshot creation,
>> > > and when restarted, the servers failed to download the snapshot and
>> > > kept terminating.
>> > > this bug was submitted more than six months ago and you've done
>> > > absolutely nothing to fix it.
>>
>> > > On May 7, 5:22 pm, Niv <[email protected]> wrote:
>>
>> > > > ruined my day & upcoming weekend + major data loss + ~20 extra
>> > > > instances running for several hours doing nothing. yay.
>>
>> > > > On May 7, 5:11 pm, Alex Kovalyov <[email protected]> wrote:
>>
>> > > > > Martin, it was a user error on Scalr.net's side. The dev version
>> > > > > of the poller went nuts and selectively terminated instances on a
>> > > > > few farms before it was killed.
>>
>> > > > > On May 7, 11:10, Martin Sweeney <[email protected]> wrote:
>>
>> > > > > > So my farm decided to crash this morning; all backups and
>> > > > > > database bundles worked fine, and another set of instances is in
>> > > > > > its place. Hurrah!
>>
>> > > > > > What concerns me is why all four instances decided to crash
>> > > > > > within 3 minutes of each other. They're not connected by
>> > > > > > anything other than connections to databases and memcache
>> > > > > > servers etc., but they all went at once.
>>
>> > > > > > Instance 'i-46f94bxx' found in database but not found on EC2. Crashed.
>> > > > > > Instance 'i-9a9014xx' found in database but not found on EC2. Crashed.
>> > > > > > Instance 'i-29009axx' found in database but not found on EC2. Crashed.
>> > > > > > Instance 'i-27a2ccxx' found in database but not found on EC2. Crashed.
>>
>> > > > > > Is there anywhere I can find more info on this other than my
>> > > > > > Event log?
>>
>> > > > > > M.
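
For readers trying to picture what the "found in database but not found on EC2" messages above amount to, here is a minimal sketch of that kind of database-vs-EC2 reconciliation check. It uses Python with the boto3 library (a later AWS SDK, not available at the time of this thread), and the function name and instance-ID list are made up for illustration; this is a guess at the general idea, not Scalr's actual poller code.

# Minimal sketch (not Scalr's code) of a poller-style reconciliation:
# instance IDs recorded in a local database are compared against what EC2
# actually reports, and any record with no live instance is flagged as crashed.
# AWS credentials and region are assumed to come from the environment.

import boto3

def find_crashed_instances(db_instance_ids):
    """Return DB-recorded instance IDs that EC2 no longer reports as alive."""
    ec2 = boto3.client("ec2")
    alive = set()
    paginator = ec2.get_paginator("describe_instances")
    # Only count instances that are pending or running; terminated entries
    # that EC2 still returns briefly should not mask a crash.
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["pending", "running"]}]
    ):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                alive.add(instance["InstanceId"])
    return [iid for iid in db_instance_ids if iid not in alive]

if __name__ == "__main__":
    # Placeholder IDs as they might be recorded in a farm's database.
    recorded = ["i-46f94bxx", "i-9a9014xx", "i-29009axx", "i-27a2ccxx"]
    for iid in find_crashed_instances(recorded):
        print("Instance '%s' found in database but not found on EC2. Crashed." % iid)

A check like this only detects that instances vanished; whatever terminated them (in this incident, a dev poller run against production) acts upstream of it, which is why the staging-environment separation mentioned above matters.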
