Jeff,

That should be fine. I assumed you were using EBS as your data volume because most people we talk to do (for better or worse).

On Tue, Aug 16, 2011 at 10:32 AM, Jeff Pollard <[email protected]> wrote:

> Hey Sean,
>
> Thanks very much for the reply. I'll certainly try going to 10.10 with the
> upgrade, that's good info.
>
> Re EBS: were you saying to attach the EBS volume to the new node and use
> the EBS volume as the data volume? We had been using EBS as a backup
> volume, but using ephemeral storage for the actual data directory. We did
> this primarily for I/O performance reasons and also because EBS seems to
> have had a bad operations track record at AWS. In my steps from my previous
> email, our "shared location" actually was EBS, but we were planning on
> offloading the files to the ephemeral disk and using that as the data volume
> for Riak. Does that make sense?
>
>
> On Tue, Aug 16, 2011 at 7:18 AM, Sean Cribbs <[email protected]> wrote:
>
>> Jeff,
>>
>> We highly recommend you upgrade to 10.10 or later. 10.04 has some known
>> problems when running under Xen (especially on EC2) -- in some cases under
>> load, the network interface will break, making the node temporarily
>> inaccessible.
>>
>> When you do upgrade, the simplest way (if possible) would be to remount
>> the attached EBS volumes where your Riak data is stored onto the new nodes.
>> Otherwise, the steps you list are correct.
>>
>> Regarding swap, whether you have it on or not is a personal decision. Riak
>> will "do the right thing" and exit when it can't allocate more memory,
>> allowing you to figure out what went wrong -- as opposed to grinding the
>> machine into IO oblivion while consuming more and more swap. That said, in
>> some deployments (notably not on EC2), swap can be helpful.
>>
>> Hope that helps,
>>
>> --
>> Sean Cribbs <[email protected]>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://www.basho.com/
>>
>> On Tue, Aug 16, 2011 at 3:46 AM, Jeff Pollard <[email protected]> wrote:
>>
>>> Hello everyone,
>>>
>>> We've got a very interesting problem. We're hosting our 5-node cluster
>>> on EC2 running Ubuntu 10.04 LTS (Lucid Lynx) Server
>>> 64-bit <http://aws.amazon.com/amis/4348> using
>>> m2.xlarge instance types, and over the past 5 days we've had two EC2 servers
>>> randomly restart on us. We've checked the logs and there was nothing that
>>> we saw that indicated why they restarted. One second they were happily
>>> logging and the next second the server was in the process of rebooting.
>>> This is particularly bad because every time the node comes back up we get
>>> merge errors due to an existing bug in Riak and have to restore from a
>>> recent backup.
>>>
>>> Just today we noticed that the EC2 servers did not have swap enabled
>>> (apparently the norm for xlarge+ instances), which we thought might have
>>> been our problem? My knowledge of what happens when swap is off is pretty
>>> poor - but I have been told that the Linux OOM killer should still be
>>> invoked and start trying to kill processes, rather than the server simply
>>> restarting. Is that correct? Also, how would Riak hypothetically handle
>>> swap being off on a system? We're using Bitcask if that helps.
>>>
>>> Secondly, one of our ops guys here thinks the issue might be related to a
>>> bug <http://ubuntuforums.org/showthread.php?t=1436497> (?) that other
>>> Ubuntu users of the same version seem to have. In fact, we do see the same
>>> "INFO: task cron:15047 blocked for more than 120 seconds" line in our log
>>> file. We're also running an AMI that isn't the official one from Canonical,
>>> so the thought is that an upgrade to the official AMI would help.
>>>
>>> If we do want to upgrade, it will mean moving each cluster node to new
>>> hardware. I wanted to ask the list to make sure we were doing it correctly.
>>> Here is the plan to transfer a node to new hardware -- note that these
>>> steps will be done on one node at a time, and we'll make sure the cluster
>>> has stabilized after doing one node before moving on to the next one.
>>>
>>>    1. Stop riak on old server.
>>>    2. Copy data directory (including bitcask, mr_queue and ring folders)
>>>    to a shared location.
>>>    3. Shutdown old server.
>>>    4. Boot new replacement server, installing (but not starting) Riak.
>>>    5. Transfer data directory from shared location to data folder on new
>>>    node.
>>>    6. Start riak.
>>>
>>> My main concern is whether the ring state will transfer to a new node
>>> safely, assuming the new server has the same hostname and node name as the
>>> old server. The new server will have a different IP address, but all our
>>> node names in our cluster use hostnames, and those will not be changing.
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> [email protected]
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>

--
Sean Cribbs <[email protected]>
Developer Advocate
Basho Technologies, Inc.
http://www.basho.com/
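
For anyone following the thread later, the six quoted migration steps can be sketched as a dry-run plan. This is only an illustration under stated assumptions, not a procedure confirmed anywhere in the thread: the paths in the usage note are hypothetical, `rsync` is just one possible copy tool, `migration_plan` and `Step` are made-up names, and the closing `riak-admin ringready` check is an added guess at what "make sure the cluster has stabilized" might look like. Nothing is executed; the function only returns the commands in order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    host: str           # "old" or "new": which server the command is meant for
    command: List[str]  # argv-style command; this sketch never runs it


def migration_plan(data_dir: str, shared_dir: str) -> List[Step]:
    """Return the quoted steps as an ordered, dry-run list of commands.

    data_dir and shared_dir are caller-supplied (hypothetical) paths.
    """
    return [
        # 1. Stop Riak on the old server.
        Step("old", ["riak", "stop"]),
        # 2. Copy the whole data directory (bitcask, mr_queue, ring).
        #    rsync is an assumption; any faithful copy would do.
        Step("old", ["rsync", "-a", f"{data_dir}/", f"{shared_dir}/"]),
        # 3. Shut down the old server.
        Step("old", ["shutdown", "-h", "now"]),
        # 4. Booting the new server and installing Riak is left manual.
        # 5. Copy the data from the shared location onto the new node.
        Step("new", ["rsync", "-a", f"{shared_dir}/", f"{data_dir}/"]),
        # 6. Start Riak.
        Step("new", ["riak", "start"]),
        # Extra (assumption): check ring agreement before the next node.
        Step("new", ["riak-admin", "ringready"]),
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for step in migration_plan("/var/lib/riak", "/mnt/ebs/riak-backup"):
        print(f"[{step.host}] {' '.join(step.command)}")
```

Printing the plan first makes it easy to review the order for one node before touching the cluster; the same hostname and node name would still have to be set up on the new server, as discussed above.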
