Riak crashed with MANIFEST not found

2012-05-25 Thread Nam Nguyen
Hi, I'm running a 5-node Riak cluster of m2.2xlarge instances on EC2. Today, two of the nodes crashed. Running riak console, I found this line, which seemed to be related: 21:44:30.449 [error] Failed to start riak_kv_eleveldb_backend Reason: {db_open,IO error:
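
(Note for the archive: a quick way to see which partitions are missing their MANIFEST, assuming the stock /var/lib/riak/leveldb data directory; adjust the path for your install:)

    $ for d in /var/lib/riak/leveldb/*/; do ls "$d"MANIFEST-* >/dev/null 2>&1 || echo "no MANIFEST in $d"; done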

Re: Riak crashed with MANIFEST not found

2012-05-25 Thread jshoffstall
Hi Nam, on the node that is reporting the LevelDB MANIFEST error, I would do the following: 1. Stop the node if it isn't down already. 2. Back up /var/lib/riak/leveldb/50239118783249787813251666124688006726811648 to another folder outside of /var/lib/riak/leveldb. 3. Run the erl binary
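
(For completeness, here is roughly what step 3 looks like on a stock package install. The /usr/lib/riak paths below are assumptions for a default Ubuntu package, and the partition is the one from step 2; eleveldb:repair/2 rebuilds the MANIFEST from the surviving table files and can take a while on a large vnode.)

    $ /usr/lib/riak/erts-*/bin/erl

    %% make the bundled eleveldb module visible to this plain Erlang shell
    1> [code:add_pathz(Dir) || Dir <- filelib:wildcard("/usr/lib/riak/lib/*/ebin")].
    %% repair the damaged partition in place
    2> eleveldb:repair("/var/lib/riak/leveldb/50239118783249787813251666124688006726811648", []).
    ok
    3> q().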

Re: Riak crashed with MANIFEST not found

2012-05-25 Thread Nam Nguyen
Hi Justin, I had essentially done the same thing you suggested. I moved the affected directory out of leveldb and restarted Riak. Though I did not run the repair, it seemed to go well. The nodes are leaving the cluster now. However, it has been three hours and riak-admin transfers still shows many
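
(If anyone else is watching a slow leave, polling the handoff status is the easiest way to confirm it is still moving; the interval below is arbitrary:)

    $ watch -n 60 riak-admin transfers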

Re: Riak crashed with MANIFEST not found

2012-05-25 Thread jshoffstall
Nam, What is the output of `riak-admin member_status` and `riak-admin ring_status`? Justin Shoffstall Developer Advocate | Basho Technologies, Inc. -- View this message in context: http://riak-users.197444.n3.nabble.com/Riak-crashed-with-MANIFEST-not-found-tp4015987p4016157.html

Re: Riak crashed with MANIFEST not found

2012-05-25 Thread Nam Nguyen
Output of the commands: ubuntu@ip-10-20-2-243:~$ riak-admin member_status Attempting to restart script through sudo -u riak = Membership == Status  Ring  Pending  Node

Re: Riak crashed with MANIFEST not found

2012-05-25 Thread jshoffstall
Nam, Thanks for speaking with me tonight. Let us know if the cluster has any more trouble, or if we can do anything else for you. Cheers, Justin Shoffstall Developer Advocate | Basho Technologies, Inc.

Re: Riak crashed with MANIFEST not found

2012-05-25 Thread jshoffstall
Nam, To recap the upshot of our offline chat tonight: Though the leave operations on your cluster progressed fine, in the future I would just take the damaged nodes down, do the repair as I described in my earlier post in this thread, and bring the nodes back up. No membership changes should be necessary.
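
(For the archive, the down/repair/up sequence described above would look roughly like this on each damaged node; the node name is a placeholder based on the addresses earlier in the thread:)

    $ riak stop
    # run the eleveldb:repair step from earlier in the thread on the damaged partition
    $ riak start
    $ riak-admin wait-for-service riak_kv riak@10.20.2.243
    $ riak-admin transfers   # hinted handoff should drain this back down over time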