Something else I tried, to give the cluster more time to settle, was to wait between node updates until riak-admin transfers reported no pending transfers. I've had cases where the transfers still hadn't completed after a couple of hours of waiting. What would be a typical amount of time for pending transfers to complete?
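The wait-between-nodes step described above could be scripted roughly as follows. This is a minimal Python sketch, not something from the thread. It assumes riak-admin is on the PATH, and it assumes that any output line of riak-admin transfers containing "waiting to handoff" means handoffs are still pending; check that phrase against the output of your Riak version. The two-hour timeout and 30-second interval are arbitrary. The fetch, sleep and clock parameters exist only so the loop can be exercised without a live cluster.

```python
import subprocess
import time
from typing import Callable


def transfers_pending(output: str) -> bool:
    # Assumption: "riak-admin transfers" prints a line containing
    # "waiting to handoff" for each node that still has partitions to hand off.
    return any("waiting to handoff" in line for line in output.splitlines())


def run_riak_admin_transfers() -> str:
    # Calls the real CLI; requires riak-admin on PATH and suitable permissions.
    result = subprocess.run(
        ["riak-admin", "transfers"], capture_output=True, text=True, check=True
    )
    return result.stdout


def wait_for_transfers(
    fetch: Callable[[], str] = run_riak_admin_transfers,
    timeout: float = 2 * 60 * 60,  # give up after two hours (arbitrary)
    interval: float = 30.0,  # seconds between polls (arbitrary)
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once no transfers are pending, False if the timeout expires."""
    deadline = clock() + timeout
    while True:
        if not transfers_pending(fetch()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A deploy script would call wait_for_transfers() after each node's riak_kv service is back up. Returning False on timeout, rather than raising, leaves it to the script to decide whether to abort or move on to the next node.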
--
Jeremy

On Fri, Nov 18, 2011 at 6:48 AM, Jeremy Raymond <[email protected]> wrote:
> Hello,
>
> I'll set up my deploy script to capture this information and send you the
> info off-list (probably sometime next week).
>
> --
> Jeremy
>
> On 2011-11-15, at 1:16 PM, Jon Meredith wrote:
>
> Hi Joel,
>
> That's not a message I'd expect to see on a clean restart. We'll need
> some more information to diagnose it. Next time it crashes, could you
> provide the contents of your ring file (you can just grab the most recent
> one out of /var/lib/riak/ring - location may vary depending on your
> platform) and it would be very helpful if you could modify your deploy
> script to capture the file list for the leveldb directory on *all* of your
> nodes immediately before you bounce riak to do the update. When it
> crashes, the console.log from all the nodes would also be useful. If any
> of those files contain sensitive information, please contact me off list.
>
> BR, Jon
>
> On Tue, Nov 15, 2011 at 6:48 AM, Jeremy Raymond <[email protected]> wrote:
>
>> I'm using Riak 1.0.1 and I have a script that deploys updates to each of
>> my 3 nodes to update the Erlang mapred modules. What I do is stop a node,
>> deploy the new mapred modules, restart the node, wait for the riak_kv
>> service to start, then move on to the next node. Sometimes when I do this,
>> one of the nodes that is not the current one being updated will go down.
>> Each time this has happened thus far it's been the same node that will go
>> down (the last one). I see this error in the logs:
>>
>> [error] Failed to start riak_kv_eleveldb_backend Reason: {db_open,"IO error: /var/lib/riak/leveldb/913438523331814323877303020447676887284957839360/MANIFEST-000002: No such file or directory"}
>>
>> If I manually restart the node, things go back to normal. Any ideas on
>> what's going on? I've attached the error log.
>>
>> --
>> Jeremy
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> --
> Jon Meredith
> Platform Engineering Manager
> Basho Technologies, Inc.
> [email protected]
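Jon's request (a file list of the leveldb directory on every node, captured immediately before each bounce) slots into the stop, deploy, restart, wait-for-riak_kv loop Jeremy describes. The Python below is a hypothetical sketch of that ordering: snapshot_leveldb_listing, rolling_deploy and the callables passed to rolling_deploy are illustrative names, not Riak tools, and how each step reaches a remote node (ssh, a config-management tool, etc.) is left to the caller. The /var/lib/riak/leveldb path from the error message is a common default, but the location may differ by platform.

```python
import os
import time
from pathlib import Path
from typing import Callable, Iterable


def snapshot_leveldb_listing(data_dir: str, out_dir: str, label: str) -> Path:
    """Write a sorted 'relative_path<TAB>size' listing of data_dir and return its path."""
    root = Path(data_dir)
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            rel = full.relative_to(root).as_posix()
            try:
                size = str(full.stat().st_size)
            except FileNotFoundError:
                # leveldb may delete obsolete files while the walk is in progress.
                size = "vanished"
            lines.append(f"{rel}\t{size}")
    lines.sort()
    out = Path(out_dir) / f"leveldb-files-{label}-{int(time.time())}.txt"
    out.write_text("\n".join(lines) + "\n")
    return out


def rolling_deploy(
    nodes: Iterable[str],
    snapshot_all: Callable[[str], None],  # hypothetical: list leveldb files on *every* node
    stop: Callable[[str], None],
    deploy: Callable[[str], None],  # e.g. copy the new mapred modules into place
    start: Callable[[str], None],
    wait_ready: Callable[[str], None],  # e.g. wait for riak_kv (and transfers) to settle
) -> None:
    """Update one node at a time, recording file listings before each bounce."""
    for node in nodes:
        snapshot_all(f"before-{node}")
        stop(node)
        deploy(node)
        start(node)
        wait_ready(node)
```

Taking the snapshot on all nodes before each stop, rather than only on the node being bounced, matches Jon's point that the node that failed was not the one being updated.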
