I'm running Riak 2.0pre11, but I keep mentioning Erlang because I saw a
similar hang a couple of years ago with RabbitMQ. I suspect that even if
there is a Riak bug involved, there is probably some Erlang problem as
well.

I have now discovered that "pstree -p" shows me the process IDs, so I
tried killing the processes. No luck. I cannot even kill the "ps ax" or
"netstat -lnp" processes that are hanging as well. Then I tried "kill -9"
and they are still stuck.
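
One way to check whether the stuck processes are in uninterruptible sleep
(state "D" in /proc/<pid>/stat, as described in proc(5)) is sketched below.
On Linux a process in that state does not act on signals, SIGKILL included,
until the kernel call it is blocked in returns, which would be consistent
with "kill -9" having no effect. This is only a sketch under assumptions: it
is not confirmed that the hung processes are in state D, and it is not known
whether reading /proc/<pid>/stat avoids whatever "ps ax" is blocking on. The
names parse_stat_line and find_uninterruptible are made up for illustration.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')'.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    # The state letter is the first field after the closing parenthesis.
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """Return a list of (pid, comm) for processes whose state is 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the line was not parseable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

Run periodically from cron or a monitoring agent, a growing list of "D"
processes could serve as an early warning, though the threshold and the
follow-up action would have to be chosen per deployment.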

Two years ago we had to do a hard reset of the VM host server (i.e. kill
all the VMs on the same box) in order to resolve this. I'm going to try the
EC2 control panel to stop or terminate the VM, but even if that works it is
not really a satisfactory solution to the problem. I'd be really interested
to hear whether anyone else has seen this kind of hang in production and
how you cope with it.



On Wed, Mar 19, 2014 at 2:36 PM, Matthew Von-Maszewski
<matth...@basho.com> wrote:

> I thought I knew the cause of this problem.  I do not.  We need to await
> input from others.
>
> My apologies.
>
> Other basic questions will be:  what version of Riak, what is your
> app.config, how many servers/nodes, any reason this one node is "different"?
>
> Matthew
>
>
> On Mar 19, 2014, at 5:30 PM, Michael Dillon <mdillon...@pagefreezer.com>
> wrote:
>
> We are using Amazon EC2 m3.2xlarge nodes, and while the freeze is
> occurring, free reports:
>
>              total       used       free     shared    buffers     cached
> Mem:      30623232    8818792   21804440          0      88092    4411832
> -/+ buffers/cache:    4318868   26304364
> Swap:            0          0          0
>
> The Erlang processes seem to be unkillable: "shutdown -r now" is
> also hanging. Right now these nodes are just being used for some testing,
> but eventually we will go into production and I really need to have a plan
> for how to detect and then deal with these Erlang freezes. Or better yet, a
> way to avoid them even if it means detecting some condition in advance and
> then rebooting the node.
>
>
>
> On Wed, Mar 19, 2014 at 2:07 PM, Matthew Von-Maszewski
> <matth...@basho.com> wrote:
>
>>
>> Any chance you are overflowing into swap?  Or in the case of XEN have you
>> exceeded the guaranteed RAM for the VM memory and moved into the disk
>> backed portion of "ram"?
>>
>> What backend do you use within riak?  Do you have memory statistics from
>> before and after the seizure/freeze?
>>
>> Matthew
>>
>>
>> On Mar 19, 2014, at 4:56 PM, Michael Dillon <mdillon...@pagefreezer.com>
>> wrote:
>>
>> > I've run into a problem with Riak freezing completely on one node
>> > running on Ubuntu 12.04 LTS on a XEN VM (EC2). If I ssh into the node and
>> > run "ps ax", that shell session also freezes. I also tried another ssh
>> > session with "netstat -lnp" to see if I could find the process ID to kill,
>> > but that also froze.
>> >
>> > I must admit that I saw a similar problem with RabbitMQ running
>> > on Ubuntu 10 LTS on an OpenVPS VM a few years ago.
>> >
>> > I suppose this is an Erlang issue of some sort, but I would really like
>> > some way to kill the Riak processes without a reboot if possible.
>> >
>> > --
>> > PageFreezer.com
>> > #200 - 311 Water Street
>> > Vancouver,  BC  V6B 1B8
>> > _______________________________________________
>> > riak-users mailing list
>> > riak-users@lists.basho.com
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
>
>
>
>


