Re: HintedHandoff process does not finish

2013-09-30 Thread Aaron Morton
 What can be the reason for the handoff process not to finish?
Check the logs for other errors about timing out during hint replay. 

 What would be the best way to recover from this situation?
If they are really causing trouble, drop the hints, either via the 
HintedHandoffManager JMX MBean or by stopping the node and deleting the files 
on disk. Then run repair later. 
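
As a rough sketch of the JMX route, something like the following could be run 
against the node that is holding the hints. The MBean name 
org.apache.cassandra.db:type=HintedHandoffManager, the deleteHintsForEndpoint 
operation and the default JMX port 7199 are assumptions from memory, and the 
DropHints class and jmxUrl helper are made up for illustration. Check the 
names in jconsole for your version before relying on them.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DropHints {
    // Assumed MBean name; verify it in jconsole for your Cassandra version.
    static final String HINTS_MBEAN =
            "org.apache.cassandra.db:type=HintedHandoffManager";

    // Builds the standard JMX-over-RMI service URL for a host and port.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: DropHints <node-holding-hints> <hint-target-endpoint>");
            return;
        }
        String node = args[0];    // node that stores the hints
        String target = args[1];  // endpoint whose hints should be dropped

        // 7199 is assumed to be the JMX port; adjust if yours differs.
        JMXServiceURL url = new JMXServiceURL(jmxUrl(node, 7199));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Assumed operation: deleteHintsForEndpoint(String host).
            mbs.invoke(new ObjectName(HINTS_MBEAN),
                       "deleteHintsForEndpoint",
                       new Object[] { target },
                       new String[] { String.class.getName() });
            System.out.println("Requested hint deletion for " + target);
        }
    }
}
```

Dropped hints are gone for good, so the repair afterwards is what brings the 
remote node back in sync.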

 What can be done to prevent this from happening again?
Hints are stored either when the target node is down before the request starts 
or when the coordinator times out waiting for the remote node. Check the logs 
for nodes going down, and check the MessagingService MBean for TimedOuts from 
other nodes. This may indicate issues with a cross-DC connection. 
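
A minimal sketch of that MessagingService check over JMX, for illustration 
only. It assumes the MBean is registered as 
org.apache.cassandra.net:type=MessagingService and exposes a TimeoutsPerHost 
attribute holding a map of endpoint to timeout count; both names are from 
memory, so confirm them in jconsole. The TimeoutCheck class and hostsOver 
helper are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class TimeoutCheck {
    // Assumed MBean name; verify it in jconsole for your Cassandra version.
    static final String MESSAGING_MBEAN =
            "org.apache.cassandra.net:type=MessagingService";

    // Returns the hosts whose timeout count is strictly above the threshold,
    // sorted by host so the output is stable.
    static Map<String, Long> hostsOver(Map<String, Long> timeoutsPerHost, long threshold) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : timeoutsPerHost.entrySet()) {
            if (e.getValue() > threshold) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String node = args.length > 0 ? args[0] : "localhost";
        // 7199 is assumed to be the JMX port; adjust if yours differs.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + node + ":7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Assumed attribute: TimeoutsPerHost, a map of endpoint to count.
            @SuppressWarnings("unchecked")
            Map<String, Long> perHost = (Map<String, Long>)
                    mbs.getAttribute(new ObjectName(MESSAGING_MBEAN), "TimeoutsPerHost");
            for (Map.Entry<String, Long> e : hostsOver(perHost, 0L).entrySet()) {
                System.out.println(e.getKey() + " timeouts=" + e.getValue());
            }
        }
    }
}
```

If the hosts with non-zero counts are all in the other data center, that 
points at the cross-DC link rather than at a single node.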

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 27/09/2013, at 11:18 PM, Tom van den Berge t...@drillster.com wrote:

 Hi,
 
 On one of my nodes, the (storage) load increased dramatically (doubled), 
 within one or two hours. The hints column family was causing the growth. I 
 noticed one HintedHandoff process that was started some two hours ago, but 
 hadn't finished. Normally, these processes take only a few seconds, 15 
 seconds max, in my cluster.
 
 The not-finishing process was handing the hints over to a host in another 
 data center. There were no warning or error messages in the logs, other than 
 repeated messages about flushing the high-traffic column family hints.
 I'm using Cassandra 1.2.3.
 What can be the reason for the handoff process not to finish?
 What would be the best way to recover from this situation?
 What can be done to prevent this from happening again?
 
 Thanks in advance,
 Tom
 
 
 
 
 



HintedHandoff process does not finish

2013-09-27 Thread Tom van den Berge
Hi,

On one of my nodes, the (storage) load increased dramatically (doubled),
within one or two hours. The hints column family was causing the growth. I
noticed one HintedHandoff process that was started some two hours ago, but
hadn't finished. Normally, these processes take only a few seconds, 15
seconds max, in my cluster.

The not-finishing process was handing the hints over to a host in another
data center. There were no warning or error messages in the logs, other
than repeated messages about flushing the high-traffic column family hints.
I'm using Cassandra 1.2.3.

   - What can be the reason for the handoff process not to finish?
   - What would be the best way to recover from this situation?
   - What can be done to prevent this from happening again?


Thanks in advance,
Tom