[ 
https://issues.apache.org/jira/browse/CASSANDRA-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ron Kuris updated CASSANDRA-9526:
---------------------------------
    Attachment: PHI-Log-Debug-When-Close.patch.txt
                PHI-Race-Condition.patch.txt
                Monitor-Phi-JMX.patch.txt

There are three patches here. The main fix is in Monitor-Phi-JMX.patch.txt, 
which fully resolves the reported issue.

While inspecting this code, I noticed a small, unlikely race condition. If two 
phi values arrive at the same time and both are the first for a host, one could 
be lost because of the way values are added to the Hashtable. The second patch 
(PHI-Race-Condition.patch.txt) closes that window by switching to a 
ConcurrentHashMap and using putIfAbsent to atomically check for a prior value.

I doubt this could actually happen in the wild, but it's still good defensive 
coding. It also removes Hashtable, which synchronizes on every call.
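
As a rough illustration of the putIfAbsent pattern described above (this is 
not the patch itself), a minimal sketch might look like the following. The 
class name, method names, and the queue used as a per-host sample window are 
all made up for illustration; the real FailureDetector code differs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the per-host arrival bookkeeping.
final class ArrivalRegistry {
    private final ConcurrentMap<String, ConcurrentLinkedQueue<Long>> windows =
            new ConcurrentHashMap<>();

    // Record one arrival for a host, creating its window on first use.
    void report(String host, long arrivalNanos) {
        ConcurrentLinkedQueue<Long> window = windows.get(host);
        if (window == null) {
            ConcurrentLinkedQueue<Long> fresh = new ConcurrentLinkedQueue<>();
            // putIfAbsent returns the existing value if another thread won
            // the race, or null if our fresh window was installed.
            ConcurrentLinkedQueue<Long> prior = windows.putIfAbsent(host, fresh);
            window = (prior != null) ? prior : fresh;
        }
        // Both racing threads now add to the same window, so no sample is lost.
        window.add(arrivalNanos);
    }

    // Number of samples recorded for a host (0 if the host is unknown).
    int sampleCount(String host) {
        ConcurrentLinkedQueue<Long> window = windows.get(host);
        return window == null ? 0 : window.size();
    }
}
```

With a plain get-then-put sequence, two threads could each create a window for 
the same new host and the later put would overwrite the earlier one, dropping 
its sample. putIfAbsent makes the check and the insert a single atomic step.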

The third patch generates debug log messages when PHI starts getting close to 
phi_convict_threshold. It's a good way to see whether phi_convict_threshold 
might be too low. It's not WARN or even INFO because this could generate a lot 
of logs, though arguably it could be. If someone has trouble with nodes going 
offline, they can turn up the log level and check whether 
phi_convict_threshold is the culprit.
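
To make the idea concrete, here is a minimal sketch of a "phi is getting 
close" check. It is not taken from the patch: the class name, the 0.75 cutoff 
fraction, and the use of java.util.logging (chosen only to keep the example 
self-contained) are all assumptions.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; names and the cutoff fraction are illustrative only.
final class PhiProximityCheck {
    private static final Logger logger =
            Logger.getLogger(PhiProximityCheck.class.getName());

    // Assumed fraction of the threshold at which phi counts as "close".
    static final double CLOSE_FRACTION = 0.75;

    // True when phi is at or above the cutoff but still below the threshold.
    static boolean isClose(double phi, double threshold) {
        return phi >= threshold * CLOSE_FRACTION && phi < threshold;
    }

    // Emit a debug-level (FINE) message when phi approaches the threshold.
    static void check(String host, double phi, double threshold) {
        if (isClose(phi, threshold) && logger.isLoggable(Level.FINE)) {
            logger.fine(String.format(
                    "phi for %s is %.2f, approaching phi_convict_threshold %.2f",
                    host, phi, threshold));
        }
    }
}
```

Keeping the message at a debug level means it stays silent by default and only 
shows up once an operator raises the log level for this logger.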

There is also some other code cleanup in the PHI-Log-Debug-When-Close patch.

> Provide a JMX hook to monitor phi values in the FailureDetector
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-9526
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9526
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ron Kuris
>             Fix For: 2.0.x
>
>         Attachments: Monitor-Phi-JMX.patch.txt, 
> PHI-Log-Debug-When-Close.patch.txt, PHI-Race-Condition.patch.txt
>
>
> phi_convict_threshold can be tuned, but there's currently no way to monitor 
> the phi values to see if you're getting close.
> The attached patch adds the ability to get these values via JMX.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
