[ https://issues.apache.org/jira/browse/CASSANDRA-12146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksey Yeschenko updated CASSANDRA-12146:
------------------------------------------
    Fix Version/s:     (was: 3.9)
                   3.8

> Use dedicated executor for sending JMX notifications
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12146
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12146
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>             Fix For: 2.2.8, 3.0.9, 3.8
>
>         Attachments: 12146-2.2.patch
>
>
> I'm currently looking into an issue with our repair process where we
> notice a significant delay at the end of the repair task, before nodetool
> actually terminates. At the same time, JMX NOTIF_LOST errors are reported
> in nodetool during most repair runs.
>
> Currently {{StorageService.repairAsync(keyspace, options)}} is called through
> JMX, which starts a new thread executing RepairRunnable with the
> provided options. StorageService itself extends
> NotificationBroadcasterSupport and sends the JMX progress notifications
> emitted from RepairRunnable (or during bootstrap). If you take a closer look
> at {{RepairRunnable}}, {{JMXProgressSupport}} and
> {{StorageService/NotificationBroadcasterSupport.sendNotification}}, you'll
> notice that this all happens within the calling thread, i.e. RepairRunnable.
>
> Given the lost notifications and all kinds of potential networking-related
> issues, I'm not really comfortable having the repair coordinator thread
> running in the JMX stack. Fortunately, NotificationBroadcasterSupport accepts
> a custom executor as a constructor argument. See attached patch.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
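
The approach described in the issue, passing a custom executor to the NotificationBroadcasterSupport constructor so that listeners no longer run on the calling thread, can be sketched roughly as follows. This is a minimal illustration and not the attached patch: the class names ProgressBroadcaster and JmxExecutorSketch, the thread name "jmx-notifications" and the notification contents are all assumptions made for the example. Only the javax.management and java.util.concurrent calls are standard JDK API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;

// Hypothetical stand-in for a broadcaster such as StorageService.
class ProgressBroadcaster extends NotificationBroadcasterSupport {
    ProgressBroadcaster(Executor executor) {
        // With an executor supplied, sendNotification() hands each listener
        // callback to that executor instead of running it on the caller.
        super(executor);
    }
}

class JmxExecutorSketch {
    // Sends one notification and returns the name of the thread that
    // delivered it to the listener.
    static String deliveryThreadName() {
        // Assumed setup: a single daemon thread dedicated to notifications.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "jmx-notifications");
            t.setDaemon(true);
            return t;
        });
        try {
            ProgressBroadcaster broadcaster = new ProgressBroadcaster(executor);
            AtomicReference<String> seen = new AtomicReference<>();
            CountDownLatch delivered = new CountDownLatch(1);

            NotificationListener listener = (notification, handback) -> {
                seen.set(Thread.currentThread().getName());
                delivered.countDown();
            };
            broadcaster.addNotificationListener(listener, null, null);

            // The sending thread (the repair thread in the issue) returns
            // as soon as the delivery job has been queued.
            broadcaster.sendNotification(
                new Notification("progress", "repair", 1L, "repair started"));

            if (!delivered.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("notification was not delivered");
            }
            return seen.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("listener ran on: " + deliveryThreadName());
    }
}
```

With the no-argument constructor, the same listener would run on whichever thread calls sendNotification. A single-threaded executor is used here so that notifications are still delivered in the order they were sent. Whether the actual patch makes the same choice is not stated in this message.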