Jaskey Lam created ROCKETMQ-184:
-----------------------------------

             Summary: It takes too long (3-33 seconds) to switch to read from slave when master crashes
                 Key: ROCKETMQ-184
                 URL: https://issues.apache.org/jira/browse/ROCKETMQ-184
             Project: Apache RocketMQ
          Issue Type: Improvement
          Components: rocketmq-client, rocketmq-remoting
            Reporter: Jaskey Lam
            Assignee: Xiaorui Wang


When the master crashes, no notifier callback is triggered to pull messages 
again.

Instead, it relies on the scan service to trigger a timeout and then re-pull.

But the pull command has a 30-second timeout, and after the timeout the pull 
operation is rescheduled 3 seconds later.

So it takes 3 to 33 seconds to switch to the slave, which is too long and can 
be optimized.
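
To make the 3 to 33 second window concrete, here is a minimal sketch of the 
arithmetic, assuming the crash can happen at any point during an in-flight 
pull. The class, method and constant names are illustrative only and are not 
taken from the RocketMQ code base; the two values come from the description 
above.

```java
public class FailoverDelayEstimate {
    // Values from the issue description; names are hypothetical.
    static final long PULL_TIMEOUT_MILLIS = 30_000L; // pull command timeout
    static final long RETRY_DELAY_MILLIS = 3_000L;   // delay before the re-pull

    /**
     * Estimates how long after a master crash the next pull is issued.
     *
     * @param elapsedMillis time the in-flight pull had already been
     *                      outstanding when the master crashed
     */
    static long switchDelayMillis(long elapsedMillis) {
        if (elapsedMillis < 0 || elapsedMillis > PULL_TIMEOUT_MILLIS) {
            throw new IllegalArgumentException("elapsedMillis out of range: " + elapsedMillis);
        }
        // The client first waits out the remainder of the pull timeout...
        long remainingTimeout = PULL_TIMEOUT_MILLIS - elapsedMillis;
        // ...and only then schedules the re-pull after the fixed delay.
        return remainingTimeout + RETRY_DELAY_MILLIS;
    }

    public static void main(String[] args) {
        // Crash right after the pull was sent: worst case.
        System.out.println(switchDelayMillis(0));       // prints 33000
        // Crash just before the pull would have timed out: best case.
        System.out.println(switchDelayMillis(30_000));  // prints 3000
    }
}
```

This ignores the granularity of the scan service itself, which may add a 
little more latency on top.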


The root cause is that the re-pull below takes too long to be triggered when 
the master crashes:

{code}
@Override
public void onException(Throwable e) {
    if (!pullRequest.getMessageQueue().getTopic().startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
        log.warn("execute the pull request exception", e);
    }

    DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest, PULL_TIME_DELAY_MILLS_WHEN_EXCEPTION);
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
