xuesongxs opened a new issue, #15559: URL: https://github.com/apache/pulsar/issues/15559
**Describe the bug** Pulsar v2.8.1 Pulsar cluster: 3 brokers Producer can‘t continue sending messages after all brokers are restarted. **To Reproduce** Steps to reproduce the behavior: 1. Create producer ``` public class PulsarProducerDemo3 { // 连接集群 broker private static String localClusterUrl = "pulsar://127.0.0.1:6650,127.0.0.1:6651,127.0.0.1:6652"; public static void main(String[] args) { try { Producer<String> producer = getProducer(); Long start = System.currentTimeMillis(); int i = 0; while (i < 50000) { producer.send(i + ""); System.out.println("send msg:" + i + ""); i++; } } catch (Exception e) { System.err.println("send fail:" + e); } } public static Producer<String> getProducer() throws Exception { PulsarClient client; Map<String, Object> prop = new HashMap<>(); prop.put("topicName", "persistent://public/default/test-string3"); client = PulsarClient.builder().serviceUrl(localClusterUrl).build(); Producer<String> producer = client.newProducer(Schema.STRING) .loadConf(prop) .create(); return producer; } } ``` 2. Run producer 3. After sending 100 messages, stop all brokers 4. See producer's log ``` [pulsar-timer-5-1] INFO org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/test-string3-partition-2] [pulsar-cluster-10-0] Reconnecting after connection was closed [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConnectionPool - Failed to open connection to 127.0.0.1:6650 : org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:6650 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/test-string3-partition-2] [pulsar-cluster-10-0] Error connecting to broker: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:6650 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/test-string3-partition-2] [pulsar-cluster-10-0] Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:6650 -- Will try again in 1.496 s [pulsar-timer-5-1] INFO org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/test-string3-partition-2] [pulsar-cluster-10-0] Reconnecting after connection was closed [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConnectionPool - Failed to open connection to 172.32.149.123:16650 : org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /172.32.149.123:16650 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/test-string3-partition-2] [pulsar-cluster-10-0] Error connecting to broker: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:6651 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/test-string3-partition-2] [pulsar-cluster-10-0] Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:6651 -- Will try again in 3.193 s [pulsar-timer-5-1] INFO org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/test-string3-partition-1] [pulsar-cluster-11-0] Reconnecting after connection was closed [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConnectionPool - Failed to open connection to 127.0.0.1:6652 : org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:6652 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/test-string3-partition-1] [pulsar-cluster-11-0] Error connecting to broker: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:6652 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/test-string3-partition-1] [pulsar-cluster-11-0] Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:6652 -- Will try again in 2.97 s ``` 5. Start all brokers 6. See producer's log ``` [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x7fa30072, L:/127.0.0.1:50232 - R:/127.0.0.1:6650]] Connected to server [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ProducerImpl - [persistent://public/default/test-string3-partition-2] [pulsar-cluster-14-0] Creating producer on cnx [id: 0x7fa30072, L:/127.0.0.1:50232 - R:/127.0.0.1:16651] [pulsar-timer-5-1] INFO org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/test-string3-partition-0] [pulsar-cluster-14-1] Reconnecting after connection was closed [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xa49808e4, L:/127.0.0.1:58935 - R:/127.0.0.1:6651]] Connected to server [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ProducerImpl - [persistent://public/default/test-string3-partition-0] [pulsar-cluster-14-1] Creating producer on cnx [id: 0xa49808e4, L:/127.0.0.1:58935 - R:/127.0.0.1:6652] ``` Creating producer success, but producer can‘t continue sending messages after all brokers are restarted. 7. Jstack producer's pid ``` Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.251-b08 mixed mode): "Attach Listener" #13 daemon prio=9 os_prio=0 tid=0x00007fc510001000 nid=0x4dc2 waiting on condition [0x00000000 00000000] java.lang.Thread.State: RUNNABLE "DestroyJavaVM" #12 prio=5 os_prio=0 tid=0x00007fc53c009800 nid=0x4c40 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "pulsar-external-listener-3-1" #11 prio=5 os_prio=0 tid=0x00007fc4f40b3800 nid=0x4c95 waiting on condition [0x00 007fc5185f5000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000078f4085b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$Con ditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronize r.java:2039) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.ja va:1081) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.ja va:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable. java:30) at java.lang.Thread.run(Thread.java:748) "pulsar-timer-5-1" #10 prio=5 os_prio=0 tid=0x00007fc4f4065800 nid=0x4c4c waiting on condition [0x00007fc5192170 00] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.pulsar.shade.io.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:5 66) at org.apache.pulsar.shade.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:462) at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable. java:30) at java.lang.Thread.run(Thread.java:748) "pulsar-client-io-1-1" #8 prio=5 os_prio=0 tid=0x00007fc53c40b800 nid=0x4c4b runnable [0x00007fc52c18a000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x000000078f300d50> (a org.apache.pulsar.shade.io.netty.channel.nio.SelectedSelectionKeySet) - locked <0x000000078f300d68> (a java.util.Collections$UnmodifiableSet) - locked <0x000000078f300d08> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at org.apache.pulsar.shade.io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelection KeySetSelector.java:62) at org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:814) at org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457) at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExe cutor.java:986) at org.apache.pulsar.shade.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable. java:30) at java.lang.Thread.run(Thread.java:748) "Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007fc53c0d4000 nid=0x4c49 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007fc53c0b7800 nid=0x4c48 waiting on condition [0x000000 0000000000] java.lang.Thread.State: RUNNABLE "C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007fc53c0b4800 nid=0x4c47 waiting on condition [0x000000 0000000000] java.lang.Thread.State: RUNNABLE ``` **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. **Desktop (please complete the following information):** - OS: [e.g. iOS] **Additional context** Add any other context about the problem here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org