codelipenghui opened a new issue #10703: URL: https://github.com/apache/pulsar/issues/10703
**Describe the bug** All the broker can't complete the managed ledger creation, the managed ledger will in the `None` state for 1min and after 1min to reconnect to the topic, the managed ledger been removed due to `too long pending time` Here is the logs: ``` May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.579 [pulsar-io-4-7] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.0.0.179:46878][persistent://public/default/test-partition-0] Creating producer. producerId=0 May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.580 [pulsar-ordered-OrderedExecutor-3-0] WARN org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [public/default/persistent/test-partition-0] Managed ledger has been pending in initialize state more than 1899091 milliseconds, remove it from cache to retry ... May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.580 [pulsar-ordered-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/default/persistent/test-partition-0 May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.581 [bookkeeper-ml-scheduler-OrderedScheduler-6-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/test-partition-0] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 100, 101, 102, 97, 117, 108, 116, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 116, 101, 115, 116, 45, 112, 97, 114, 116, 105, 116, 105, 111, 110, 45, 48], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.581 [bookkeeper-ml-scheduler-OrderedScheduler-6-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [10.0.0.179:3181, 10.0.0.104:3181] is not adhering to Placement Policy. quarantinedBookies: [] May 25 15:24:35 ip-10-0-0-179 pulsar[39322]: 15:24:35.586 [main-EventThread] INFO org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [10.0.0.179:3181, 10.0.0.104:3181] for ledger: 42 May 25 15:24:38 ip-10-0-0-179 pulsar[39322]: 15:24:38.762 [pulsar-web-40-5] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:38 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0 May 25 15:24:38 ip-10-0-0-179 pulsar[39322]: 15:24:38.764 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:38 +0000] "GET /metrics/ HTTP/1.1" 200 43130 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2 May 25 15:24:43 ip-10-0-0-179 pulsar[39322]: 15:24:43.866 [pulsar-io-4-6] INFO org.apache.pulsar.broker.service.ServerCnx - Closed connection from /127.0.0.1:59372 May 25 15:24:43 ip-10-0-0-179 pulsar[39322]: 15:24:43.866 [pulsar-io-4-7] INFO org.apache.pulsar.broker.service.ServerCnx - Closed connection from /10.0.0.179:46878 May 25 15:24:48 ip-10-0-0-179 pulsar[39322]: 15:24:48.762 [pulsar-web-40-5] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:48 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0 May 25 15:24:48 ip-10-0-0-179 pulsar[39322]: 15:24:48.764 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:48 +0000] "GET /metrics/ HTTP/1.1" 200 43128 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2 May 25 15:24:58 ip-10-0-0-179 pulsar[39322]: 15:24:58.761 [pulsar-web-40-5] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:58 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0 May 25 15:24:58 ip-10-0-0-179 pulsar[39322]: 15:24:58.764 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:24:58 +0000] "GET /metrics/ HTTP/1.1" 200 43126 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2 May 25 15:25:01 ip-10-0-0-179 CRON[51146]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1) May 25 15:25:08 ip-10-0-0-179 pulsar[39322]: 15:25:08.761 [pulsar-web-40-5] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:08 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0 May 25 15:25:08 ip-10-0-0-179 pulsar[39322]: 15:25:08.764 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:08 +0000] "GET /metrics/ HTTP/1.1" 200 43128 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2 May 25 15:25:18 ip-10-0-0-179 pulsar[39322]: 15:25:18.761 [pulsar-web-40-5] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:18 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0 May 25 15:25:18 ip-10-0-0-179 pulsar[39322]: 15:25:18.764 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:18 +0000] "GET /metrics/ HTTP/1.1" 200 43128 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2 May 25 15:25:28 ip-10-0-0-179 pulsar[39322]: 15:25:28.762 [pulsar-web-40-5] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:28 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 1 May 25 15:25:28 ip-10-0-0-179 pulsar[39322]: 15:25:28.764 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:28 +0000] "GET /metrics/ HTTP/1.1" 200 43158 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2 May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: 15:25:35.580 [pulsar-io-4-8] ERROR org.apache.pulsar.broker.service.ServerCnx - [/10.0.0.179:46878] Failed to create topic persistent://public/default/test-partition-0, producerId=0 May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: java.util.concurrent.CompletionException: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) ~[?:?] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) ~[?:?] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) ~[?:?] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at org.apache.pulsar.common.util.FutureUtil.lambda$addTimeoutHandling$1(FutureUtil.java:129) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [io.netty-netty-transport-native-epoll-4.1.63.Final-linux-x86_64.jar:4.1.63.Final] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at java.lang.Thread.run(Thread.java:829) [?:?] May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: Caused by: org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to load topic within timeout May 25 15:25:35 ip-10-0-0-179 pulsar[39322]: #011at org.apache.pulsar.broker.service.BrokerService.futureWithDeadline(...)(Unknown Source) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT] May 25 15:25:38 ip-10-0-0-179 pulsar[39322]: 15:25:38.761 [pulsar-web-40-5] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:38 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0 May 25 15:25:38 ip-10-0-0-179 pulsar[39322]: 15:25:38.764 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:38 +0000] "GET /metrics/ HTTP/1.1" 200 43157 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2 May 25 15:25:40 ip-10-0-0-179 systemd[1]: Started Session 115 of user ubuntu. May 25 15:25:48 ip-10-0-0-179 pulsar[39322]: 15:25:48.761 [pulsar-web-40-5] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:48 +0000] "GET /metrics HTTP/1.1" 302 0 "-" "Prometheus/2.21.0" 0 May 25 15:25:48 ip-10-0-0-179 pulsar[39322]: 15:25:48.764 [prometheus-stats-41-1] INFO org.eclipse.jetty.server.RequestLog - 10.0.0.156 - - [25/May/2021:15:25:48 +0000] "GET /metrics/ HTTP/1.1" 200 43157 "http://10.0.0.179:8080/metrics" "Prometheus/2.21.0" 2 May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.534 [pulsar-io-4-10] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /127.0.0.1:59376 May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.686 [pulsar-io-4-11] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /10.0.0.179:46882 May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.690 [pulsar-io-4-11] INFO org.apache.pulsar.broker.service.ServerCnx - [/10.0.0.179:46882][persistent://public/default/test-partition-0] Creating producer. producerId=0 May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.690 [pulsar-ordered-OrderedExecutor-3-0] WARN org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [public/default/persistent/test-partition-0] Managed ledger has been pending in initialize state more than 78110 milliseconds, remove it from cache to retry ... May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.690 [pulsar-ordered-OrderedExecutor-3-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/default/persistent/test-partition-0 May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.691 [bookkeeper-ml-scheduler-OrderedScheduler-6-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/test-partition-0] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 100, 101, 102, 97, 117, 108, 116, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 116, 101, 115, 116, 45, 112, 97, 114, 116, 105, 116, 105, 111, 110, 45, 48], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.692 [bookkeeper-ml-scheduler-OrderedScheduler-6-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [10.0.0.179:3181, 10.0.0.104:3181] is not adhering to Placement Policy. quarantinedBookies: [] May 25 15:25:53 ip-10-0-0-179 pulsar[39322]: 15:25:53.696 [main-EventThread] INFO org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [10.0.0.179:3181, 10.0.0.104:3181] for ledger: 43 ``` This will affect all topics of the broker and all the 3 brokers are in this state. I have make a thread dump and heap dump: [heap.hprof.zip](https://github.com/apache/pulsar/files/6540692/heap.hprof.zip) [jstack_v2.log](https://github.com/apache/pulsar/files/6540699/jstack_v2.log) **To Reproduce** It's difficult to reproduce **Additional context** current master branch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org