fracasula commented on issue #7682:
URL: https://github.com/apache/pulsar/issues/7682#issuecomment-694085248


   Hi guys, about the Golang deadlock, I'm able to replicate it with the latest 
image too (i.e. `streamnative/pulsar-all:2.6.1-sn-3`). It still shows that the 
client is stucks acking but it may not just be a client issue like suggested by 
@alexku7. By further developing my test I was able to overlap the logs of 
client and server and I can say that I consistenly get this on the server side 
each time the client gets stuck:
   
   ```
   time="2020-09-17T09:15:46+02:00" level=info msg="Error reading from 
connection" error="Short read when reading frame size: read tcp 
[::1]:48104->[::1]:32771: use of closed network connection" 
local_addr="[::1]:48104" remote_addr="pulsar://localhost:32771"
   07:15:46.482 [pulsar-io-51-16] INFO  
org.apache.pulsar.broker.service.ServerCnx - Closed connection from 
/172.17.0.1:38668
   07:15:46.482 [pulsar-io-51-16] INFO  
org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers
 - Removed consumer 
Consumer{subscription=PersistentSubscription{topic=persistent://public/default/topic-name-61b9697b6f087aca,
 name=subscription-name-756c1ab9019bbcdc}, consumerId=1, 
consumerName=consumer-name-249de6d4895d0e15, address=/172.17.0.1:38668} with 
pending 200 acks
   07:15:46.486 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[public/default/persistent/topic-name-61b9697b6f087aca-subscription-name-756c1ab9019bbcdc]
 Rewind from 13:700 to 13:400
       consumer_partition_integration_test.go:112: [2020-09-17 
09:15:46.62477658 +0200 CEST m=+181.898034767] Trying to ack 13:426:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446798277 +0200 CEST m=+171.004720824] Trying to ack 13:1318:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.44681232 +0200 CEST m=+171.004734868] Trying to ack 13:1332:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.44682636 +0200 CEST m=+171.004748901] Trying to ack 13:1334:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446840804 +0200 CEST m=+171.004763351] Trying to ack 13:1337:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446756504 +0200 CEST m=+171.004679062] Trying to ack 13:1324:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446845301 +0200 CEST m=+171.004767844] Trying to ack 13:1338:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446863899 +0200 CEST m=+171.004786443] Trying to ack 13:1340:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446876519 +0200 CEST m=+171.004799061] Trying to ack 13:1341:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446783149 +0200 CEST m=+171.004705702] Trying to ack 13:1328:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446772341 +0200 CEST m=+171.004694903] Trying to ack 13:1322:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446789742 +0200 CEST m=+171.004712290] Trying to ack 13:1321:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446902955 +0200 CEST m=+171.004825503] Trying to ack 13:1345:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.44692216 +0200 CEST m=+171.004844705] Trying to ack 13:1347:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446938844 +0200 CEST m=+171.004861396] Trying to ack 13:1348:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446957473 +0200 CEST m=+171.004880021] Trying to ack 13:1351:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446980957 +0200 CEST m=+171.004903502] Trying to ack 13:1349:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446985213 +0200 CEST m=+171.004907763] Trying to ack 13:1353:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446798504 +0200 CEST m=+171.004721046] Trying to ack 13:1323:0
       consumer_partition_integration_test.go:112: [2020-09-16 
18:16:12.446791901 +0200 CEST m=+171.004714458] Trying to ack 13:1326:0
   ...
   ```
   
   Do you think that's something that could cause the consumer to hang forever 
waiting for those pending acks to be processed?
   
   I ran it multiple times, here's the output the second time:
   
   ```
   08:33:20.866 [pulsar-io-51-7] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/172.17.0.1:40022] Created 
subscription on topic persistent://public/default/topic-name-61b9697b6f087aca / 
subscription-name-2da7ed37505244dc
   08:33:20.960 [pulsar-io-51-7] INFO  
org.apache.pulsar.broker.service.ServerCnx - Closed connection from 
/172.17.0.1:40022
   08:33:20.960 [pulsar-io-51-7] INFO  
org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers
 - Removed consumer 
Consumer{subscription=PersistentSubscription{topic=persistent://public/default/topic-name-61b9697b6f087aca,
 name=subscription-name-2da7ed37505244dc}, consumerId=1, 
consumerName=consumer-name-748c24a16b99354a, address=/172.17.0.1:40022} with 
pending 148 ack
   ```
   
   Third:
   
   ```
   08:34:45.073 [pulsar-io-51-8] INFO  
org.apache.pulsar.broker.service.ServerCnx - New connection from 
/172.17.0.1:40030
   08:34:45.074 [pulsar-io-51-8] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/172.17.0.1:40030] Subscribing on 
topic persistent://public/default/topic-name-61b9697b6f087aca / 
subscription-name-3afa9641ff625519
   08:34:45.074 [pulsar-io-51-8] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[public/default/persistent/topic-name-61b9697b6f087aca] Cursor 
subscription-name-3afa9641ff625519 recovered to position 41:99999
   08:34:45.074 [pulsar-io-51-8] INFO  
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[public/default/persistent/topic-name-61b9697b6f087aca] Creating ledger, 
metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 
101, 114], pulsar/managed-ledger=[112, 117, 98, 108, 105, 99, 47, 100, 101, 
102, 97, 117, 108, 116, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 
47, 116, 111, 112, 105, 99, 45, 110, 97, 109, 101, 45, 54, 49, 98, 57, 54, 57, 
55, 98, 54, 102, 48, 56, 55, 97, 99, 97], pulsar/cursor=[115, 117, 98, 115, 99, 
114, 105, 112, 116, 105, 111, 110, 45, 110, 97, 109, 101, 45, 51, 97, 102, 97, 
57, 54, 52, 49, 102, 102, 54, 50, 53, 53, 49, 57], application=[112, 117, 108, 
115, 97, 114]} - metadata ops timeout : 60 seconds
   08:34:45.079 [main-EventThread] INFO  
org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [127.0.0.1:3181] for 
ledger: 45
   08:34:45.081 [BookKeeperClientWorker-OrderedExecutor-5-0] INFO  
org.apache.bookkeeper.mledger.impl.MetaStoreImpl - 
[public/default/persistent/topic-name-61b9697b6f087aca] 
[subscription-name-3afa9641ff625519] Updating cursor info ledgerId=45 
mark-delete=41:99999
   08:34:45.084 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[public/default/persistent/topic-name-61b9697b6f087aca] Updated cursor 
subscription-name-3afa9641ff625519 with ledger id 45 md-position=41:99999 
rd-position=41:100000
   08:34:45.084 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[public/default/persistent/topic-name-61b9697b6f087aca] Opened new cursor: 
ManagedCursorImpl{ledger=public/default/persistent/topic-name-61b9697b6f087aca, 
name=subscription-name-3afa9641ff625519, ackPos=41:99999, readPos=41:100000}
   08:34:45.084 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[public/default/persistent/topic-name-61b9697b6f087aca-subscription-name-3afa9641ff625519]
 Rewind from 41:0 to 41:0
   08:34:45.085 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[persistent://public/default/topic-name-61b9697b6f087aca] There are no 
replicated subscriptions on the topic
   08:34:45.085 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[persistent://public/default/topic-name-61b9697b6f087aca][subscription-name-3afa9641ff625519]
 Created new subscription for 1
   08:34:45.085 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/172.17.0.1:40030] Created 
subscription on topic persistent://public/default/topic-name-61b9697b6f087aca / 
subscription-name-3afa9641ff625519
   08:34:45.086 [pulsar-io-51-8] INFO  
org.apache.pulsar.broker.service.ServerCnx - Closed connection from 
/172.17.0.1:40030
   08:34:45.086 [pulsar-io-51-8] INFO  
org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers
 - Removed consumer 
Consumer{subscription=PersistentSubscription{topic=persistent://public/default/topic-name-61b9697b6f087aca,
 name=subscription-name-3afa9641ff625519}, consumerId=1, 
consumerName=consumer-name-3f0be6cd39d4812b, address=/172.17.0.1:40030} with 
pending 0 acks
   08:34:45.087 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[public/default/persistent/topic-name-61b9697b6f087aca-subscription-name-3afa9641ff625519]
 Rewind from 41:1 to 41:0
   08:34:45.187 [pulsar-io-51-9] INFO  
org.apache.pulsar.broker.service.ServerCnx - New connection from 
/172.17.0.1:40034
   08:34:45.190 [pulsar-io-51-9] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/172.17.0.1:40034] Subscribing on 
topic persistent://public/default/topic-name-61b9697b6f087aca / 
subscription-name-3afa9641ff625519
   08:34:45.190 [pulsar-io-51-9] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[public/default/persistent/topic-name-61b9697b6f087aca-subscription-name-3afa9641ff625519]
 Rewind from 41:0 to 41:0
   08:34:45.190 [pulsar-io-51-9] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[persistent://public/default/topic-name-61b9697b6f087aca] There are no 
replicated subscriptions on the topic
   08:34:45.191 [pulsar-io-51-9] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[persistent://public/default/topic-name-61b9697b6f087aca][subscription-name-3afa9641ff625519]
 Created new subscription for 1
   08:34:45.191 [pulsar-io-51-9] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/172.17.0.1:40034] Created 
subscription on topic persistent://public/default/topic-name-61b9697b6f087aca / 
subscription-name-3afa9641ff625519
   08:34:45.286 [pulsar-io-51-9] INFO  
org.apache.pulsar.broker.service.ServerCnx - Closed connection from 
/172.17.0.1:40034
   08:34:45.286 [pulsar-io-51-9] INFO  
org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers
 - Removed consumer 
Consumer{subscription=PersistentSubscription{topic=persistent://public/default/topic-name-61b9697b6f087aca,
 name=subscription-name-3afa9641ff625519}, consumerId=1, 
consumerName=consumer-name-3f0be6cd39d4812b, address=/172.17.0.1:40034} with 
pending 64 acks
   08:34:45.286 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[public/default/persistent/topic-name-61b9697b6f087aca-subscription-name-3afa9641ff625519]
 Rewind from 41:3000 to 41:2836
   08:34:50.409 [pulsar-web-69-13] INFO  org.eclipse.jetty.server.RequestLog - 
127.0.0.1 - - [17/Sep/2020:08:34:50 +0000] "GET 
/admin/v2/persistent/public/functions/coordinate/stats?getPreciseBacklog=false 
HTTP/1.1" 200 1283 "-" "Pulsar-Java-v2.6.1-sn-3" 7
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to