zbentley commented on issue #10721: URL: https://github.com/apache/pulsar/issues/10721#issuecomment-909623281
Sorry for taking so long to come back to this. I was able to reproduce the issue on 2.8.0 with the `2.8.0.post0` Python client (MacOS 10.11, Python 3.6.9).

Better yet, I was able to reproduce it against a standalone broker with the snippet below; no partitioned topic or multi-node ensemble is required.

Additionally, another symptom of this bug is that if the snippet below is launched after the broker is SIGSTOPped, it hangs on startup (it does not respect `operation_timeout_seconds`) after printing `Connected to Broker`.

```
import sys
import pulsar

clients = []


def logflush(msg):
    # Print and flush immediately so output is not lost if the process hangs.
    print(msg)
    sys.stdout.flush()
    sys.stderr.flush()


def get_producer():
    try:
        cl = pulsar.Client(
            'pulsar://localhost:6650',
            operation_timeout_seconds=1,
        )
        # intentionally leak clients on reconnect to prevent segfaults per
        # https://github.com/apache/pulsar/issues/6463
        clients.append(cl)
        rv = cl.create_producer('foobar')
        logflush('Created producer')
        return rv
    except Exception as e:
        logflush('error creating producer: {}'.format(e))
        raise


producer = get_producer()
sent = 0
while True:
    try:
        producer.send(b'foo')
        sent += 1
        if sent % 1000 == 0:
            logflush('Sent: {}'.format(sent))
    except Exception as e:
        logflush('Got exception: {}'.format(e))
        producer = get_producer()
```

When I `kill -SIGSTOP` the standalone broker's PID, the Python client hangs. I waited 120 seconds for any logs to be emitted and attached them below, along with a `bt all` stack report from LLDB:

[logs.txt](https://github.com/apache/pulsar/files/7086726/logs.txt)
[stacks.txt](https://github.com/apache/pulsar/files/7086727/stacks.txt)
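
For anyone who would rather script the broker pause than run `kill -SIGSTOP` by hand, here is a minimal sketch of that step. It is POSIX only. `paused` and `hold_paused` are hypothetical helper names, not part of the Pulsar client, and the broker PID still has to be looked up separately. The `kill` and `sleep` parameters are there only so the helpers can be exercised without a real process.

```python
import contextlib
import os
import signal
import time


@contextlib.contextmanager
def paused(pid, kill=os.kill):
    """Suspend process `pid` with SIGSTOP and resume it with SIGCONT on exit.

    `kill` defaults to os.kill; it is injectable (an assumption made for
    this sketch) so the helper can be tested without touching a real process.
    """
    kill(pid, signal.SIGSTOP)
    try:
        yield
    finally:
        # Always resume the process, even if the body raised.
        kill(pid, signal.SIGCONT)


def hold_paused(pid, seconds, kill=os.kill, sleep=time.sleep):
    """Keep process `pid` suspended for `seconds`, then resume it."""
    with paused(pid, kill=kill):
        sleep(seconds)


# Example (hypothetical PID): freeze the broker for 120 seconds while the
# producer loop runs in another terminal.
#
#     hold_paused(12345, 120)
```

Resuming with SIGCONT in a `finally` block means the broker is not left frozen if the script is interrupted partway through.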