zbentley commented on issue #10721:
URL: https://github.com/apache/pulsar/issues/10721#issuecomment-850644626


   I ran the following code:
   
   ```python
   #!/usr/bin/env python
   import sys
   import tempfile
   
   import pulsar
   
   
   log_config = tempfile.NamedTemporaryFile(delete=False)
   log_config.write("""<?xml version="1.0" encoding="UTF-8" ?>
   
   <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
       <appender name="FileAppender" class="org.apache.log4j.FileAppender">
           <param name="file" value="LogFile.log" />
           <param name="append" value="true" />
           <layout class="org.apache.log4j.PatternLayout">
               <param name="ConversionPattern" value="%d %-5p %C{2} (%F:%L) - %m%n" />
           </layout>
       </appender>
   
       <root>
           <priority value="all" />
           <appender-ref ref="FileAppender" />
       </root>
   </log4j:configuration>
   """.encode('utf-8'))
   log_config.flush()
   
   
   def get_producer():
       return pulsar.Client(
           'pulsar://pulsar-blt-broker.broker-load-test-blt:6650',
           operation_timeout_seconds=10,
           log_conf_file_path=log_config.name,
       ).create_producer(
           'foobar',
           block_if_queue_full=True,
           send_timeout_millis=1000,
           batching_enabled=False,
       )
   
   
   producer = get_producer()
   
   sent = 0
   while True:
       try:
           producer.send(
               b'test_payload',
               disable_replication=True,
           )
           sent += 1
           if sent % 1000 == 0:
               print('Sent: ', sent)
               sys.stdout.flush()
               sys.stderr.flush()
       except Exception as e:
           if 'pulsar' in str(e).lower():
               print("Reconnecting to producer")
               sys.stdout.flush()
               sys.stderr.flush()
               producer = get_producer()
           else:
               raise
   
   ```
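
The temporary file above has to hold well-formed XML for the client's log4cxx-based logger to load it. In particular, the XML declaration must be the very first thing in the file, so the string passed to `write()` should start directly with `<?xml` and not with a newline. Below is a quick standard-library check. It assumes that Python's bundled expat-based parser is a reasonable proxy for the parser log4cxx uses. `is_well_formed` is a hypothetical helper, and it checks syntax only, not log4j semantics:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML (syntax only, not log4j rules)."""
    try:
        # Encode to bytes, matching what the repro writes to the temp file.
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


good = """<?xml version="1.0" encoding="UTF-8" ?>
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
    <root>
        <priority value="all" />
    </root>
</log4j:configuration>
"""

print(is_well_formed(good))         # True
# The XML spec forbids anything, even a newline, before the declaration.
print(is_well_formed("\n" + good))  # False
```

Running a check like this before handing `log_config.name` to `pulsar.Client` would surface a malformed config immediately, instead of leaving the logger to fail quietly.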
   
   Then I started freezing brokers (via `SIGSTOP`) in the 5-node cluster I was running.
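
`SIGSTOP` suspends a process rather than killing it. The kernel keeps the broker's TCP connections open, but nothing on the broker side ever answers. Clients can behave differently in that case than with a crashed broker whose connections get closed. Below is a minimal sketch of that fault injection in Python. It uses a throwaway child process as a stand-in for a broker. `freeze`, `thaw`, and `freeze_and_thaw_child` are hypothetical helpers, and a real test would signal the broker's own PID on its host:

```python
import os
import signal
import subprocess
import sys


def freeze(pid: int) -> None:
    """Suspend a process with SIGSTOP (cannot be caught or ignored)."""
    os.kill(pid, signal.SIGSTOP)


def thaw(pid: int) -> None:
    """Resume a process previously suspended with SIGSTOP."""
    os.kill(pid, signal.SIGCONT)


def freeze_and_thaw_child():
    """Freeze and thaw a child process standing in for a broker.

    Returns (was_stopped, was_continued) as reported by waitpid().
    """
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        freeze(child.pid)
        # WUNTRACED reports the stop without reaping the child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)
        thaw(child.pid)
        # WCONTINUED reports the resume, again without reaping.
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        was_continued = os.WIFCONTINUED(status)
        return was_stopped, was_continued
    finally:
        # SIGKILL works even on a stopped process; wait() reaps it.
        child.kill()
        child.wait()


if __name__ == "__main__":
    print(freeze_and_thaw_child())
```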
   
   In fairly short order, that code stopped emitting "Sent: " updates and printed `Reconnecting to producer`, at which point it got stuck.
   
   It remained stuck for the full 15 minutes I observed it, and only recovered once I restored the SIGSTOPped brokers to service.
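
One option for diagnosis is to wait on the reconnect with a deadline, which turns the silent hang into an exception that can be logged and timestamped. The sketch below assumes the hang is inside the blocking `get_producer()` call. `call_with_deadline` and `slow_call` are hypothetical helpers, not part of the Pulsar client API. It does not fix anything, because the blocked thread stays blocked:

```python
import concurrent.futures
import time


def call_with_deadline(fn, timeout_s):
    """Run fn() on a worker thread and wait at most timeout_s seconds for it.

    Raises concurrent.futures.TimeoutError if fn() has not returned in time.
    The worker is NOT cancelled: Python cannot interrupt a thread blocked in
    native code, so a hung call keeps its thread (and interpreter exit) waiting.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # wait=False: return to the caller without joining a stuck worker.
        executor.shutdown(wait=False)


def slow_call():
    """Stand-in for a call that may block, such as get_producer()."""
    time.sleep(0.5)
    return "done"


try:
    call_with_deadline(slow_call, timeout_s=0.1)
except concurrent.futures.TimeoutError:
    print("no result within 0.1 s")

print(call_with_deadline(slow_call, timeout_s=2))  # done
```

In the repro this would read `producer = call_with_deadline(get_producer, timeout_s=30)`, where the 30-second value is an arbitrary choice.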
   
   Here are the logs: 
   [logs.txt](https://github.com/apache/pulsar/files/6563104/logs.txt)
   
   Here is a `strace -fp` output:
   [strace.txt](https://github.com/apache/pulsar/files/6563107/strace.txt)
   
   Here are the thread stacks:
   [stacks.txt](https://github.com/apache/pulsar/files/6563110/stacks.txt)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
