Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/532#issuecomment-158578419
  
    @revans2 
    Sorry for the late response.
    I agree that we can set the default value of TOPOLOGY_SHELLBOLT_MAX_PENDING 
to a reasonable value (not off) once the deadlock bug you described is fixed. 
Then ShellBolt could play well with ABP. 
    I wonder whether ShellBolt could play well with ABP while 
TOPOLOGY_SHELLBOLT_MAX_PENDING is off, because ShellBolt always tries to buffer 
pending writes ASAP regardless of the pressure on the subprocess.
    
    To resolve the deadlock, I think we can add another queue that only 
stores pending taskid messages and is unbounded (not affected by 
TOPOLOGY_SHELLBOLT_MAX_PENDING). 
    We would also introduce priorities for the kinds of ShellBolt messages, 
heartbeat > taskid > process, and let BoltWriterRunnable always try to send the 
highest-priority pending message first. 
    As a result, only the thread that runs execute() could be blocked, and it is 
unblocked as soon as all pending taskids are sent.
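
    A minimal sketch of the queueing idea above, not the actual ShellBolt 
code. The class name PendingWrites, its method names, and the use of plain 
String messages are hypothetical; maxPending stands in for 
TOPOLOGY_SHELLBOLT_MAX_PENDING. Only the tuple ("process") queue is bounded, so 
only the caller of putTuple() (the thread running execute()) can block.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: three pending-write queues drained in priority order
// heartbeat > taskid > process. Not taken from Storm's source.
class PendingWrites {
    private final int maxPending; // stands in for TOPOLOGY_SHELLBOLT_MAX_PENDING
    private final Deque<String> heartbeats = new ArrayDeque<>(); // unbounded
    private final Deque<String> taskIds = new ArrayDeque<>();    // unbounded
    private final Deque<String> tuples = new ArrayDeque<>();     // bounded

    PendingWrites(int maxPending) {
        this.maxPending = maxPending;
    }

    // Heartbeats never block the caller.
    synchronized void putHeartbeat(String msg) {
        heartbeats.addLast(msg);
        notifyAll();
    }

    // Taskid messages never block the caller either.
    synchronized void putTaskIds(String msg) {
        taskIds.addLast(msg);
        notifyAll();
    }

    // Called from execute(): blocks while the bounded tuple queue is full.
    synchronized void putTuple(String msg) throws InterruptedException {
        while (tuples.size() >= maxPending) {
            wait();
        }
        tuples.addLast(msg);
        notifyAll();
    }

    // Called by the writer thread: returns the highest-priority pending message,
    // waiting if all three queues are empty.
    synchronized String take() throws InterruptedException {
        while (heartbeats.isEmpty() && taskIds.isEmpty() && tuples.isEmpty()) {
            wait();
        }
        if (!heartbeats.isEmpty()) {
            return heartbeats.pollFirst();
        }
        if (!taskIds.isEmpty()) {
            return taskIds.pollFirst();
        }
        String next = tuples.pollFirst();
        notifyAll(); // a slot was freed; wake any producer blocked in putTuple()
        return next;
    }
}
```

    With this shape, a taskid message queued behind many pending tuples is 
still written before them, which is the behavior the priorities are meant to 
give.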
    
    This doesn't lead to starvation, because a heartbeat is triggered only once 
per second, and taskid messages are never generated while the subprocess isn't 
processing any requests.
    
    Additional benefits of this approach:
    1. The subprocess can finish emitting a new tuple regardless of how many 
pending messages were added before the taskids. This reduces the wait time for 
taskids.
    2. It can also greatly reduce the size of the subprocess's pending_commands 
queue, which lowers the risk of the subprocess running out of memory. This 
matters more than ShellBolt's OOME, because a memory problem in the subprocess 
could make the machine hang or go down.
    3. Point 2 also reduces the wait time for processing heartbeat messages, 
which has been a persistent headache.
    
    I'll close this pull request and work on the new approach.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
