[
https://issues.apache.org/jira/browse/STORM-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020184#comment-15020184
]
ASF GitHub Bot commented on STORM-756:
--------------------------------------
Github user HeartSaVioR commented on the pull request:
https://github.com/apache/storm/pull/532#issuecomment-158578419
@revans2
Sorry for the late response.
I agree that we can set the default value of TOPOLOGY_SHELLBOLT_MAX_PENDING to
a reasonable value (not off) once the deadlock bug you described is fixed. Then
ShellBolt could play well with ABP.
I wonder whether ShellBolt could play well with ABP when
TOPOLOGY_SHELLBOLT_MAX_PENDING is off, because ShellBolt always tries to buffer
pending writes as soon as possible, regardless of the pressure on the subprocess.
To resolve the deadlock, I think we can add another queue that stores only
pending taskid messages and is unbounded (not affected by
TOPOLOGY_SHELLBOLT_MAX_PENDING).
We would also introduce priorities for the kinds of ShellBolt messages,
heartbeat > taskid > process, and let BoltWriterRunnable always try to send the
highest-priority message first.
As a result, only the thread that runs execute() can be blocked, and it is
unblocked as soon as all pending taskids are sent.
This doesn't lead to starvation, because a heartbeat is triggered only once
every second, and taskid messages are never generated while the subprocess isn't
processing any requests.
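The priority scheme above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual ShellBolt code: the class name PrioritizedPendingWrites, its method names, and the use of plain strings as messages are all hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: three pending-write queues drained in priority order
// (heartbeat > taskid > process). Messages are plain strings for brevity.
class PrioritizedPendingWrites {
    // Heartbeats: highest priority, unbounded.
    private final Queue<String> heartbeats = new ConcurrentLinkedQueue<>();
    // Taskid messages: unbounded, so the reader thread never blocks on them.
    private final Queue<String> taskIds = new ConcurrentLinkedQueue<>();
    // Tuples to process: bounded by the max-pending setting.
    private final BlockingQueue<String> tuples;

    PrioritizedPendingWrites(int maxPending) {
        this.tuples = new LinkedBlockingQueue<>(maxPending);
    }

    void putHeartbeat(String msg) { heartbeats.add(msg); }

    void putTaskIds(String msg) { taskIds.add(msg); }

    // Blocks the caller (the thread running execute()) while the bounded queue is full.
    void putTuple(String msg) {
        try {
            tuples.put(msg);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for queue space", e);
        }
    }

    // Returns the next message for the writer thread by priority, or null if nothing is pending.
    String poll() {
        String msg = heartbeats.poll();
        if (msg != null) {
            return msg;
        }
        msg = taskIds.poll();
        if (msg != null) {
            return msg;
        }
        return tuples.poll();
    }
}
```

In this sketch only putTuple() can block, which matches the idea that only the thread running execute() is ever held up.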
Additional benefits of this approach:
1. The subprocess can finish emitting a new tuple regardless of how many
pending messages were queued before the taskids, which reduces the waiting time
for taskids.
2. It can also greatly reduce the size of the subprocess's pending_commands
queue, which reduces the risk of the subprocess running out of memory. This
matters more than an OOME in ShellBolt, because a memory problem in the
subprocess can make the machine hang or go down.
3. Point 2 also reduces the wait time for processing heartbeat messages, which
has been a persistent headache.
I'll close this pull request and work on the new approach.
> [multilang] Introduce overflow control mechanism
> ------------------------------------------------
>
> Key: STORM-756
> URL: https://issues.apache.org/jira/browse/STORM-756
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-multilang
> Affects Versions: 0.10.0, 0.9.4, 0.11.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
>
> It's from STORM-738,
> https://issues.apache.org/jira/browse/STORM-738?focusedCommentId=14394106&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14394106
> A. ShellBolt side control
> We can modify ShellBolt to keep a list of sent tuple ids and stop sending
> tuples when the list exceeds a configured maximum. To achieve this, the
> subprocess should notify ShellBolt that "tuple id is complete".
> * It introduces a new multi-lang command, "proceed" (or a better name).
> * ShellBolt stores the list of tuples that are still being processed.
> * The overhead could be large, since the subprocess must notify ShellBolt
> whenever any tuple is processed.
> B. subprocess side control
> We can modify the subprocess to check its pending queue after reading a tuple.
> If the queue exceeds a configured maximum, the subprocess can send "delay" to
> ShellBolt to slow it down.
> When ShellBolt receives "delay", BoltWriterRunnable should stop polling the
> pending queue and resume polling later.
> How long should ShellBolt wait before resuming? The unit could be "delay time"
> or "tuple count". I don't know yet which is better.
> * It introduces a new multi-lang command, "delay" (or a better name).
> * I don't think it would be introduced soon, but the subprocess could request
> a delay based on its own statistics (e.g. pending tuple count * average tuple
> processing time for the time unit, average pending tuple count for the count unit).
> ** We can leave when and how much "delay" to request to the user, who can
> write their own algorithm to control flooding.
> In my opinion, B seems more natural, because the current issue is on the
> subprocess side, so it would be better to let the subprocess handle it.
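
For illustration, the ShellBolt side of option B in the quoted description might look roughly like the sketch below, using "delay time" as the unit. Everything here is an assumption made for the example: the class DelayAwareWriter, its methods, and the way the "delay" request reaches it are hypothetical and not part of the multi-lang protocol. The current time is passed in explicitly so the behaviour is easy to check.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: the writer stops polling the pending queue until a
// delay requested by the subprocess has elapsed.
class DelayAwareWriter {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    // Time (in milliseconds) before which no tuple should be sent; 0 means no delay.
    private volatile long resumeAtMillis = 0L;

    void enqueue(String tuple) {
        pending.add(tuple);
    }

    // Called when the subprocess sends the (hypothetical) "delay" command.
    void onDelayRequested(long delayMillis, long nowMillis) {
        resumeAtMillis = nowMillis + delayMillis;
    }

    // Returns the next tuple to send, or null if a delay is active or nothing is pending.
    String pollIfAllowed(long nowMillis) {
        if (nowMillis < resumeAtMillis) {
            return null;
        }
        return pending.poll();
    }
}
```

A real writer loop would call pollIfAllowed(System.currentTimeMillis()) and sleep briefly when it returns null. A "tuple count" unit would need a different mechanism, which this sketch does not cover.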
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)