[ https://issues.apache.org/jira/browse/STORM-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Li updated STORM-3602:
----------------------------
Affects Version/s: 2.0.0
                   2.1.0
> loadaware shuffle can overload local worker
> -------------------------------------------
>
> Key: STORM-3602
> URL: https://issues.apache.org/jira/browse/STORM-3602
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Aaron Gresch
> Assignee: Aaron Gresch
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.2.0, 2.1.1
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> We were seeing a worker overloaded and tuples timing out with loadaware
> shuffle enabled. On investigation, we found that the code allows switching
> from HOST_LOCAL back to WORKER_LOCAL whenever the average load across the
> host-local tasks is below the low water mark. It should be checking the
> load on the worker-local tasks instead.
>
> What's happening is that the worker is overloaded, so it switches to
> HOST_LOCAL. Because there are many idle host-local tasks, the average load
> calculated across all the host tasks is below the low water mark, and it
> immediately switches back to the overloaded worker-local task.
>
>
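The flip-flop described above can be sketched as a small, hypothetical Java model. This is not Storm's LoadAwareShuffleGrouping code: the class ScopeSwitchSketch, the methods nextScope and trace, and the water-mark values 0.8 and 0.2 are assumptions for illustration, and task loads are held constant between rounds. Passing fixed=false checks the host-wide average before narrowing back to WORKER_LOCAL (the behavior reported here); fixed=true checks the worker-local load instead, as the report suggests.

```java
import java.util.Arrays;

/**
 * Hypothetical model of load-aware scope switching (not Storm's actual code).
 * Names and thresholds are illustrative assumptions.
 */
public class ScopeSwitchSketch {
    enum Scope { WORKER_LOCAL, HOST_LOCAL }

    // Assumed water marks for illustration only.
    static final double HIGH_WATER_MARK = 0.8;
    static final double LOW_WATER_MARK = 0.2;

    static double avg(double[] loads) {
        return Arrays.stream(loads).average().orElse(0.0);
    }

    /**
     * Decides the next scope.
     * workerLoads: loads of the tasks in this worker.
     * hostLoads: loads of all tasks on this host, including the worker-local ones (assumed).
     * fixed: if true, check the worker-local load before narrowing; if false, check the host-wide average.
     */
    static Scope nextScope(Scope current, double[] workerLoads, double[] hostLoads, boolean fixed) {
        if (current == Scope.WORKER_LOCAL) {
            // Widen to HOST_LOCAL when the worker-local tasks are above the high water mark.
            return avg(workerLoads) > HIGH_WATER_MARK ? Scope.HOST_LOCAL : Scope.WORKER_LOCAL;
        }
        // Currently HOST_LOCAL: decide whether to narrow back to WORKER_LOCAL.
        double loadToCheck = fixed ? avg(workerLoads) : avg(hostLoads);
        return loadToCheck < LOW_WATER_MARK ? Scope.WORKER_LOCAL : Scope.HOST_LOCAL;
    }

    /** Starts at WORKER_LOCAL and records the scope chosen after each round. */
    static Scope[] trace(int rounds, double[] workerLoads, double[] hostLoads, boolean fixed) {
        Scope[] out = new Scope[rounds];
        Scope scope = Scope.WORKER_LOCAL;
        for (int i = 0; i < rounds; i++) {
            scope = nextScope(scope, workerLoads, hostLoads, fixed);
            out[i] = scope;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] worker = {0.95};                              // one overloaded worker-local task
        double[] host = {0.95, 0, 0, 0, 0, 0, 0, 0, 0, 0};     // that task plus nine idle host-local tasks
        System.out.println("buggy: " + Arrays.toString(trace(4, worker, host, false)));
        System.out.println("fixed: " + Arrays.toString(trace(4, worker, host, true)));
    }
}
```

With one worker-local task at load 0.95 and nine idle host-local tasks, main prints `buggy: [HOST_LOCAL, WORKER_LOCAL, HOST_LOCAL, WORKER_LOCAL]` and `fixed: [HOST_LOCAL, HOST_LOCAL, HOST_LOCAL, HOST_LOCAL]`. The host-wide average (0.095) sits under the low water mark even though the worker-local task is still overloaded, which is why the unfixed check keeps sending traffic back to it.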
--
This message was sent by Atlassian Jira
(v8.3.4#803005)