GitHub user narendly opened a pull request:

    https://github.com/apache/helix/pull/275

    PR

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/narendly/helix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/275.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #275
    
----
commit e7b960c22896c08337292d20f674f20a7f1391d0
Author: Hunter Lee <hulee@...>
Date:   2018-10-27T01:32:16Z

    [HELIX-762] TASK: Change LOG mode from info to debug
    
    In production, it was observed that some users were running thousands of 
tasks. Because AssignableInstance writes a log line for each task assigned 
or released, the volume of log output was excessive.
    Changelist:
    1. Change the log level from info to debug in AssignableInstance and 
AssignableInstanceManager

commit e492d9f663d8edad0f344208cc8affc6828708a3
Author: Hunter Lee <hulee@...>
Date:   2018-10-27T01:49:52Z

    [HELIX-763] Task:Ignore tasks whose workflow and job are inactive
    
    Manual testing revealed tasks in the INIT and RUNNING states that were 
occupying a thread count even though their parent job or workflow was in an 
inactive state (terminal or stopped). This happened when the capacities were 
being rebuilt from scratch, and it could have caused a thread leak.
    Changelist:
    1. Add a check in buildAssignableInstances() so that it ignores workflows 
and jobs that are in inactive states (that is, whose tasks cannot be 
occupying a thread on Participants)
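
A minimal sketch of the kind of check described in HELIX-763, assuming simplified stand-in types. The class InactiveFilterSketch, the State enum, TaskRecord, and tasksToCount() are hypothetical names introduced for illustration; they are not the actual code in buildAssignableInstances().

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins for illustration; not Helix's actual classes.
class InactiveFilterSketch {
  enum State { IN_PROGRESS, STOPPED, COMPLETED, FAILED }

  // States this sketch treats as inactive (terminal or stopped). Assumption.
  static final Set<State> INACTIVE =
      EnumSet.of(State.STOPPED, State.COMPLETED, State.FAILED);

  static final class TaskRecord {
    final String taskId;
    final State workflowState;
    final State jobState;

    TaskRecord(String taskId, State workflowState, State jobState) {
      this.taskId = taskId;
      this.workflowState = workflowState;
      this.jobState = jobState;
    }
  }

  // Returns the IDs of tasks that should still count against thread capacity.
  // Tasks whose parent workflow or job is inactive are skipped.
  static List<String> tasksToCount(List<TaskRecord> tasks) {
    List<String> counted = new ArrayList<>();
    for (TaskRecord task : tasks) {
      if (INACTIVE.contains(task.workflowState) || INACTIVE.contains(task.jobState)) {
        continue; // parent is inactive, so the task cannot be holding a thread
      }
      counted.add(task.taskId);
    }
    return counted;
  }
}
```

Only tasks whose workflow and job are both active contribute to the rebuilt capacity in this sketch, which is the behavior the changelist item describes.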

commit d33d9efea25fe9d2bbbb9e84a4ce7614b544ef2d
Author: Hunter Lee <hulee@...>
Date:   2018-10-27T02:03:47Z

    [HELIX-764] TASK: Fix LiveInstanceCurrentState change flag
    
    Previously, existsLiveInstanceOrCurrentStateChange was reset in 
ClusterDataCache whenever its getter was called. This was problematic because, 
with multiple jobs or workflows, only the first caller of the getter would 
get the correct flag value; subsequent callers would get false because 
the flag had already been reset. This RB fixes the bug by resetting the flag 
at the beginning of the refresh() call in ClusterDataCache, so that all 
callers during that pipeline get the same, correct value.
    Changelist:
    1. Change the getter so that it does not reset the flag; instead, reset the 
flag at the beginning of refresh()
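
The pattern in HELIX-764 can be sketched roughly as follows. ChangeFlagCacheSketch, notifyChange(), and the pendingChange field are hypothetical names used only for illustration; this is not the ClusterDataCache implementation, only the idea of a side-effect-free getter with the reset moved to the start of refresh().

```java
// Hypothetical sketch; not the actual ClusterDataCache code.
class ChangeFlagCacheSketch {
  // Set by (assumed) change notifications that arrive between refreshes.
  private boolean pendingChange;
  private boolean existsLiveInstanceOrCurrentStateChange;

  void notifyChange() {
    pendingChange = true;
  }

  void refresh() {
    // Reset once per pipeline run, at the very beginning of refresh().
    existsLiveInstanceOrCurrentStateChange = false;
    if (pendingChange) {
      existsLiveInstanceOrCurrentStateChange = true;
      pendingChange = false;
    }
  }

  // The getter has no side effects, so every caller in the same
  // pipeline run observes the same value.
  boolean getExistsLiveInstanceOrCurrentStateChange() {
    return existsLiveInstanceOrCurrentStateChange;
  }
}
```

With this shape, two jobs that both call the getter after one refresh() see the same value, and the flag clears only when the next refresh() begins.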

commit 930a4b7ae7eb63be0a751a593ba630ae55fb2cfb
Author: Hunter Lee <hulee@...>
Date:   2018-10-27T02:06:42Z

    [HELIX-765] TASK: Build quota profile from scratch every rebalance
    
    It has been reported that instances have a full quota despite no tasks 
existing in their CURRENTSTATES. The cause of this is not clear, so making 
ClusterDataCache trigger a refresh of all AssignableInstances will ensure that 
there aren't situations where it looks like there has been a thread leak. 
Optimizations will be implemented if necessary.
    Changelist:
    1. Make AssignableInstanceManager build all AssignableInstances from 
scratch every rebalance

commit 5033785c231af363953367f65f77513911b753f5
Author: Hunter Lee <hulee@...>
Date:   2018-10-27T02:08:02Z

    [HELIX-766] TASK: Add logging functionality in AssignableInstanceManager
    
    To help debug task-related inquiries and issues, we realized that it 
would be very helpful to have a log recording the current quota capacity of 
all AssignableInstances. This is for cases where we see jobs whose tasks are 
not getting assigned, so that we can quickly rule out the possibility of bugs 
in quota-based scheduling.
    Changelist:
    1. Add a method that logs the current quota profile in JSON format, with 
an option flag to log only when there are quota types whose capacities 
are full
    2. Add info logs in AssignableInstanceManager
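
A rough sketch of what such a quota-profile log method might look like, under stated assumptions. QuotaProfileLogSketch, render(), the onlyIfFull parameter, and the map layout (instance name to quota type to a {used, total} pair) are all hypothetical; the real method in AssignableInstanceManager may be structured differently. The JSON is built by hand here and assumes names need no escaping.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch; not the actual AssignableInstanceManager code.
class QuotaProfileLogSketch {
  // True if any quota type on any instance has used >= total.
  static boolean hasFullQuotaType(Map<String, Map<String, int[]>> profile) {
    for (Map<String, int[]> quotas : profile.values()) {
      for (int[] usage : quotas.values()) {
        if (usage[0] >= usage[1]) {
          return true;
        }
      }
    }
    return false;
  }

  // Renders the profile as JSON text, or returns empty when onlyIfFull is
  // set and no quota type is full. A caller would pass the text to its logger.
  static Optional<String> render(Map<String, Map<String, int[]>> profile, boolean onlyIfFull) {
    if (onlyIfFull && !hasFullQuotaType(profile)) {
      return Optional.empty();
    }
    StringBuilder sb = new StringBuilder("{");
    boolean firstInstance = true;
    // TreeMap copies give a deterministic key order in the output.
    for (Map.Entry<String, Map<String, int[]>> inst : new TreeMap<>(profile).entrySet()) {
      if (!firstInstance) {
        sb.append(',');
      }
      firstInstance = false;
      sb.append('"').append(inst.getKey()).append("\":{");
      boolean firstType = true;
      for (Map.Entry<String, int[]> quota : new TreeMap<>(inst.getValue()).entrySet()) {
        if (!firstType) {
          sb.append(',');
        }
        firstType = false;
        int[] usage = quota.getValue();
        sb.append('"').append(quota.getKey()).append("\":\"")
            .append(usage[0]).append('/').append(usage[1]).append('"');
      }
      sb.append('}');
    }
    return Optional.of(sb.append('}').toString());
  }
}
```

Passing true for onlyIfFull keeps the log quiet in the common case and emits the profile only when a full quota type could explain why tasks are not being assigned.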

----

