GitHub user narendly opened a pull request:
https://github.com/apache/helix/pull/288
PR
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/narendly/helix master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/helix/pull/288.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #288
----
commit 3844ad60034b029f3bbd916f629a7969117c1b26
Author: narendly <narendly@...>
Date: 2018-11-01T23:54:48Z
[HELIX-782] TASK: Make TaskDriver use ZKClient's create when creating
workflows
TaskDriver should use create(), but it currently uses set(), which silently
overwrites any ZNodes that already exist in ZK. This is undesirable and needs
to be fixed, especially in light of the ZNode restructuring.
AC:
1. Make TaskDriver use create() instead of set()
2. Add an integration test:
TestWorkflowCreation:testWorkflowCreationNoDuplicates()
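The difference between the two write semantics can be sketched with an in-memory store standing in for ZK. InMemoryZNodeStore, its methods, and the path below are hypothetical and are not the ZkClient or Helix API. This sketch reports a duplicate by returning false. A real ZK client may instead signal an existing node with an exception.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a ZK path store (not the ZkClient/Helix API).
class InMemoryZNodeStore {
    private final Map<String, String> nodes = new ConcurrentHashMap<>();

    // set()-style write: unconditionally overwrites whatever is already at the path.
    void set(String path, String data) {
        nodes.put(path, data);
    }

    // create()-style write: succeeds only if no node exists at the path yet.
    // Returns false (instead of clobbering) when the node already exists.
    boolean create(String path, String data) {
        return nodes.putIfAbsent(path, data) == null;
    }

    String get(String path) {
        return nodes.get(path);
    }
}

class WorkflowCreationDemo {
    public static void main(String[] args) {
        InMemoryZNodeStore store = new InMemoryZNodeStore();
        String path = "/hypothetical/workflows/myWorkflow"; // illustrative path only
        System.out.println(store.create(path, "v1")); // true: first creation succeeds
        System.out.println(store.create(path, "v2")); // false: duplicate is rejected
        System.out.println(store.get(path));          // v1: original data preserved
        store.set(path, "v3");
        System.out.println(store.get(path));          // v3: set() overwrote silently
    }
}
```

A duplicate-creation test in the spirit of testWorkflowCreationNoDuplicates() would assert the second call's false result and that the first workflow's data is unchanged.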
commit 3d9c03064a5c26a9ed9ad674567674f2d9eca160
Author: narendly <narendly@...>
Date: 2018-11-01T23:55:59Z
TASK: Fix JobQueue's job state-related bug
The bug was observed in
TestTaskRebalancerStopResume:stopAndResumeNamedQueue(), which was flaky.
For JobQueues with multiple jobs, the second job would sometimes be marked
IN_PROGRESS even though the first job had not yet completed or failed,
especially when the queue was stopped and resumed. The cause was a bug in
getIncompleteJobCount(): it did not count jobs in the STOPPING state.
This has been fixed, and an additional check was added just before
JobDispatcher marks a job as STOPPED, so that a job whose state is
NOT_STARTED is never marked STOPPED.
Changelist:
1. Fix getIncompleteJobCount()
2. Add a check so that we don't mark NOT_STARTED jobs as STOPPED
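The following minimal Java sketch shows why the missing STOPPING case could let the second job start early. It also covers the NOT_STARTED guard. The JobState enum, the canScheduleNext() rule, and the parallelJobs parameter are simplified assumptions for illustration. They are not Helix's actual TaskState or JobDispatcher code.

```java
import java.util.Map;

// Simplified, hypothetical subset of job states; Helix's real TaskState enum differs.
enum JobState { NOT_STARTED, IN_PROGRESS, STOPPING, STOPPED, COMPLETED, FAILED }

class JobQueueStateSketch {
    // In this sketch, a job still occupies a queue slot while IN_PROGRESS or STOPPING.
    // The bug described above corresponds to leaving STOPPING out of this check.
    static boolean isIncomplete(JobState state) {
        return state == JobState.IN_PROGRESS || state == JobState.STOPPING;
    }

    static int getIncompleteJobCount(Map<String, JobState> jobStates) {
        int count = 0;
        for (JobState state : jobStates.values()) {
            if (isIncomplete(state)) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical scheduling rule: start another job only if a slot is free.
    static boolean canScheduleNext(Map<String, JobState> jobStates, int parallelJobs) {
        return getIncompleteJobCount(jobStates) < parallelJobs;
    }

    // Guard applied before marking a job STOPPED: a job that never started
    // keeps its NOT_STARTED state instead of being marked STOPPED.
    static JobState stopJob(JobState current) {
        if (current == JobState.NOT_STARTED) {
            return current;
        }
        return JobState.STOPPED;
    }
}
```

Suppose job1 is STOPPING and job2 is NOT_STARTED. Then getIncompleteJobCount() returns 1, so canScheduleNext(states, 1) is false and job2 stays queued. Dropping STOPPING from isIncomplete() would make it return true, which matches the symptom seen in the flaky test.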
commit befb1036f8d8be2729a800d3dde88fc1362a6489
Author: narendly <narendly@...>
Date: 2018-11-01T23:57:33Z
[HELIX-784] TASK: Fix a bug in getExpiredJobs
When a job's config was null, getExpiredJobs() would simply continue instead
of adding the job to expiredJobs, which is what allows the job cleanup/purge
to be retried. As a result, failed purges could leave many jobs un-purged in
ZK with only their job configs missing. This RB fixes that.
Changelist:
1. Add the job name to expiredJobs if the job config does not exist in ZK
2. Add a more detailed error log message
3. Add an integration test for two task-related stages:
TaskPersistDataStage and TaskGarbageCollectionStage in TestTaskStage.java
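The corrected getExpiredJobs() behavior can be sketched in Java. JobConfigStub, the finishTimes map, and the expiry rule (finish time plus expiry at or before now) are hypothetical simplifications. The actual method works against job data stored in ZK. The point illustrated is only the null-config branch.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal job config holding only the expiry-relevant field.
class JobConfigStub {
    final long expiryMs;

    JobConfigStub(long expiryMs) {
        this.expiryMs = expiryMs;
    }
}

class ExpiredJobsSketch {
    // jobConfigs: job name -> config, where null means "config missing in ZK".
    // finishTimes: job name -> finish timestamp in ms (absent if not finished).
    static Set<String> getExpiredJobs(Map<String, JobConfigStub> jobConfigs,
                                      Map<String, Long> finishTimes, long now) {
        Set<String> expired = new HashSet<>();
        for (Map.Entry<String, JobConfigStub> e : jobConfigs.entrySet()) {
            String job = e.getKey();
            JobConfigStub cfg = e.getValue();
            if (cfg == null) {
                // Previously this branch only `continue`d, so a half-purged job
                // (config gone, other ZNodes left behind) was never retried.
                System.err.println("Job config for " + job
                        + " is missing; marking it expired so the purge is retried.");
                expired.add(job);
                continue;
            }
            Long finished = finishTimes.get(job);
            if (finished != null && finished + cfg.expiryMs <= now) {
                expired.add(job);
            }
        }
        return expired;
    }
}
```

Treating a missing config as "expired" means the next garbage-collection pass attempts the purge again rather than silently skipping the leftover job.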
----