[
https://issues.apache.org/jira/browse/OOZIE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yuanyimeng updated OOZIE-3612:
------------------------------
Description:
This should happen rarely, but it has happened twice in our environment. The
sequence is listed below.
# We have long-hanging actions that last several hours. (This is an error
situation; the tasks normally execute quickly.) The action type is one we
developed ourselves by extending the Executor.
# The ActionCheckXCommand instances for these actions are put into the callable
queue grouped into a composite of at most ten elements.
# Before being submitted to the thread pool for execution, elements that
already exist in the queue are filtered out; existence is determined by
uniqueCallables. So the CompositeCallable actually in the queue may have fewer
than 10 elements.
# When this CompositeCallable is polled from the queue, the concurrency for the
action_check type is checked before execution. If the concurrency limit has
been reached, the callable is requeued.
# During the requeue, the ActionCheckService happens to have already put these
ActionCheckXCommand instances in the queue, so the CompositeCallable is
filtered down to 0 elements.
# In the finally block, the CallableEnd method needs to decrease the counter
for this type. But an IndexOutOfBoundsException is thrown when getType is
called on the empty CompositeCallable, so the counter is never decreased.
# If this happens maxConcurrency times, the counter itself exceeds
maxConcurrency and callables of this type can never be taken out of the queue.
They stay in the queue forever, which causes the workflow to hang.
The screenshots showing where the exception happened are attached. I hope this
describes it clearly.
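
The steps above can be illustrated with a minimal, self-contained sketch. It is not Oozie code: the class Composite, the method simulate, the map named active, and the idea of caching the type before the requeue are all hypothetical stand-ins chosen for illustration. The sketch only shows how a type lookup that reads the first element of an emptied composite can skip the counter decrement, and how one possible guard (remembering the type while the composite is still non-empty) avoids the leak.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CounterLeakSketch {
    static final String TYPE = "action.check"; // hypothetical type name

    /** Hypothetical stand-in for a composite callable; not Oozie's class. */
    static class Composite {
        // Each element is {key, type}.
        final List<String[]> elements = new ArrayList<>();

        // The type is read from the first element, so this throws
        // IndexOutOfBoundsException once the composite is empty.
        String getType() {
            return elements.get(0)[1];
        }
    }

    /**
     * Runs the begin / requeue / end cycle the given number of times and
     * returns the value left in the per-type counter afterwards.
     */
    static int simulate(boolean cacheType, int rounds) {
        Map<String, Integer> active = new HashMap<>();
        Set<String> uniqueKeys = new HashSet<>();
        for (int i = 0; i < rounds; i++) {
            String key = "action-" + i;
            Composite composite = new Composite();
            composite.elements.add(new String[] {key, TYPE});
            String cachedType = composite.getType(); // read while non-empty

            active.merge(TYPE, 1, Integer::sum); // "begin": count this type
            try {
                // Requeue path: another producer has already queued the same
                // key, so the duplicate filter empties the composite.
                uniqueKeys.add(key);
                composite.elements.removeIf(el -> uniqueKeys.contains(el[0]));
            } finally {
                try {
                    // "end": decrement the counter for this type.
                    String type = cacheType ? cachedType : composite.getType();
                    active.merge(type, -1, Integer::sum);
                } catch (IndexOutOfBoundsException ex) {
                    // The decrement is skipped, so the counter leaks by one.
                }
            }
        }
        return active.getOrDefault(TYPE, 0);
    }

    public static void main(String[] args) {
        System.out.println("leaked without guard: " + simulate(false, 3));
        System.out.println("leaked with cached type: " + simulate(true, 3));
    }
}
```

In this sketch every unguarded round leaves the counter one higher, so after as many rounds as the concurrency limit no further callable of that type would pass a limit check. Caching the type is only one possible guard; skipping empty composites before they are requeued would be another.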
> CompositeCallable becomes empty after requeue and causes
> IndexOutOfBoundsException in getType(), so the counter for this
> type in activeCallables is never decreased and exceeds the concurrency limit
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: OOZIE-3612
> URL: https://issues.apache.org/jira/browse/OOZIE-3612
> Project: Oozie
> Issue Type: Bug
> Components: action, workflow
> Affects Versions: 4.2.0
> Environment: We use Oozie 4.2.0, but the problem also seems to exist
> in the newest release
> Reporter: yuanyimeng
> Priority: Minor
> Attachments: callableRun.png,
> counternotdescreaedifgettypeexceptionhappen.png,
> eception_in_composite_callable.png, exceptionlog.png