[ 
https://issues.apache.org/jira/browse/TEZ-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986418#comment-16986418
 ] 

Jonathan Turner Eagles commented on TEZ-4103:
---------------------------------------------

I think we need a few improvements before this patch is ready.

I can see this patch goes through great effort to centralize logging into the 
ProgressHelper. However, it adds IMHO unnecessarily complex code by using 
lambdas in log statements as well as separates the condition checking and 
logging from its origin of error. Unless I'm missing the necessity, I think 
this code becomes much simpler with a simple if isDebugEnabled() check followed 
by parameterized LOG.debug statement. Once this is done we can remove the 
logDebug helper methods.

I also wondered about the thread monitoring. Can you help me to understand why 
a catch (Throwable) wasn't sufficient. As per 
https://stackoverflow.com/a/24902026. Seems like (though I am not positive) we 
have created a thread to monitor the other thread.

Functionally, it isn't incorrect to use a LogicalInput that isn't 
AbstractLogicalInput. While I like logging the non-compliant class as 
speculative execution is very limited in that scenario, is it too excessive to 
log that condition every time?

> Progress in DAG, Vertex, and tasks is incorrect
> -----------------------------------------------
>
>                 Key: TEZ-4103
>                 URL: https://issues.apache.org/jira/browse/TEZ-4103
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>         Attachments: TEZ-4103.001.patch, TEZ-4103.002.patch, 
> TEZ-4103.003.patch
>
>
> Looking at the progress code, there some few issues that could lead to some 
> problems calculating the progress.
>  There are some cases when the progress never reach 1.0.
>  This is a list of issues that need to be fixed in the progress code:
>  * After TEZ-3982, since values are skipped in the In some cases, the 
> progress of DAG or a vertex may never reach 1.0f. this is in both 
> "{{DAGImpl.java}}" and "{{ProgressHelper.java}}"
>  * {{ProgressHelper}} schedules a service to update the progress, dubbed 
> `{{ProgressHelper.monitorProgress}}`. According to Java Documentation:
> {quote}If any execution of the task encounters an exception,
>  subsequent executions are suppressed.
>  Otherwise, the task will only terminate via cancellation
>  or termination of the executor.
> {quote}
> In other words, if the service dies, there is no way to catch that in the 
> code and the progress will never be updated.
>  * The `{{SimpleProcessor.inputMap}}` is not thread-safe. They are 
> initialized as `{{LinkedHashMap}}` and there is no synchronization on the 
> field objects in the map. This could be problematic in concurrent context.
>  * `{{VertexImpl.getProgress()}}` does not check the range of the progress 
> calculated in `{{VertexImpl.computeProgress()}}`
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to