[ https://issues.apache.org/jira/browse/TEZ-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ahmed Hussein updated TEZ-4103:
-------------------------------
    Attachment: TEZ-4103.005.patch

> Progress in DAG, Vertex, and tasks is incorrect
> -----------------------------------------------
>
>                 Key: TEZ-4103
>                 URL: https://issues.apache.org/jira/browse/TEZ-4103
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>         Attachments: TEZ-4103.001.patch, TEZ-4103.002.patch,
> TEZ-4103.003.patch, TEZ-4103.004.patch, TEZ-4103.005.patch
>
>
> Looking at the progress code, there are a few issues that could lead to
> incorrect progress calculation.
> In some cases the progress never reaches 1.0.
> This is a list of issues that need to be fixed in the progress code:
> * After TEZ-3982, some values are skipped, so in some cases the progress
> of a DAG or a vertex may never reach 1.0f. This affects both
> "{{DAGImpl.java}}" and "{{ProgressHelper.java}}".
> * {{ProgressHelper}} schedules a service to update the progress, dubbed
> `{{ProgressHelper.monitorProgress}}`. According to the Java documentation:
> {quote}If any execution of the task encounters an exception,
> subsequent executions are suppressed.
> Otherwise, the task will only terminate via cancellation
> or termination of the executor.
> {quote}
> In other words, if the service dies, there is no way to catch that in the
> code, and the progress will never be updated again.
> * The `{{SimpleProcessor.inputMap}}` is not thread-safe. The map is
> initialized as a `{{LinkedHashMap}}`, and there is no synchronization on the
> field objects in the map. This could be problematic in a concurrent context.
> * `{{VertexImpl.getProgress()}}` does not range-check the progress value
> calculated in `{{VertexImpl.computeProgress()}}`.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
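
For illustration only, here is a minimal standalone sketch of two of the points listed in the issue description: a periodically scheduled task that stops silently after its first exception, and a range check on a computed progress value. This is not Tez code and not the attached patch. The class name ProgressMonitorSketch and the helpers clampProgress, guarded, and countRuns are hypothetical; only the java.util.concurrent calls are real JDK APIs.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; none of these names come from the Tez code base.
final class ProgressMonitorSketch {

  // Clamp a computed progress value into [0, 1]. NaN is mapped to 0 here,
  // which is an assumption of this sketch, not documented Tez behavior.
  static float clampProgress(float progress) {
    if (Float.isNaN(progress)) {
      return 0.0f;
    }
    return Math.max(0.0f, Math.min(1.0f, progress));
  }

  // Wrap a periodic task so that a runtime exception is reported instead of
  // escaping to the executor, which would suppress all later executions.
  static Runnable guarded(Runnable body) {
    return () -> {
      try {
        body.run();
      } catch (RuntimeException e) {
        System.err.println("monitor run failed: " + e.getMessage());
      }
    };
  }

  // Schedule a task that fails on every run and report how many times it ran
  // within roughly 200 ms, with or without the guard.
  static int countRuns(boolean useGuard) {
    AtomicInteger runs = new AtomicInteger();
    Runnable failing = () -> {
      runs.incrementAndGet();
      throw new IllegalStateException("simulated failure");
    };
    ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();
    try {
      executor.scheduleWithFixedDelay(
          useGuard ? guarded(failing) : failing, 0, 20, TimeUnit.MILLISECONDS);
      Thread.sleep(200);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      executor.shutdownNow();
    }
    return runs.get();
  }

  public static void main(String[] args) {
    // Unguarded: the first exception suppresses every later execution.
    System.out.println("unguarded runs: " + countRuns(false));
    // Guarded: the task keeps being rescheduled despite the failures.
    System.out.println("guarded runs:   " + countRuns(true));
    System.out.println("clamped 1.2f:   " + clampProgress(1.2f));
  }
}
```

The guard only keeps the schedule alive. Whether a failed update should be retried, logged, or escalated is a separate design decision that this sketch does not address.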