[jira] [Commented] (TEZ-4103) Progress in DAG, Vertex, and tasks is incorrect

2019-12-03 Thread Jonathan Turner Eagles (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987349#comment-16987349
 ] 

Jonathan Turner Eagles commented on TEZ-4103:
-

Couple of very minor misspellings and word clarity
* periodicMonitoTaskRef should be periodicMonitorTaskRef
* inpuProgress should be inputProgress
* Vertex Progress should be Vertex progress
* "correct range" might be more clear as "valid range"

There were a number of functions in ProgressHelper that were moved as well as 
changes that weren't necessary but it's probably better to clean them up as you 
did.

Let changes below debug lines to parameterized LOG.debug lines instead.
http://www.slf4j.org/faq.html#logging_performance
{code}
+  LOG.debug(String.format(
+  "Incorrect progress in progress helper in processor=%s, "
+  + "inpuProgress=%f, inputsSize=%d, invalidInput=%d",
+  processorName, inputProgress, inputs.size(),
+  invalidSnapshot));

+LOG.debug(String.format("Vertex Progress not in correct range: "
++ "v=%s, progress=%f", v.getName(), vertexProgress));

+  LOG.debug(String.format(
+  "vertex=%s, task=%s: Progress not in correct range=%f",
+  getName(), task.getTaskId().toString(), taskProg));
+}
{code}

As far as design goes, there are a couple of things.
1) DAGImpl now validates that all VertexImpl progress values are in the valid 
range. The change also validates the VetexImpl inputs so that only valid 
progress values are returned in getProgress. So we are checking the validity 
twice.
2) TaskAttemptImpl.getProgress() is the input to the speculator and if 
Processor uses ProgressHelper then it is protected, if not, unfiltered data is 
still sent to the speculator. Does it make more sense to move all or part of 
the range change to TaskAttemptImpl?
3) Logging is defaulted to debug. Does it make more sense to always log the 
first invalid progress or perhaps every X invalid values? Especially useful to 
track down buggy ReaderProgress.

 One 
 pre-existing issue that seems wrong to me is that progress is modified with 
only a read lock instead of with a write lock (write locks are automatically 
retrieved during state machine transitions).



> Progress in DAG, Vertex, and tasks is incorrect
> ---
>
> Key: TEZ-4103
> URL: https://issues.apache.org/jira/browse/TEZ-4103
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: TEZ-4103.001.patch, TEZ-4103.002.patch, 
> TEZ-4103.003.patch, TEZ-4103.004.patch
>
>
> Looking at the progress code, there some few issues that could lead to some 
> problems calculating the progress.
>  There are some cases when the progress never reach 1.0.
>  This is a list of issues that need to be fixed in the progress code:
>  * After TEZ-3982, since values are skipped in the In some cases, the 
> progress of DAG or a vertex may never reach 1.0f. this is in both 
> "{{DAGImpl.java}}" and "{{ProgressHelper.java}}"
>  * {{ProgressHelper}} schedules a service to update the progress, dubbed 
> `{{ProgressHelper.monitorProgress}}`. According to Java Documentation:
> {quote}If any execution of the task encounters an exception,
>  subsequent executions are suppressed.
>  Otherwise, the task will only terminate via cancellation
>  or termination of the executor.
> {quote}
> In other words, if the service dies, there is no way to catch that in the 
> code and the progress will never be updated.
>  * The `{{SimpleProcessor.inputMap}}` is not thread-safe. They are 
> initialized as `{{LinkedHashMap}}` and there is no synchronization on the 
> field objects in the map. This could be problematic in concurrent context.
>  * `{{VertexImpl.getProgress()}}` does not check the range of the progress 
> calculated in `{{VertexImpl.computeProgress()}}`
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TEZ-4103) Progress in DAG, Vertex, and tasks is incorrect

2019-12-03 Thread TezQA (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987042#comment-16987042
 ] 

TezQA commented on TEZ-4103:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m  
8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} tez-api: The patch generated 0 new + 2 unchanged - 5 
fixed = 2 total (was 7) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} The patch passed checkstyle in tez-runtime-internals 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} tez-dag: The patch generated 0 new + 379 unchanged - 
1 fixed = 379 total (was 380) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
50s{color} | {color:green} tez-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
48s{color} | {color:green} tez-runtime-internals in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
50s{color} | {color:green} tez-dag in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
59s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 38m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/tez:d4a62deee |
| JIRA Issue | TEZ-4103 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12987381/TEZ-4103.004.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
checkstyle  compile  |
| uname | Linux 9f1f451382b0 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | master / 1af0897 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.0.1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-TEZ-Build/209/testReport/ |
| Max. process+thread count | 230 (vs. ulimit of 5500) |
| modules | C: tez-api tez-runtime-internals tez-dag U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-TEZ-Build/209/console |
| 

[jira] [Updated] (TEZ-4103) Progress in DAG, Vertex, and tasks is incorrect

2019-12-03 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/TEZ-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated TEZ-4103:
---
Attachment: TEZ-4103.004.patch

> Progress in DAG, Vertex, and tasks is incorrect
> ---
>
> Key: TEZ-4103
> URL: https://issues.apache.org/jira/browse/TEZ-4103
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: TEZ-4103.001.patch, TEZ-4103.002.patch, 
> TEZ-4103.003.patch, TEZ-4103.004.patch
>
>
> Looking at the progress code, there some few issues that could lead to some 
> problems calculating the progress.
>  There are some cases when the progress never reach 1.0.
>  This is a list of issues that need to be fixed in the progress code:
>  * After TEZ-3982, since values are skipped in the In some cases, the 
> progress of DAG or a vertex may never reach 1.0f. this is in both 
> "{{DAGImpl.java}}" and "{{ProgressHelper.java}}"
>  * {{ProgressHelper}} schedules a service to update the progress, dubbed 
> `{{ProgressHelper.monitorProgress}}`. According to Java Documentation:
> {quote}If any execution of the task encounters an exception,
>  subsequent executions are suppressed.
>  Otherwise, the task will only terminate via cancellation
>  or termination of the executor.
> {quote}
> In other words, if the service dies, there is no way to catch that in the 
> code and the progress will never be updated.
>  * The `{{SimpleProcessor.inputMap}}` is not thread-safe. They are 
> initialized as `{{LinkedHashMap}}` and there is no synchronization on the 
> field objects in the map. This could be problematic in concurrent context.
>  * `{{VertexImpl.getProgress()}}` does not check the range of the progress 
> calculated in `{{VertexImpl.computeProgress()}}`
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)