[jira] [Commented] (TEZ-3958) Add internal vertex priority information into the tez dag.dot debug information

2018-08-27 Thread TezQA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594457#comment-16594457
 ] 

TezQA commented on TEZ-3958:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12937362/TEZ-3958.5.patch
  against master revision 261bbdd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2901//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2901//console

This message is automatically generated.


> Add internal vertex priority information into the tez dag.dot debug 
> information
> ---
>
> Key: TEZ-3958
> URL: https://issues.apache.org/jira/browse/TEZ-3958
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Jaume M
>Priority: Major
> Attachments: TEZ-3958.1.patch, TEZ-3958.2.patch, TEZ-3958.3.patch, 
> TEZ-3958.4.patch, TEZ-3958.5.patch
>
>
> Adding the actual vertex priority as computed by Tez into the debug dag.dot 
> file would allow debugging of task pre-emption issues when the DAG is no 
> longer a tree.
> There are pre-emption issues with isomerization of Tez DAGs, where an 
> R-isomer dag with mirror rotation runs at a different speed than the L-isomer 
> dag, because priorities at the same level change with the vertex-id order.
> Since the problem is hard to debug through, it would be good to record the 
> computed priority in the DAG .dot file in the logging directories.
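
To illustrate the request above, here is a minimal sketch of what recording a
computed priority in a DOT file could look like. It is not the TEZ-3958 patch:
the class DagDotSketch, the method toDot, the "priority=" label text, and the
vertex names and priority values are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tez: renders each vertex with its
// priority in the DOT node label so it is visible when the graph is drawn.
final class DagDotSketch {

    // vertexPriorities maps vertex name -> priority (values are made up here).
    // edges is a list of {source, target} vertex-name pairs.
    static String toDot(String dagName, Map<String, Integer> vertexPriorities,
                        String[][] edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph \"").append(dagName).append("\" {\n");
        for (Map.Entry<String, Integer> e : vertexPriorities.entrySet()) {
            // "\\n" emits a literal backslash-n, which DOT treats as a line
            // break inside a label.
            sb.append("  \"").append(e.getKey()).append("\" [label=\"")
              .append(e.getKey()).append("\\npriority=").append(e.getValue())
              .append("\"];\n");
        }
        for (String[] edge : edges) {
            sb.append("  \"").append(edge[0]).append("\" -> \"")
              .append(edge[1]).append("\";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> priorities = new LinkedHashMap<>();
        priorities.put("Map 1", 5);      // illustrative values only
        priorities.put("Map 2", 8);
        priorities.put("Reducer 3", 11);
        String[][] edges = { {"Map 1", "Reducer 3"}, {"Map 2", "Reducer 3"} };
        System.out.print(toDot("example_dag", priorities, edges));
    }
}
```

With the priority in the label, two mirrored DAGs can be rendered side by side
and compared vertex by vertex.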



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Success: TEZ-3958 PreCommit Build #2901

2018-08-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3958
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2901/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 381.08 KB...]
[INFO] hadoop-shim-2.8  SUCCESS [  0.996 s]
[INFO] tez-dist ... SUCCESS [ 47.256 s]
[INFO] Tez 0.10.0-SNAPSHOT  SUCCESS [  0.040 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 58:34 min
[INFO] Finished at: 2018-08-28T03:16:51Z
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12937362/TEZ-3958.5.patch
  against master revision 261bbdd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2901//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2901//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==




==
==
Finished build.
==
==


Archiving artifacts
[Fast Archiver] Compressed 4.10 MB of artifacts by 22.9% relative to #2900
[description-setter] Description set: TEZ-3958
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-3980) ShuffleRunner: the wake loop needs to check for shutdown

2018-08-27 Thread Gunther Hagleitner (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594400#comment-16594400
 ] 

Gunther Hagleitner commented on TEZ-3980:
-

+1

> ShuffleRunner: the wake loop needs to check for shutdown
> 
>
> Key: TEZ-3980
> URL: https://issues.apache.org/jira/browse/TEZ-3980
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: TEZ-3980.1.patch
>
>
> In the ShuffleRunner threads, there's a loop which does not terminate if the 
> task threads get killed.
> {code}
> while ((runningFetchers.size() >= numFetchers || pendingHosts.isEmpty())
>     && numCompletedInputs.get() < numInputs) {
>   inputContext.notifyProgress();
>   boolean ret = wakeLoop.await(1000, TimeUnit.MILLISECONDS);
> }
> {code}
> The wakeLoop signal does not break out of this loop, and the loop is missing 
> a check for shut-down.
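
As a rough illustration of the fix being asked for, the sketch below shows a
timed wait loop that also checks a shutdown flag. It is not the TEZ-3980 patch:
the class WakeLoopSketch, the isShutdown flag, and the workAvailable condition
are hypothetical stand-ins for the real fetcher bookkeeping; only the wakeLoop
name and the 1000 ms await come from the snippet above.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of a wake loop that terminates on shutdown.
final class WakeLoopSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition wakeLoop = lock.newCondition();
    private final AtomicBoolean isShutdown = new AtomicBoolean(false);
    private boolean workAvailable = false; // guarded by lock

    // Returns true when work is available, false when shut down or interrupted.
    boolean awaitWork() {
        lock.lock();
        try {
            while (!workAvailable) {
                // The check the report says is missing: leave the loop on shutdown.
                if (isShutdown.get()) {
                    return false;
                }
                wakeLoop.await(1000, TimeUnit.MILLISECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }

    void offerWork() {
        lock.lock();
        try {
            workAvailable = true;
            wakeLoop.signalAll();
        } finally {
            lock.unlock();
        }
    }

    void shutdown() {
        isShutdown.set(true);
        lock.lock();
        try {
            wakeLoop.signalAll(); // wake waiters so they re-check the flag at once
        } finally {
            lock.unlock();
        }
    }
}
```

Signalling in shutdown() is optional for correctness, since the timed await
re-checks the flag within a second, but it avoids that delay.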





[jira] [Updated] (TEZ-3958) Add internal vertex priority information into the tez dag.dot debug information

2018-08-27 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-3958:
-
Attachment: TEZ-3958.5.patch

> Add internal vertex priority information into the tez dag.dot debug 
> information
> ---
>
> Key: TEZ-3958
> URL: https://issues.apache.org/jira/browse/TEZ-3958
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: TEZ-3958.1.patch, TEZ-3958.2.patch, TEZ-3958.3.patch, 
> TEZ-3958.4.patch, TEZ-3958.5.patch
>
>
> Adding the actual vertex priority as computed by Tez into the debug dag.dot 
> file would allow debugging of task pre-emption issues when the DAG is no 
> longer a tree.
> There are pre-emption issues with isomerization of Tez DAGs, where an 
> R-isomer dag with mirror rotation runs at a different speed than the L-isomer 
> dag, because priorities at the same level change with the vertex-id order.
> Since the problem is hard to debug through, it would be good to record the 
> computed priority in the DAG .dot file in the logging directories.





[jira] [Assigned] (TEZ-3958) Add internal vertex priority information into the tez dag.dot debug information

2018-08-27 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned TEZ-3958:


Assignee: Jaume M  (was: Gopal V)

> Add internal vertex priority information into the tez dag.dot debug 
> information
> ---
>
> Key: TEZ-3958
> URL: https://issues.apache.org/jira/browse/TEZ-3958
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Jaume M
>Priority: Major
> Attachments: TEZ-3958.1.patch, TEZ-3958.2.patch, TEZ-3958.3.patch, 
> TEZ-3958.4.patch, TEZ-3958.5.patch
>
>
> Adding the actual vertex priority as computed by Tez into the debug dag.dot 
> file would allow debugging of task pre-emption issues when the DAG is no 
> longer a tree.
> There are pre-emption issues with isomerization of Tez DAGs, where an 
> R-isomer dag with mirror rotation runs at a different speed than the L-isomer 
> dag, because priorities at the same level change with the vertex-id order.
> Since the problem is hard to debug through, it would be good to record the 
> computed priority in the DAG .dot file in the logging directories.





Success: TEZ-3985 PreCommit Build #2900

2018-08-27 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3985
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2900/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 382.55 KB...]
[INFO] hadoop-shim-impls .. SUCCESS [  0.041 s]
[INFO] hadoop-shim-2.8  SUCCESS [  0.943 s]
[INFO] tez-dist ... SUCCESS [ 44.763 s]
[INFO] Tez 0.10.0-SNAPSHOT  SUCCESS [  0.048 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 58:58 min
[INFO] Finished at: 2018-08-28T01:49:42Z
[INFO] 




{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12937355/TEZ-3985.1.patch
  against master revision 261bbdd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2900//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2900//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==




==
==
Finished build.
==
==


Archiving artifacts
[description-setter] Description set: TEZ-3985
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-3985) Correctness: Throw a clear exception for DMEs sent during cleanup

2018-08-27 Thread TezQA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594396#comment-16594396
 ] 

TezQA commented on TEZ-3985:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12937355/TEZ-3985.1.patch
  against master revision 261bbdd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2900//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2900//console

This message is automatically generated.


> Correctness: Throw a clear exception for DMEs sent during cleanup
> -
>
> Key: TEZ-3985
> URL: https://issues.apache.org/jira/browse/TEZ-3985
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Jaume M
>Priority: Major
> Attachments: TEZ-3985.1.patch
>
>
> If a DME is sent during cleanup, that implies that the .close() of the 
> LogicalIOProcessorRuntimeTask did not succeed and therefore these events are 
> an error condition.
> These events should not be sent and, more importantly, should not be received 
> by the AM.
> Throw a clear exception in this case, so that developers can locate the 
> extraneous event from the backtrace.
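
A minimal sketch of the idea follows, assuming a guard flag that is set when
cleanup starts. It is not the TEZ-3985 patch: CleanupGuardSketch, startCleanup,
the use of IllegalStateException, and the String events are all hypothetical
choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical event sink that rejects events once cleanup has begun.
final class CleanupGuardSketch {
    private final AtomicBoolean inCleanup = new AtomicBoolean(false);
    private final List<String> sent = new ArrayList<>();

    void startCleanup() {
        inCleanup.set(true);
    }

    // Throws instead of silently forwarding, so the stack trace of the
    // exception points at whichever component tried to send the event.
    void sendEvents(List<String> events) {
        if (inCleanup.get()) {
            throw new IllegalStateException(
                "Events sent during cleanup of a failed task: " + events);
        }
        sent.addAll(events);
    }

    List<String> sentEvents() {
        return sent;
    }

    public static void main(String[] args) {
        CleanupGuardSketch sink = new CleanupGuardSketch();
        sink.sendEvents(Arrays.asList("event-before-cleanup"));
        sink.startCleanup();
        try {
            sink.sendEvents(Arrays.asList("event-during-cleanup"));
        } catch (IllegalStateException e) {
            e.printStackTrace(); // the backtrace identifies the sender
        }
    }
}
```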





[jira] [Assigned] (TEZ-3958) Add internal vertex priority information into the tez dag.dot debug information

2018-08-27 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned TEZ-3958:


Assignee: Gopal V  (was: Jaume M)

> Add internal vertex priority information into the tez dag.dot debug 
> information
> ---
>
> Key: TEZ-3958
> URL: https://issues.apache.org/jira/browse/TEZ-3958
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: TEZ-3958.1.patch, TEZ-3958.2.patch, TEZ-3958.3.patch, 
> TEZ-3958.4.patch
>
>
> Adding the actual vertex priority as computed by Tez into the debug dag.dot 
> file would allow debugging of task pre-emption issues when the DAG is no 
> longer a tree.
> There are pre-emption issues with isomerization of Tez DAGs, where an 
> R-isomer dag with mirror rotation runs at a different speed than the L-isomer 
> dag, because priorities at the same level change with the vertex-id order.
> Since the problem is hard to debug through, it would be good to record the 
> computed priority in the DAG .dot file in the logging directories.





[jira] [Assigned] (TEZ-3985) Correctness: Throw a clear exception for DMEs sent during cleanup

2018-08-27 Thread Jaume M (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaume M reassigned TEZ-3985:


Assignee: Jaume M

> Correctness: Throw a clear exception for DMEs sent during cleanup
> -
>
> Key: TEZ-3985
> URL: https://issues.apache.org/jira/browse/TEZ-3985
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Jaume M
>Priority: Major
> Attachments: TEZ-3985.1.patch
>
>
> If a DME is sent during cleanup, that implies that the .close() of the 
> LogicalIOProcessorRuntimeTask did not succeed and therefore these events are 
> an error condition.
> These events should not be sent and, more importantly, should not be received 
> by the AM.
> Throw a clear exception in this case, so that developers can locate the 
> extraneous event from the backtrace.





[jira] [Created] (TEZ-3985) Correctness: Throw a clear exception for DMEs sent during cleanup

2018-08-27 Thread Gopal V (JIRA)
Gopal V created TEZ-3985:


 Summary: Correctness: Throw a clear exception for DMEs sent during 
cleanup
 Key: TEZ-3985
 URL: https://issues.apache.org/jira/browse/TEZ-3985
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V


If a DME is sent during cleanup, that implies that the .close() of the 
LogicalIOProcessorRuntimeTask did not succeed and therefore these events are an 
error condition.

These events should not be sent and, more importantly, should not be received 
by the AM.

Throw a clear exception in this case, so that developers can locate the 
extraneous event from the backtrace.





[jira] [Commented] (TEZ-3984) Shuffle: Out of Band DME event sending causes errors

2018-08-27 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594304#comment-16594304
 ] 

Gopal V commented on TEZ-3984:
--

The specific sequence of events is as follows. The input throws an exception:

{code}
2018-08-27T17:25:15,579  WARN [TezTR-437616_7273_9_0_0_0 
(1520459437616_7273_9_00_00_0)] runtime.LogicalIOProcessorRuntimeTask: 
Ignoring exception when closing input calls(cleanup). Exception 
class=java.io.IOException, message ...
{code}

Output gets closed for memory recovery 

{code}
2018-08-27T17:25:15,579  INFO [TezTR-437616_7273_9_0_0_0 
(1520459437616_7273_9_00_00_0)] impl.PipelinedSorter: Reducer 2: Starting 
flush of map output
{code}

Sorter pushes event to the output context directly

{code}
2018-08-27T17:25:15,990  INFO [TezTR-437616_7273_9_0_0_0 
(1520459437616_7273_9_00_00_0)] impl.PipelinedSorter: Reducer 2: Adding 
spill event for spill (final update=true), spillId=0
{code}

And Reducer 2 gets the event routed to it.
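
The sequence above can be modelled with a small sketch. It only illustrates the
two event paths (events returned from close() versus events pushed directly to
the context) and is not Tez code: FakeOutputContext, FakeOutput, the
receivedByAm list, and the event strings are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the out-of-band event path during cleanup.
final class OutOfBandEventSketch {

    // Stand-in for the output context: anything sent here "reaches the AM".
    static final class FakeOutputContext {
        final List<String> receivedByAm = new ArrayList<>();

        void sendEvents(List<String> events) {
            receivedByAm.addAll(events);
        }
    }

    // Stand-in for an output whose close() returns events and also pushes
    // one directly to the context, as the sorter does in the log above.
    static final class FakeOutput {
        private final FakeOutputContext context;

        FakeOutput(FakeOutputContext context) {
            this.context = context;
        }

        List<String> close() {
            context.sendEvents(Collections.singletonList("spill-event (out-of-band)"));
            return Collections.singletonList("close-event (returned)");
        }
    }

    // Models cleanup after an input failure: the returned events are dropped,
    // but the out-of-band event has already been sent.
    static void cleanup(FakeOutput output) {
        output.close(); // return value intentionally ignored
    }

    public static void main(String[] args) {
        FakeOutputContext context = new FakeOutputContext();
        cleanup(new FakeOutput(context));
        System.out.println("Reached the AM: " + context.receivedByAm);
    }
}
```

Ignoring the return value of close() is therefore not enough to keep a failed
attempt's events away from the AM.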

> Shuffle: Out of Band DME event sending causes errors
> 
>
> Key: TEZ-3984
> URL: https://issues.apache.org/jira/browse/TEZ-3984
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.4, 0.9.1, 0.10.0
>Reporter: Gopal V
>Priority: Critical
>  Labels: correctness
>
> In case of a task Input throwing an exception, the outputs are also closed in 
> the LogicalIOProcessorRuntimeTask.cleanup().
> Cleanup ignores all the events returned by output close; however, if any 
> output tries to send an event out of band by directly calling 
> outputContext.sendEvents(events), then those events can reach the AM before 
> the task failure is reported.
> This can cause correctness issues with shuffle, since zero-sized events can be 
> sent out due to an input failure and downstream tasks may never reattempt a 
> fetch from the valid attempt.





[jira] [Updated] (TEZ-3984) Shuffle: Out of Band DME event sending causes errors

2018-08-27 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-3984:
-
Labels: correctness  (was: )

> Shuffle: Out of Band DME event sending causes errors
> 
>
> Key: TEZ-3984
> URL: https://issues.apache.org/jira/browse/TEZ-3984
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.8.4, 0.9.1, 0.10.0
>Reporter: Gopal V
>Priority: Critical
>  Labels: correctness
>
> In case of a task Input throwing an exception, the outputs are also closed in 
> the LogicalIOProcessorRuntimeTask.cleanup().
> Cleanup ignores all the events returned by output close; however, if any 
> output tries to send an event out of band by directly calling 
> outputContext.sendEvents(events), then those events can reach the AM before 
> the task failure is reported.
> This can cause correctness issues with shuffle, since zero-sized events can be 
> sent out due to an input failure and downstream tasks may never reattempt a 
> fetch from the valid attempt.





[jira] [Created] (TEZ-3984) Shuffle: Out of Band DME event sending causes errors

2018-08-27 Thread Gopal V (JIRA)
Gopal V created TEZ-3984:


 Summary: Shuffle: Out of Band DME event sending causes errors
 Key: TEZ-3984
 URL: https://issues.apache.org/jira/browse/TEZ-3984
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.9.1, 0.8.4, 0.10.0
Reporter: Gopal V


In case of a task Input throwing an exception, the outputs are also closed in 
the LogicalIOProcessorRuntimeTask.cleanup().

Cleanup ignores all the events returned by output close; however, if any output 
tries to send an event out of band by directly calling 
outputContext.sendEvents(events), then those events can reach the AM before the 
task failure is reported.

This can cause correctness issues with shuffle, since zero-sized events can be 
sent out due to an input failure and downstream tasks may never reattempt a 
fetch from the valid attempt.


