[jira] [Assigned] (TEZ-3846) Tez AM may not clean up properly on an internal error

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned TEZ-3846:
-

Assignee: Zhiyuan Yang

> Tez AM may not clean up properly on an internal error
> -
>
> Key: TEZ-3846
> URL: https://issues.apache.org/jira/browse/TEZ-3846
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Zhiyuan Yang
>
> Normally, in Hive we blindly reopen the session on any submit error; however, 
> I accidentally broke that, and while investigating I noticed a new error before 
> the reopen, which claims that a session where a DAG has failed is still running 
> a DAG. It looks like the AM should either clean up or, if we assume OOM is not 
> clean-up-able, die completely.
> {noformat}
> 2017-09-28T01:07:12,352  INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> client.TezClient: Submitted dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, 
> dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM (
> ...
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Status: Failed
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Vertex failed, vertexName=Map 61, 
> vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex 
> vertex_1506585924598_0001_53_01 [Map 61] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
> vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: 
> GC overhead limit exceeded
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Invalid event V_INTERNAL_ERROR on Vertex 
> vertex_1506585924598_0001_53_00 [Map 60]
> 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> log.PerfLogger:  end=1506586045787 duration=13435 
> from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor>
> ... [reuse]
> 2017-09-28T01:07:28,459  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> client.TezClient: Submitting dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, dagName=insert overwrite table 
> orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, 
> callerType=HIVE_QUERY_ID, 
> callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 }
> 2017-09-28T01:07:35,259  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> exec.Task: Dag submit failed due to App master already running a DAG
> {noformat}
> Session continues living and failing like that multiple times.
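
For readers unfamiliar with the "blindly reopen the session on any submit error" behavior mentioned above, here is a minimal sketch of that pattern. Everything in it is an assumption made for illustration: `Session`, `FakeSession` and `submitWithReopen` are hypothetical names, not the Hive or TezClient API, and the real Hive logic is not shown in this thread.

```java
import java.util.function.Supplier;

public class ReopenOnSubmitError {

    /** Hypothetical stand-in for a session handle; this is not the TezClient API. */
    interface Session {
        String submit(String dagName);   // throws IllegalStateException on a submit error
        void close();
    }

    /** In-memory session for illustration only: a "stuck" one rejects every submit. */
    static final class FakeSession implements Session {
        private final boolean stuck;
        boolean closed = false;

        FakeSession(boolean stuck) {
            this.stuck = stuck;
        }

        @Override
        public String submit(String dagName) {
            if (stuck) {
                throw new IllegalStateException("App master already running a DAG");
            }
            return "submitted:" + dagName;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Submit once; on a submit error, close the session, open a new one and retry once. */
    static String submitWithReopen(Session current, Supplier<Session> reopen, String dagName) {
        try {
            return current.submit(dagName);
        } catch (IllegalStateException submitError) {
            current.close();
            return reopen.get().submit(dagName);
        }
    }

    public static void main(String[] args) {
        FakeSession stuck = new FakeSession(true);
        String result = submitWithReopen(stuck, () -> new FakeSession(false), "Stage-1");
        System.out.println(result + ", old session closed: " + stuck.closed);
    }
}
```

With a retry like this in place the stuck session would be masked on the client side, which may be why the problem only surfaced once the reopen path was broken.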



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (TEZ-3846) Tez AM may not clean up properly on an internal error

2017-09-28 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185082#comment-16185082
 ] 

Zhiyuan Yang commented on TEZ-3846:
---

I'll take a look soon.






[jira] [Updated] (TEZ-3846) Tez AM may not clean up properly on an internal error

2017-09-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated TEZ-3846:
--
Summary: Tez AM may not clean up properly on an internal error  (was: Tez 
session may not clean up on internal error)

> Tez AM may not clean up properly on an internal error
> -
>
> Key: TEZ-3846
> URL: https://issues.apache.org/jira/browse/TEZ-3846
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
>
> Normally, in Hive we blindly reopen the session on any error; however I 
> accidentally broke that, and while investigating noticed a new error before 
> reopen that claims that session where a DAG has failed is still running a 
> DAG. Looks like it should either clean up, or if we assume OOM is not 
> clean-up-able, die completely.
> {noformat}
> 2017-09-28T01:07:12,352  INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> client.TezClient: Submitted dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, 
> dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM (
> ...
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Status: Failed
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Vertex failed, vertexName=Map 61, 
> vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex 
> vertex_1506585924598_0001_53_01 [Map 61] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
> vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: 
> GC overhead limit exceeded
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Invalid event V_INTERNAL_ERROR on Vertex 
> vertex_1506585924598_0001_53_00 [Map 60]
> 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> log.PerfLogger:  end=1506586045787 duration=13435 
> from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor>
> ... [reuse]
> 2017-09-28T01:07:28,459  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> client.TezClient: Submitting dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, dagName=insert overwrite table 
> orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, 
> callerType=HIVE_QUERY_ID, 
> callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 }
> 2017-09-28T01:07:35,259  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> exec.Task: Dag submit failed due to App master already running a DAG
> {noformat}
> Session continues living and failing like that multiple times.





[jira] [Commented] (TEZ-3846) Tez session may not clean up on internal error

2017-09-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185046#comment-16185046
 ] 

Sergey Shelukhin commented on TEZ-3846:
---

cc [~aplusplus] [~sseth]

> Tez session may not clean up on internal error
> --
>
> Key: TEZ-3846
> URL: https://issues.apache.org/jira/browse/TEZ-3846
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
>
> Normally, in Hive we blindly reopen the session on any error; however I 
> accidentally broke that, and while investigating noticed a new error before 
> reopen that claims that session where a DAG has failed is still running a 
> DAG. Looks like it should either clean up, or if we assume OOM is not 
> clean-up-able, die completely.
> {noformat}
> 2017-09-28T01:07:12,352  INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> client.TezClient: Submitted dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, 
> dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM (
> ...
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Status: Failed
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Vertex failed, vertexName=Map 61, 
> vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex 
> vertex_1506585924598_0001_53_01 [Map 61] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
> vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: 
> GC overhead limit exceeded
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Invalid event V_INTERNAL_ERROR on Vertex 
> vertex_1506585924598_0001_53_00 [Map 60]
> 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> log.PerfLogger:  end=1506586045787 duration=13435 
> from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor>
> ... [reuse]
> 2017-09-28T01:07:28,459  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> client.TezClient: Submitting dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, dagName=insert overwrite table 
> orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, 
> callerType=HIVE_QUERY_ID, 
> callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 }
> 2017-09-28T01:07:35,259  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> exec.Task: Dag submit failed due to App master already running a DAG
> {noformat}
> Session continues living and failing like that multiple times.





[jira] [Updated] (TEZ-3846) Tez AM may not clean up properly on an internal error

2017-09-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated TEZ-3846:
--
Description: 
Normally, in Hive we blindly reopen the session on any submit error; however I 
accidentally broke that, and while investigating noticed a new error before 
reopen that claims that session where a DAG has failed is still running a DAG. 
Looks like it should either clean up, or if we assume OOM is not clean-up-able, 
die completely.
{noformat}
2017-09-28T01:07:12,352  INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
client.TezClient: Submitted dag to TezSession, 
sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
applicationId=application_1506585924598_0001, dagId=dag_1506585924598_0001_53, 
dagName=SELECT count(1) FROM (
...
2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
SessionState: Status: Failed
2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
SessionState: Vertex failed, vertexName=Map 61, 
vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex 
vertex_1506585924598_0001_53_01 [Map 61] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: GC 
overhead limit exceeded
2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
SessionState: Invalid event V_INTERNAL_ERROR on Vertex 
vertex_1506585924598_0001_53_00 [Map 60]
2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
log.PerfLogger: 
... [reuse]
2017-09-28T01:07:28,459  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
client.TezClient: Submitting dag to TezSession, 
sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
applicationId=application_1506585924598_0001, dagName=insert overwrite table 
orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, 
callerType=HIVE_QUERY_ID, 
callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 }
2017-09-28T01:07:35,259  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
exec.Task: Dag submit failed due to App master already running a DAG
{noformat}
Session continues living and failing like that multiple times.

  was:
Normally, in Hive we blindly reopen the session on any error; however I 
accidentally broke that, and while investigating noticed a new error before 
reopen that claims that session where a DAG has failed is still running a DAG. 
Looks like it should either clean up, or if we assume OOM is not clean-up-able, 
die completely.
{noformat}
2017-09-28T01:07:12,352  INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
client.TezClient: Submitted dag to TezSession, 
sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
applicationId=application_1506585924598_0001, dagId=dag_1506585924598_0001_53, 
dagName=SELECT count(1) FROM (
...
2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
SessionState: Status: Failed
2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
SessionState: Vertex failed, vertexName=Map 61, 
vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex 
vertex_1506585924598_0001_53_01 [Map 61] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: GC 
overhead limit exceeded
2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
SessionState: Invalid event V_INTERNAL_ERROR on Vertex 
vertex_1506585924598_0001_53_00 [Map 60]
2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
log.PerfLogger: 
... [reuse]
2017-09-28T01:07:28,459  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
client.TezClient: Submitting dag to TezSession, 
sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
applicationId=application_1506585924598_0001, dagName=insert overwrite table 
orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, 
callerType=HIVE_QUERY_ID, 
callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 }
2017-09-28T01:07:35,259  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
exec.Task: Dag submit failed due to App master already running a DAG
{noformat}
Session continues living and failing like that multiple times.


> Tez AM may not clean up properly on an internal error
> -
>
> Key: TEZ-3846
> URL: https://issues.apache.org/jira/browse/TEZ-3846
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
>
> Normally, in Hive we blindly reopen the session on any submit error; however 
> I accidentally broke that, and while investigating noticed a new error before 
> reopen that claims that session where a DAG has failed is still running a 
> DAG. Looks like it should either clean up, or if we assume OOM is 

[jira] [Commented] (TEZ-3845) Tez UI Cleanup Stats Table

2017-09-28 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184921#comment-16184921
 ] 

Kuhu Shukla commented on TEZ-3845:
--

Looks good to me. +1. Thanks [~jeagles] for reporting the issue and the patch.

> Tez UI Cleanup Stats Table
> --
>
> Key: TEZ-3845
> URL: https://issues.apache.org/jira/browse/TEZ-3845
> Project: Apache Tez
> Issue Type: Bug
> Components: UI
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: after_stats.png, before_stats.png, TEZ-3845.001.patch
>
>
> Removed redundant status (for example: Succeeded Tasks: 10 Succeeded)
> Made total tasks links
> Added killed/failed task attempts available on the dag/index/ page
> Reordered Stats to be consistent across all pages.





[jira] [Resolved] (TEZ-3845) Tez UI Cleanup Stats Table

2017-09-28 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla resolved TEZ-3845.
--
   Resolution: Fixed
Fix Version/s: 0.9.1

Committed to master branch.

> Tez UI Cleanup Stats Table
> --
>
> Key: TEZ-3845
> URL: https://issues.apache.org/jira/browse/TEZ-3845
> Project: Apache Tez
> Issue Type: Bug
> Components: UI
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: after_stats.png, before_stats.png, TEZ-3845.001.patch
>
>
> Removed redundant status (for example: Succeeded Tasks: 10 Succeeded)
> Made total tasks links
> Added killed/failed task attempts available on the dag/index/ page
> Reordered Stats to be consistent across all pages.





[jira] [Updated] (TEZ-3845) Tez UI Cleanup Stats Table

2017-09-28 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3845:
-
Attachment: TEZ-3845.001.patch






[jira] [Commented] (TEZ-3845) Tez UI Cleanup Stats Table

2017-09-28 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184889#comment-16184889
 ] 

Jonathan Eagles commented on TEZ-3845:
--

!before_stats.png!
!after_stats.png!

> Tez UI Cleanup Stats Table
> --
>
> Key: TEZ-3845
> URL: https://issues.apache.org/jira/browse/TEZ-3845
> Project: Apache Tez
> Issue Type: Bug
> Components: UI
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: after_stats.png, before_stats.png
>
>
> Removed redundant status (for example: Succeeded Tasks: 10 Succeeded)
> Made total tasks links
> Added killed/failed task attempts available on the dag/index/ page
> Reordered Stats to be consistent across all pages.





[jira] [Updated] (TEZ-3845) Tez UI Cleanup Stats Table

2017-09-28 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3845:
-
Attachment: after_stats.png
            before_stats.png






[jira] [Updated] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures

2017-09-28 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3833:
-
Attachment: TEZ-3833.005.patch

Thanks [~jlowe] for the helpful feedback. I have removed the check in the 
updated patch.

> Tasks should report codec errors during shuffle as fetch failures
> -
>
> Key: TEZ-3833
> URL: https://issues.apache.org/jira/browse/TEZ-3833
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3833.001.patch, TEZ-3833.002.patch, 
> TEZ-3833.003.patch, TEZ-3833.004.patch, TEZ-3833.005.patch
>
>
> Do the equivalent of https://issues.apache.org/jira/browse/MAPREDUCE-6633 so 
> that compression errors do not prove fatal for the DAG/tasks.





[jira] [Created] (TEZ-3845) Tez UI Cleanup Stats Table

2017-09-28 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-3845:


Summary: Tez UI Cleanup Stats Table
Key: TEZ-3845
URL: https://issues.apache.org/jira/browse/TEZ-3845
Project: Apache Tez
Issue Type: Bug
Components: UI
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles


Removed redundant status (for example: Succeeded Tasks: 10 Succeeded)
Made total tasks links
Added killed/failed task attempts available on the dag/index/ page
Reordered Stats to be consistent across all pages.





[jira] [Commented] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures

2017-09-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184873#comment-16184873
 ] 

Jason Lowe commented on TEZ-3833:
-

I'm still confused why we want to treat InternalErrors like timeouts.  Seems 
like we will do some bad things in some cases if we do.  For example if we are 
trying to fetch 5 maps from a node and get an InternalError then we should 
blame the current map not all 5 maps, whereas if we are getting a connection 
timeout then we do want to associate that failure to connect with all 5 maps.

Therefore I think we simply need to remove the instanceof check for 
InternalError.  That will cause them to be treated like a regular I/O error 
which seems more appropriate.
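
The distinction drawn above can be sketched as a small decision function. This is only an illustration of the reasoning, not the Tez fetcher code or the attached patch: `mapsToBlame` is a hypothetical helper, map outputs are represented as plain strings, and `java.net.SocketTimeoutException` stands in for "connection timeout" as an assumption.

```java
import java.net.SocketTimeoutException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FetchFailureBlame {

    /**
     * Decide which map outputs to report as failed for one fetch error.
     * Hypothetical helper for illustration; not part of Tez.
     */
    static List<String> mapsToBlame(Throwable error, String currentMap, List<String> allMapsOnHost) {
        if (error instanceof SocketTimeoutException) {
            // Could not reach the host at all: every pending map on it is affected.
            return allMapsOnHost;
        }
        // No special case for InternalError: a codec failure falls through to the
        // regular I/O error path and only the map being read is blamed.
        return Collections.singletonList(currentMap);
    }

    public static void main(String[] args) {
        List<String> pending = Arrays.asList("map_0", "map_1", "map_2", "map_3", "map_4");
        System.out.println(mapsToBlame(new InternalError("codec error"), "map_2", pending));
        System.out.println(mapsToBlame(new SocketTimeoutException("connect timed out"), "map_2", pending));
    }
}
```

Dropping the `instanceof` check for `InternalError`, as suggested, corresponds to the fall-through branch here.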


> Tasks should report codec errors during shuffle as fetch failures
> -
>
> Key: TEZ-3833
> URL: https://issues.apache.org/jira/browse/TEZ-3833
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3833.001.patch, TEZ-3833.002.patch, 
> TEZ-3833.003.patch, TEZ-3833.004.patch
>
>
> Do the equivalent of https://issues.apache.org/jira/browse/MAPREDUCE-6633 so 
> that compression errors do not prove fatal for the DAG/tasks.





[jira] [Commented] (TEZ-3841) Proposal: Simulator mode

2017-09-28 Thread Solal Pirelli (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184729#comment-16184729
 ] 

Solal Pirelli commented on TEZ-3841:


I'll add the Apache licenses later - weird though that the bot only points out 
2 of the >2 files I added (none of which have the header).

Since I attached the same patch twice (I forgot `--no-prefix` in the first 
one), but it resulted in two unrelated test failures, I guess those tests are 
flaky?

> Proposal: Simulator mode
> 
>
> Key: TEZ-3841
> URL: https://issues.apache.org/jira/browse/TEZ-3841
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Solal Pirelli
> Attachments: TEZ-3841.patch
>
>
> Early work on a new feature proposal: a "simulator" mode in which vertices 
> are not actually executed, but instead use a simplified "fake" processor 
> (which is configurable, and by default does nothing) to let a developer see 
> how certain workloads will be handled.
> For instance, one might want to check what happens if a vertex has each of 
> its 1000 tasks send a bunch of events - does this scale? Or, what if a 
> specific vertex fails 2% of the time - how does this impact overall graph 
> execution? Are 2 nodes with 10 containers per node enough, or should one 
> invest in a third node?
> My current implementation is pretty simple: mimic the "uber" stuff to add a 
> new "fake" mode with a custom task scheduler and container launcher. It adds 
> the following configuration values:
> * Boolean to enable fake mode
> * Number of nodes in fake mode
> * Number of containers per node in fake mode
> * Class to run in fake mode - must inherit a new class `FakeProcessor`, with 
> a single method `run` that takes the vertex name, task index and task 
> attempt, and returns a list of events. Throwing an exception causes the task 
> to fail.
> I'm currently working on adding a "chaos monkey" kind of service which 
> randomly kills tasks, pre-empts containers, etc., but would appreciate some 
> feedback on what's already done first. :)
> P.S.: I have zero experience with using JIRA or contributing to Apache 
> projects; if there is a more formal procedure for suggesting a new feature, 
> please point me to it.
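
To make the `FakeProcessor` contract described in the proposal more concrete, here is a rough sketch of what such a class and one implementation might look like. The patch itself is not shown in this thread, so the signature, the placeholder `Event` type and the `FlakyProcessor` example are all assumptions; only the class name, the `run` method and its three inputs come from the description.

```java
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SimulatorSketch {

    /** Placeholder for the event type; the real type used by the patch is not shown here. */
    interface Event {}

    /** Base class as described in the proposal; the exact signature is an assumption. */
    abstract static class FakeProcessor {
        abstract List<Event> run(String vertexName, int taskIndex, int taskAttempt);
    }

    /** Hypothetical processor that fails a configurable fraction of task attempts. */
    static final class FlakyProcessor extends FakeProcessor {
        private final double failureProbability;
        private final Random random;

        FlakyProcessor(double failureProbability, Random random) {
            this.failureProbability = failureProbability;
            this.random = random;
        }

        @Override
        List<Event> run(String vertexName, int taskIndex, int taskAttempt) {
            // nextDouble() is in [0.0, 1.0), so 0.0 never fails and 1.0 always fails.
            if (random.nextDouble() < failureProbability) {
                throw new IllegalStateException(
                    "simulated failure: " + vertexName + " task " + taskIndex
                        + " attempt " + taskAttempt);
            }
            return Collections.emptyList(); // do nothing and emit no events
        }
    }

    public static void main(String[] args) {
        FakeProcessor processor = new FlakyProcessor(0.02, new Random(42));
        int failures = 0;
        for (int task = 0; task < 1000; task++) {
            try {
                processor.run("Map 1", task, 0);
            } catch (IllegalStateException e) {
                failures++;
            }
        }
        System.out.println("failed attempts out of 1000: " + failures);
    }
}
```

A processor like this would correspond to the "vertex fails 2% of the time" scenario from the description; the thrown exception is what would mark the task attempt as failed.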





[jira] [Commented] (TEZ-3841) Proposal: Simulator mode

2017-09-28 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184706#comment-16184706
 ] 

TezQA commented on TEZ-3841:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12889545/TEZ-3841.patch
  against master revision bc08b19.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.analyzer.TestAnalyzer

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2647//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2647//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2647//console

This message is automatically generated.






Failed: TEZ-3841 PreCommit Build #2647

2017-09-28 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3841
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2647/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 332.54 KB...]
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-job-analyzer
[INFO] Build failures were ignored.




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12889545/TEZ-3841.patch
  against master revision bc08b19.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.analyzer.TestAnalyzer

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2647//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2647//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2647//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
afbd3f559b4d6dd5480e5f18767504be710ba0fe logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.tez.analyzer.TestAnalyzer.testWithATS

Error Message:
null

Stack Trace:
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.apache.tez.analyzer.TestAnalyzer.getDagInfo(TestAnalyzer.java:264)
at org.apache.tez.analyzer.TestAnalyzer.verify(TestAnalyzer.java:251)
at org.apache.tez.analyzer.TestAnalyzer.runTests(TestAnalyzer.java:390)
at org.apache.tez.analyzer.TestAnalyzer.testWithATS(TestAnalyzer.java:354)

[jira] [Updated] (TEZ-3252) [Umbrella] Enable support for Hadoop-3.x

2017-09-28 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3252:
-
Fix Version/s: (was: 0.9.0)
   0.9.1

> [Umbrella] Enable support for Hadoop-3.x 
> -
>
> Key: TEZ-3252
> URL: https://issues.apache.org/jira/browse/TEZ-3252
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
> Fix For: 0.9.1
>
> Attachments: TEZ-3252.patch
>
>
> Placeholder umbrella to track the various issues/tasks discovered to get full 
> stable functionality against hadoop-3.x once it is released in a stable form. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (TEZ-3843) Tez UI Vertex/Tasks log links for running tasks are missing

2017-09-28 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles resolved TEZ-3843.
--
   Resolution: Fixed
Fix Version/s: 0.9.1

> Tez UI Vertex/Tasks log links for running tasks are missing
> ---
>
> Key: TEZ-3843
> URL: https://issues.apache.org/jira/browse/TEZ-3843
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: relatedentities.png, TEZ-3843.001.patch
>
>
> Task serialization mistakenly gets the list of attempts from 
> otherinfo.relatedentities. relatedentities is a top-level property, and 
> serialization should reflect this.





[jira] [Commented] (TEZ-3841) Proposal: Simulator mode

2017-09-28 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184638#comment-16184638
 ] 

TezQA commented on TEZ-3841:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12889536/TEZ-3841.patch
  against master revision bc08b19.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2646//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2646//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2646//console

This message is automatically generated.

> Proposal: Simulator mode
> 
>
> Key: TEZ-3841
> URL: https://issues.apache.org/jira/browse/TEZ-3841
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Solal Pirelli
> Attachments: TEZ-3841.patch
>
>
> Early work on a new feature proposal: a "simulator" mode in which vertices 
> are not actually executed, but instead use a simplified "fake" processor 
> (which is configurable, and by default does nothing) to let a developer see 
> how certain workloads will be handled.
> For instance, one might want to check what happens if a vertex has each of 
> its 1000 tasks send a bunch of events - does this scale? Or, what if a 
> specific vertex fails 2% of the time - how does this impact overall graph 
> execution? Are 2 nodes with 10 containers per node enough, or should one 
> invest in a third node?
> My current implementation is pretty simple: mimic the "uber" stuff to add a 
> new "fake" mode with a custom task scheduler and container launcher. It adds 
> the following configuration values:
> * Boolean to enable fake mode
> * Number of nodes in fake mode
> * Number of containers per node in fake mode
> * Class to run in fake mode - must inherit a new class `FakeProcessor`, with 
> a single method `run` that takes the vertex name, task index and task 
> attempt, and returns a list of events. Throwing an exception causes the task 
> to fail.
> I'm currently working on adding a "chaos monkey" kind of service which 
> randomly kills tasks, pre-empts containers, etc., but would appreciate some 
> feedback on what's already done first. :)
> P.S.: I have zero experience with using JIRA or contributing to Apache 
> projects; if there is a more formal procedure for suggesting a new feature, 
> please point me to it.
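
A minimal Java sketch of the processor contract described above, assuming a shape for `FakeProcessor` that is not taken from the attached patch. The names `SimulatorSketch`, `FakeEvent`, `NoOpProcessor` and `FlakyProcessor`, the exact signature of `run`, and the use of an unchecked exception to signal failure are all hypothetical. It shows the default "does nothing" processor and one that makes a chosen vertex fail a fixed fraction of the time, as in the 2% question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: none of these types come from the TEZ-3841 patch.
class SimulatorSketch {

    /** Stand-in for whatever event type the real patch returns. */
    static final class FakeEvent {
        final String description;

        FakeEvent(String description) {
            this.description = description;
        }
    }

    /** Assumed shape of the base class: a single run method returning events. */
    abstract static class FakeProcessor {
        abstract List<FakeEvent> run(String vertexName, int taskIndex, int taskAttempt);
    }

    /** Default behaviour per the proposal: do nothing and emit no events. */
    static final class NoOpProcessor extends FakeProcessor {
        @Override
        List<FakeEvent> run(String vertexName, int taskIndex, int taskAttempt) {
            return Collections.emptyList();
        }
    }

    /** Fails a configurable fraction of the attempts of one named vertex. */
    static final class FlakyProcessor extends FakeProcessor {
        private final String flakyVertex;
        private final double failureRate;
        private final Random random;

        FlakyProcessor(String flakyVertex, double failureRate, long seed) {
            this.flakyVertex = flakyVertex;
            this.failureRate = failureRate;
            this.random = new Random(seed);
        }

        @Override
        List<FakeEvent> run(String vertexName, int taskIndex, int taskAttempt) {
            // Throwing is how the proposal says a task is made to fail.
            if (vertexName.equals(flakyVertex) && random.nextDouble() < failureRate) {
                throw new IllegalStateException("simulated failure: " + vertexName
                        + " task " + taskIndex + " attempt " + taskAttempt);
            }
            List<FakeEvent> events = new ArrayList<>();
            events.add(new FakeEvent(vertexName + "/" + taskIndex + " done"));
            return events;
        }
    }

    public static void main(String[] args) {
        FakeProcessor processor = new FlakyProcessor("Map 1", 0.02, 42L);
        int failures = 0;
        for (int task = 0; task < 1000; task++) {
            try {
                processor.run("Map 1", task, 0);
            } catch (IllegalStateException e) {
                failures++;
            }
        }
        System.out.println("failed attempts: " + failures + " of 1000");
    }
}
```

In the proposal the scheduler and container launcher would call `run`; the loop in `main` only stands in for that to show how a failure rate could be exercised.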





Failed: TEZ-3841 PreCommit Build #2646

2017-09-28 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3841
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2646/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 332.76 KB...]
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :tez-tests
[INFO] Build failures were ignored.




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12889536/TEZ-3841.patch
  against master revision bc08b19.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2646//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2646//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2646//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
1b7bcaf96ee9b35cbbfb5ea6c0946bf7218278e7 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithoutExit

Error Message:
expected: but was:

Stack Trace:
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:159)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:142)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:138)
at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithoutExit(TestFaultTolerance.java:334)

[jira] [Updated] (TEZ-3841) Proposal: Simulator mode

2017-09-28 Thread Solal Pirelli (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Solal Pirelli updated TEZ-3841:
---
Attachment: (was: TEZ-3841.patch)

> Proposal: Simulator mode
> 
>
> Key: TEZ-3841
> URL: https://issues.apache.org/jira/browse/TEZ-3841
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Solal Pirelli
> Attachments: TEZ-3841.patch
>
>
> Early work on a new feature proposal: a "simulator" mode in which vertices 
> are not actually executed, but instead use a simplified "fake" processor 
> (which is configurable, and by default does nothing) to let a developer see 
> how certain workloads will be handled.
> For instance, one might want to check what happens if a vertex has each of 
> its 1000 tasks send a bunch of events - does this scale? Or, what if a 
> specific vertex fails 2% of the time - how does this impact overall graph 
> execution? Are 2 nodes with 10 containers per node enough, or should one 
> invest in a third node?
> My current implementation is pretty simple: mimic the "uber" stuff to add a 
> new "fake" mode with a custom task scheduler and container launcher. It adds 
> the following configuration values:
> * Boolean to enable fake mode
> * Number of nodes in fake mode
> * Number of containers per node in fake mode
> * Class to run in fake mode - must inherit a new class `FakeProcessor`, with 
> a single method `run` that takes the vertex name, task index and task 
> attempt, and returns a list of events. Throwing an exception causes the task 
> to fail.
> I'm currently working on adding a "chaos monkey" kind of service which 
> randomly kills tasks, pre-empts containers, etc., but would appreciate some 
> feedback on what's already done first. :)
> P.S.: I have zero experience with using JIRA or contributing to Apache 
> projects; if there is a more formal procedure for suggesting a new feature, 
> please point me to it.





[jira] [Updated] (TEZ-3841) Proposal: Simulator mode

2017-09-28 Thread Solal Pirelli (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Solal Pirelli updated TEZ-3841:
---
Attachment: TEZ-3841.patch

> Proposal: Simulator mode
> 
>
> Key: TEZ-3841
> URL: https://issues.apache.org/jira/browse/TEZ-3841
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Solal Pirelli
> Attachments: TEZ-3841.patch
>
>
> Early work on a new feature proposal: a "simulator" mode in which vertices 
> are not actually executed, but instead use a simplified "fake" processor 
> (which is configurable, and by default does nothing) to let a developer see 
> how certain workloads will be handled.
> For instance, one might want to check what happens if a vertex has each of 
> its 1000 tasks send a bunch of events - does this scale? Or, what if a 
> specific vertex fails 2% of the time - how does this impact overall graph 
> execution? Are 2 nodes with 10 containers per node enough, or should one 
> invest in a third node?
> My current implementation is pretty simple: mimic the "uber" stuff to add a 
> new "fake" mode with a custom task scheduler and container launcher. It adds 
> the following configuration values:
> * Boolean to enable fake mode
> * Number of nodes in fake mode
> * Number of containers per node in fake mode
> * Class to run in fake mode - must inherit a new class `FakeProcessor`, with 
> a single method `run` that takes the vertex name, task index and task 
> attempt, and returns a list of events. Throwing an exception causes the task 
> to fail.
> I'm currently working on adding a "chaos monkey" kind of service which 
> randomly kills tasks, pre-empts containers, etc., but would appreciate some 
> feedback on what's already done first. :)
> P.S.: I have zero experience with using JIRA or contributing to Apache 
> projects; if there is a more formal procedure for suggesting a new feature, 
> please point me to it.





[jira] [Updated] (TEZ-3841) Proposal: Simulator mode

2017-09-28 Thread Solal Pirelli (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Solal Pirelli updated TEZ-3841:
---
Attachment: TEZ-3841.patch

> Proposal: Simulator mode
> 
>
> Key: TEZ-3841
> URL: https://issues.apache.org/jira/browse/TEZ-3841
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Solal Pirelli
> Attachments: TEZ-3841.patch
>
>
> Early work on a new feature proposal: a "simulator" mode in which vertices 
> are not actually executed, but instead use a simplified "fake" processor 
> (which is configurable, and by default does nothing) to let a developer see 
> how certain workloads will be handled.
> For instance, one might want to check what happens if a vertex has each of 
> its 1000 tasks send a bunch of events - does this scale? Or, what if a 
> specific vertex fails 2% of the time - how does this impact overall graph 
> execution? Are 2 nodes with 10 containers per node enough, or should one 
> invest in a third node?
> My current implementation is pretty simple: mimic the "uber" stuff to add a 
> new "fake" mode with a custom task scheduler and container launcher. It adds 
> the following configuration values:
> * Boolean to enable fake mode
> * Number of nodes in fake mode
> * Number of containers per node in fake mode
> * Class to run in fake mode - must inherit a new class `FakeProcessor`, with 
> a single method `run` that takes the vertex name, task index and task 
> attempt, and returns a list of events. Throwing an exception causes the task 
> to fail.
> I'm currently working on adding a "chaos monkey" kind of service which 
> randomly kills tasks, pre-empts containers, etc., but would appreciate some 
> feedback on what's already done first. :)
> P.S.: I have zero experience with using JIRA or contributing to Apache 
> projects; if there is a more formal procedure for suggesting a new feature, 
> please point me to it.





[jira] [Updated] (TEZ-3841) Proposal: Simulator mode

2017-09-28 Thread Solal Pirelli (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Solal Pirelli updated TEZ-3841:
---
Attachment: (was: tez-fake-mode.v2.patch)

> Proposal: Simulator mode
> 
>
> Key: TEZ-3841
> URL: https://issues.apache.org/jira/browse/TEZ-3841
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Solal Pirelli
>
> Early work on a new feature proposal: a "simulator" mode in which vertices 
> are not actually executed, but instead use a simplified "fake" processor 
> (which is configurable, and by default does nothing) to let a developer see 
> how certain workloads will be handled.
> For instance, one might want to check what happens if a vertex has each of 
> its 1000 tasks send a bunch of events - does this scale? Or, what if a 
> specific vertex fails 2% of the time - how does this impact overall graph 
> execution? Are 2 nodes with 10 containers per node enough, or should one 
> invest in a third node?
> My current implementation is pretty simple: mimic the "uber" stuff to add a 
> new "fake" mode with a custom task scheduler and container launcher. It adds 
> the following configuration values:
> * Boolean to enable fake mode
> * Number of nodes in fake mode
> * Number of containers per node in fake mode
> * Class to run in fake mode - must inherit a new class `FakeProcessor`, with 
> a single method `run` that takes the vertex name, task index and task 
> attempt, and returns a list of events. Throwing an exception causes the task 
> to fail.
> I'm currently working on adding a "chaos monkey" kind of service which 
> randomly kills tasks, pre-empts containers, etc., but would appreciate some 
> feedback on what's already done first. :)
> P.S.: I have zero experience with using JIRA or contributing to Apache 
> projects; if there is a more formal procedure for suggesting a new feature, 
> please point me to it.





[jira] [Updated] (TEZ-3841) Proposal: Simulator mode

2017-09-28 Thread Solal Pirelli (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Solal Pirelli updated TEZ-3841:
---
Attachment: (was: tez-fake-mode.patch)

> Proposal: Simulator mode
> 
>
> Key: TEZ-3841
> URL: https://issues.apache.org/jira/browse/TEZ-3841
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Solal Pirelli
>
> Early work on a new feature proposal: a "simulator" mode in which vertices 
> are not actually executed, but instead use a simplified "fake" processor 
> (which is configurable, and by default does nothing) to let a developer see 
> how certain workloads will be handled.
> For instance, one might want to check what happens if a vertex has each of 
> its 1000 tasks send a bunch of events - does this scale? Or, what if a 
> specific vertex fails 2% of the time - how does this impact overall graph 
> execution? Are 2 nodes with 10 containers per node enough, or should one 
> invest in a third node?
> My current implementation is pretty simple: mimic the "uber" stuff to add a 
> new "fake" mode with a custom task scheduler and container launcher. It adds 
> the following configuration values:
> * Boolean to enable fake mode
> * Number of nodes in fake mode
> * Number of containers per node in fake mode
> * Class to run in fake mode - must inherit a new class `FakeProcessor`, with 
> a single method `run` that takes the vertex name, task index and task 
> attempt, and returns a list of events. Throwing an exception causes the task 
> to fail.
> I'm currently working on adding a "chaos monkey" kind of service which 
> randomly kills tasks, pre-empts containers, etc., but would appreciate some 
> feedback on what's already done first. :)
> P.S.: I have zero experience with using JIRA or contributing to Apache 
> projects; if there is a more formal procedure for suggesting a new feature, 
> please point me to it.





[jira] [Commented] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures

2017-09-28 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184473#comment-16184473
 ] 

TezQA commented on TEZ-3833:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12889517/TEZ-3833.004.patch
  against master revision 8f61c51.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2645//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2645//console

This message is automatically generated.

> Tasks should report codec errors during shuffle as fetch failures
> -
>
> Key: TEZ-3833
> URL: https://issues.apache.org/jira/browse/TEZ-3833
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3833.001.patch, TEZ-3833.002.patch, 
> TEZ-3833.003.patch, TEZ-3833.004.patch
>
>
> Do the equivalent of https://issues.apache.org/jira/browse/MAPREDUCE-6633 so 
> that compression errors do not prove fatal for the DAG/tasks.





Failed: TEZ-3833 PreCommit Build #2645

2017-09-28 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3833
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2645/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 339.75 KB...]
[INFO] Total time: 55:49 min
[INFO] Finished at: 2017-09-28T17:05:49Z
[INFO] Final Memory: 90M/1315M
[INFO] 




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12889517/TEZ-3833.004.patch
  against master revision 8f61c51.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2645//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2645//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
693baee550ba2908396f374a3b5c15f36872c1c8 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[Fast Archiver] Compressed 3.51 MB of artifacts by 13.3% relative to #2640
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

Failed: TEZ-3830 PreCommit Build #2644

2017-09-28 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3830
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2644/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 339.75 KB...]
[INFO] Total time: 56:36 min
[INFO] Finished at: 2017-09-28T16:31:58Z
[INFO] Final Memory: 83M/1412M
[INFO] 




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12889504/TEZ-3830.001.patch
  against master revision 8f61c51.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2644//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2644//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==


Comment added.
7cb51436852445e1d9d5486b16061a40e12e0e67 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[Fast Archiver] Compressed 3.51 MB of artifacts by 32.0% relative to #2640
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (TEZ-3830) HistoryEventTimelineConversion should not hard code the Task state.

2017-09-28 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184402#comment-16184402
 ] 

TezQA commented on TEZ-3830:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12889504/TEZ-3830.001.patch
  against master revision 8f61c51.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2644//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2644//console

This message is automatically generated.

> HistoryEventTimelineConversion should not hard code the Task state.
> ---
>
> Key: TEZ-3830
> URL: https://issues.apache.org/jira/browse/TEZ-3830
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3830.001.patch
>
>
> TaskStartedEvent can have the state of the task so that the HistoryConversion 
> does not require task state to be hardcoded to SCHEDULED.
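
A rough Java illustration of the change described above, using simplified stand-in classes rather than the real Tez types. The class `TaskStateSketch`, the two converter methods, the reduced `TaskState` enum, the sample task id, and the assumption that a started task may be in a state other than SCHEDULED are all hypothetical and not taken from the attached patch.

```java
// Simplified stand-ins; not the real Tez classes or signatures.
class TaskStateSketch {

    enum TaskState { SCHEDULED, RUNNING }

    /** Event that carries the task state instead of leaving it implicit. */
    static final class TaskStartedEvent {
        private final String taskId;
        private final TaskState state;

        TaskStartedEvent(String taskId, TaskState state) {
            this.taskId = taskId;
            this.state = state;
        }

        String getTaskId() {
            return taskId;
        }

        TaskState getState() {
            return state;
        }
    }

    /** Before: the converter assumes every started task is SCHEDULED. */
    static String convertHardCoded(TaskStartedEvent event) {
        return event.getTaskId() + " status=" + TaskState.SCHEDULED.name();
    }

    /** After: the converter reports whatever state the event carries. */
    static String convertFromEvent(TaskStartedEvent event) {
        return event.getTaskId() + " status=" + event.getState().name();
    }

    public static void main(String[] args) {
        TaskStartedEvent event = new TaskStartedEvent("task_0001", TaskState.RUNNING);
        System.out.println(convertHardCoded(event));
        System.out.println(convertFromEvent(event));
    }
}
```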





[jira] [Updated] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures

2017-09-28 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3833:
-
Attachment: TEZ-3833.004.patch

Thank you [~jlowe]! Updated patch.

> Tasks should report codec errors during shuffle as fetch failures
> -
>
> Key: TEZ-3833
> URL: https://issues.apache.org/jira/browse/TEZ-3833
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3833.001.patch, TEZ-3833.002.patch, 
> TEZ-3833.003.patch, TEZ-3833.004.patch
>
>
> Do the equivalent of https://issues.apache.org/jira/browse/MAPREDUCE-6633 so 
> that compression errors do not prove fatal for the DAG/tasks.





[jira] [Updated] (TEZ-3830) HistoryEventTimelineConversion should not hard code the Task state.

2017-09-28 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3830:
-
Attachment: TEZ-3830.001.patch

Sorry about the delay [~jeagles]. Here is v1 patch that adds TaskState to 
TaskStartedEvent.

> HistoryEventTimelineConversion should not hard code the Task state.
> ---
>
> Key: TEZ-3830
> URL: https://issues.apache.org/jira/browse/TEZ-3830
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3830.001.patch
>
>
> TaskStartedEvent can have the state of the task so that the HistoryConversion 
> does not require task state to be hardcoded to SCHEDULED.





[jira] [Commented] (TEZ-3843) Tez UI Vertex/Tasks log links for running tasks are missing

2017-09-28 Thread Sreenath Somarajapuram (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184077#comment-16184077
 ] 

Sreenath Somarajapuram commented on TEZ-3843:
-

Thanks [~jeagles].
+1 LGTM

> Tez UI Vertex/Tasks log links for running tasks are missing
> ---
>
> Key: TEZ-3843
> URL: https://issues.apache.org/jira/browse/TEZ-3843
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: relatedentities.png, TEZ-3843.001.patch
>
>
> Task serialization mistakenly gets the list of attempts from 
> otherinfo.relatedentities. relatedentities is a top-level property, and 
> serialization should reflect this.


