[jira] [Assigned] (TEZ-3846) Tez AM may not clean up properly on an internal error
[ https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang reassigned TEZ-3846: - Assignee: Zhiyuan Yang > Tez AM may not clean up properly on an internal error > - > > Key: TEZ-3846 > URL: https://issues.apache.org/jira/browse/TEZ-3846 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Zhiyuan Yang > > Normally, in Hive we blindly reopen the session on any submit error; however > I accidentally broke that, and while investigating noticed a new error before > reopen that claims that session where a DAG has failed is still running a > DAG. Looks like it should either clean up, or if we assume OOM is not > clean-up-able, die completely. > {noformat} > 2017-09-28T01:07:12,352 INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > client.TezClient: Submitted dag to TezSession, > sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, > applicationId=application_1506585924598_0001, > dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM ( > ... > 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Status: Failed > 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Vertex failed, vertexName=Map 61, > vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex > vertex_1506585924598_0001_53_01 [Map 61] killed/failed due > to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, > vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: > GC overhead limit exceeded > 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Invalid event V_INTERNAL_ERROR on Vertex > vertex_1506585924598_0001_53_00 [Map 60] > 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > log.PerfLogger: end=1506586045787 duration=13435 > from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor> > ... 
[reuse] > 2017-09-28T01:07:28,459 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] > client.TezClient: Submitting dag to TezSession, > sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, > applicationId=application_1506585924598_0001, dagName=insert overwrite table > orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, > callerType=HIVE_QUERY_ID, > callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 } > 2017-09-28T01:07:35,259 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] > exec.Task: Dag submit failed due to App master already running a DAG > {noformat} > Session continues living and failing like that multiple times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
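The report names two acceptable outcomes after an internal error such as the OutOfMemoryError above. Either the AM cleans up, or, if OOM is assumed not to be clean-up-able, it dies completely. Below is a minimal Java sketch of that policy. It assumes a simplified AM that runs one DAG at a time. `SessionAmModel`, `submitDag`, and `runDag` are hypothetical names, not Tez's DAGAppMaster API, and the real fix may look quite different.

```java
/**
 * Hypothetical model of a session AM that runs one DAG at a time.
 * Illustrative only; this is not Tez's DAGAppMaster.
 */
class SessionAmModel {
    private String currentDagId;        // non-null while a DAG occupies the session
    private boolean shutdownRequested;  // set once the AM decides it cannot recover

    synchronized void submitDag(String dagId) {
        if (shutdownRequested) {
            throw new IllegalStateException("App master is shutting down");
        }
        if (currentDagId != null) {
            // The symptom in the log: the previous DAG never released the slot.
            throw new IllegalStateException("App master already running a DAG");
        }
        currentDagId = dagId;
    }

    /** Runs a DAG body and leaves the AM consistent on every exit path. */
    void runDag(String dagId, Runnable dagBody) {
        submitDag(dagId); // outside the try: a rejected submit must not clear another DAG
        try {
            dagBody.run();
        } catch (Error fatal) {
            // e.g. OutOfMemoryError: assume the AM is not clean-up-able and ask it to exit.
            synchronized (this) {
                shutdownRequested = true;
            }
            throw fatal;
        } finally {
            // Ordinary failures and fatal ones both free the slot.
            synchronized (this) {
                currentDagId = null;
            }
        }
    }

    synchronized boolean isIdle() {
        return currentDagId == null;
    }

    synchronized boolean isShutdownRequested() {
        return shutdownRequested;
    }
}
```

The design choice that matters is the `finally` block. Every exit path releases the DAG slot, so a later submit sees either a free session or an explicit shutdown, never "App master already running a DAG" for a DAG that has already failed.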
[jira] [Commented] (TEZ-3846) Tez AM may not clean up properly on an internal error
[ https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185082#comment-16185082 ] Zhiyuan Yang commented on TEZ-3846: --- I'll take a look soon.
[jira] [Updated] (TEZ-3846) Tez AM may not clean up properly on an internal error
[ https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated TEZ-3846: -- Summary: Tez AM may not clean up properly on an internal error (was: Tez session may not clean up on internal error)
[jira] [Commented] (TEZ-3846) Tez session may not clean up on internal error
[ https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185046#comment-16185046 ] Sergey Shelukhin commented on TEZ-3846: --- cc [~aplusplus] [~sseth]
[jira] [Updated] (TEZ-3846) Tez AM may not clean up properly on an internal error
[ https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated TEZ-3846: -- Description: Normally, in Hive we blindly reopen the session on any submit error; however I accidentally broke that, and while investigating noticed a new error before reopen that claims that session where a DAG has failed is still running a DAG. Looks like it should either clean up, or if we assume OOM is not clean-up-able, die completely. {noformat} 2017-09-28T01:07:12,352 INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] client.TezClient: Submitted dag to TezSession, sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, applicationId=application_1506585924598_0001, dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM ( ... 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] SessionState: Status: Failed 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] SessionState: Vertex failed, vertexName=Map 61, vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex vertex_1506585924598_0001_53_01 [Map 61] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: GC overhead limit exceeded 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] SessionState: Invalid event V_INTERNAL_ERROR on Vertex vertex_1506585924598_0001_53_00 [Map 60] 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] log.PerfLogger: ... 
[reuse] 2017-09-28T01:07:28,459 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] client.TezClient: Submitting dag to TezSession, sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, applicationId=application_1506585924598_0001, dagName=insert overwrite table orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 } 2017-09-28T01:07:35,259 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] exec.Task: Dag submit failed due to App master already running a DAG {noformat} Session continues living and failing like that multiple times. was: Normally, in Hive we blindly reopen the session on any error; however I accidentally broke that, and while investigating noticed a new error before reopen that claims that session where a DAG has failed is still running a DAG. Looks like it should either clean up, or if we assume OOM is not clean-up-able, die completely. {noformat} 2017-09-28T01:07:12,352 INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] client.TezClient: Submitted dag to TezSession, sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, applicationId=application_1506585924598_0001, dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM ( ... 
2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] SessionState: Status: Failed 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] SessionState: Vertex failed, vertexName=Map 61, vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex vertex_1506585924598_0001_53_01 [Map 61] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: GC overhead limit exceeded 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] SessionState: Invalid event V_INTERNAL_ERROR on Vertex vertex_1506585924598_0001_53_00 [Map 60] 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] log.PerfLogger: ... [reuse] 2017-09-28T01:07:28,459 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] client.TezClient: Submitting dag to TezSession, sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, applicationId=application_1506585924598_0001, dagName=insert overwrite table orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 } 2017-09-28T01:07:35,259 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] exec.Task: Dag submit failed due to App master already running a DAG {noformat} Session continues living and failing like that multiple times.
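The client-side behaviour in the description, where Hive blindly reopens the session on any submit error, is what normally hides this AM state. Below is a rough Java sketch of that retry path. It assumes a one-retry policy. `DagSession` and `ReopeningSubmitter` are hypothetical stand-ins, not Hive's or TezClient's actual API.

```java
import java.util.function.Supplier;

/** Hypothetical minimal session abstraction; not Hive's or Tez's API. */
interface DagSession {
    void submit(String dagName) throws Exception;

    void close();
}

/** Resubmits once on a brand-new session after any submit error. */
class ReopeningSubmitter {
    private final Supplier<DagSession> sessionFactory;
    private DagSession session;

    ReopeningSubmitter(Supplier<DagSession> sessionFactory) {
        this.sessionFactory = sessionFactory;
        this.session = sessionFactory.get();
    }

    void submit(String dagName) throws Exception {
        try {
            session.submit(dagName);
        } catch (Exception firstFailure) {
            // Don't trust the old AM's state: discard it and open a fresh session.
            session.close();
            session = sessionFactory.get();
            session.submit(dagName); // a second failure propagates to the caller
        }
    }
}
```

Because the second attempt runs against a new session, a stuck AM costs one reopen instead of failing every later query. Without this path, the session keeps living and failing as the reporter describes.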
[jira] [Commented] (TEZ-3845) Tez UI Cleanup Stats Table
[ https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184921#comment-16184921 ] Kuhu Shukla commented on TEZ-3845: -- Looks good to me. +1. Thanks [~jeagles] for reporting the issue and the patch. > Tez UI Cleanup Stats Table > -- > > Key: TEZ-3845 > URL: https://issues.apache.org/jira/browse/TEZ-3845 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: after_stats.png, before_stats.png, TEZ-3845.001.patch > > > Removed redundant status (for example: Succeeded Tasks: 10 Succeeded) > Made total tasks links > Added killed/failed task attempts available on the dag/index/ page > Reordered Stats to be consistent across all pages.
[jira] [Resolved] (TEZ-3845) Tez UI Cleanup Stats Table
[ https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla resolved TEZ-3845. -- Resolution: Fixed Fix Version/s: 0.9.1 Committed to master branch.
[jira] [Updated] (TEZ-3845) Tez UI Cleanup Stats Table
[ https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-3845: - Attachment: TEZ-3845.001.patch
[jira] [Commented] (TEZ-3845) Tez UI Cleanup Stats Table
[ https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184889#comment-16184889 ] Jonathan Eagles commented on TEZ-3845: -- !before_stats.png! !after_stats.png!
[jira] [Updated] (TEZ-3845) Tez UI Cleanup Stats Table
[ https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-3845: - Attachment: after_stats.png before_stats.png
[jira] [Updated] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures
[ https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated TEZ-3833: - Attachment: TEZ-3833.005.patch Thanks [~jlowe] for the helpful feedback. I have removed the check in the updated patch. > Tasks should report codec errors during shuffle as fetch failures > - > > Key: TEZ-3833 > URL: https://issues.apache.org/jira/browse/TEZ-3833 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: TEZ-3833.001.patch, TEZ-3833.002.patch, > TEZ-3833.003.patch, TEZ-3833.004.patch, TEZ-3833.005.patch > > > Do the equivalent of https://issues.apache.org/jira/browse/MAPREDUCE-6633 so > that compression errors do not prove fatal for the DAG/tasks.
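MAPREDUCE-6633, which this issue mirrors, is about keeping compression errors from being fatal. Below is a minimal Java sketch of the general idea. It assumes, consistent with the InternalError discussion on this issue, that a native codec reports corrupt input as `java.lang.InternalError`. The sketch rethrows that error as an `IOException` so the shuffle can report a fetch failure. `CodecErrors`, `MapOutputDecoder`, and `readMapOutput` are hypothetical names, and this is not the TEZ-3833 patch.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper; the actual Tez patch may be structured differently. */
final class CodecErrors {
    private CodecErrors() {}

    /** Stand-in for "decompress one map output from this stream". */
    interface MapOutputDecoder {
        byte[] decode(InputStream in) throws IOException;
    }

    static byte[] readMapOutput(InputStream in, MapOutputDecoder codec) throws IOException {
        try {
            return codec.decode(in);
        } catch (InternalError e) {
            // Assumption: a native codec signals corrupt input with java.lang.InternalError.
            // Rethrowing it as an IOException lets the fetcher handle it like any other
            // read error (a fetch failure) instead of failing the task outright.
            throw new IOException("codec error while decompressing map output", e);
        }
    }
}
```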
[jira] [Created] (TEZ-3845) Tez UI Cleanup Stats Table
Jonathan Eagles created TEZ-3845: Summary: Tez UI Cleanup Stats Table Key: TEZ-3845 URL: https://issues.apache.org/jira/browse/TEZ-3845 Project: Apache Tez Issue Type: Bug Components: UI Reporter: Jonathan Eagles Assignee: Jonathan Eagles Removed redundant status (for example: Succeeded Tasks: 10 Succeeded) Made total tasks links Added killed/failed task attempts available on the dag/index/ page Reordered Stats to be consistent across all pages.
[jira] [Commented] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures
[ https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184873#comment-16184873 ] Jason Lowe commented on TEZ-3833: - I'm still confused why we want to treat InternalErrors like timeouts. Seems like we will do some bad things in some cases if we do. For example, if we are trying to fetch 5 maps from a node and get an InternalError, then we should blame the current map, not all 5 maps, whereas if we are getting a connection timeout, then we do want to associate that failure to connect with all 5 maps. Therefore I think we simply need to remove the instanceof check for InternalError. That will cause them to be treated like a regular I/O error, which seems more appropriate.
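The argument above is about which map outputs to blame. Here is a small Java sketch of that rule. It assumes the fetcher knows whether the error happened while connecting or while reading. `FetchPhase` and `FetchFailureBlame` are hypothetical names, not Tez's Fetcher classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Where in the fetch the error happened (hypothetical classification). */
enum FetchPhase { CONNECT, READ }

/** Hypothetical: choose which map outputs to report as fetch failures. */
final class FetchFailureBlame {
    private FetchFailureBlame() {}

    /**
     * @param pending map outputs this fetcher still needs from one host
     * @param current index in {@code pending} of the output being read
     * @param phase   whether the error hit while connecting or while reading
     */
    static List<String> blame(List<String> pending, int current, FetchPhase phase) {
        if (phase == FetchPhase.CONNECT) {
            // Never reached the host (e.g. connection timeout): every pending output is affected.
            return new ArrayList<>(pending);
        }
        // A read-time error, including a codec InternalError treated as a regular
        // I/O error, implicates only the output being read.
        return Collections.singletonList(pending.get(current));
    }
}
```

Under this rule an InternalError needs no special case. Once it surfaces as an ordinary read-time I/O error, it lands in the `READ` branch and only the map being read is reported.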
[jira] [Commented] (TEZ-3841) Proposal: Simulator mode
[ https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184729#comment-16184729 ] Solal Pirelli commented on TEZ-3841: I'll add the Apache license headers later. Oddly, the bot only flags 2 of the files I added, even though none of them have the header. Since I attached the same patch twice (I forgot `--no-prefix` the first time) and the two runs failed different, unrelated tests, I guess those tests are flaky? > Proposal: Simulator mode > > > Key: TEZ-3841 > URL: https://issues.apache.org/jira/browse/TEZ-3841 > Project: Apache Tez > Issue Type: New Feature >Reporter: Solal Pirelli > Attachments: TEZ-3841.patch > > > Early work on a new feature proposal: a "simulator" mode in which vertices > are not actually executed, but instead use a simplified "fake" processor > (which is configurable, and by default does nothing) to let a developer see > how certain workloads will be handled. > For instance, one might want to check what happens if a vertex has each of > its 1000 tasks send a bunch of events - does this scale? Or, what if a > specific vertex fails 2% of the time - how does this impact overall graph > execution? Are 2 nodes with 10 containers per node enough, or should one > invest in a third node? > My current implementation is pretty simple: mimic the "uber" stuff to add a > new "fake" mode with a custom task scheduler and container launcher. It adds > the following configuration values: > * Boolean to enable fake mode > * Number of nodes in fake mode > * Number of containers per node in fake mode > * Class to run in fake mode - must inherit a new class `FakeProcessor`, with > a single method `run` that takes the vertex name, task index and task > attempt, and returns a list of events. Throwing an exception causes the task > to fail.
> I'm currently working on adding a "chaos monkey" kind of service which > randomly kills tasks, pre-empts containers, etc., but would appreciate some > feedback on what's already done first. :) > P.S.: I have zero experience with using JIRA or contributing to Apache > projects; if there is a more formal procedure for suggesting a new feature, > please point me to it.
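The `FakeProcessor` contract described above can be sketched in Java. This is a hypothetical reconstruction from the description only, not code from the attached patch. The event type is simplified to `String`, and `FlakyVertexProcessor` is an illustrative subclass for the "vertex fails 2% of the time" scenario.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical shape of the proposed base class; events are plain strings here. */
abstract class FakeProcessor {
    /** Returns the events a task would emit; throwing an exception fails the attempt. */
    abstract List<String> run(String vertexName, int taskIndex, int taskAttempt) throws Exception;
}

/** Fails failurePercent% of one vertex's tasks on their first attempt only. */
class FlakyVertexProcessor extends FakeProcessor {
    private final String flakyVertex;
    private final int failurePercent;

    FlakyVertexProcessor(String flakyVertex, int failurePercent) {
        this.flakyVertex = flakyVertex;
        this.failurePercent = failurePercent;
    }

    @Override
    List<String> run(String vertexName, int taskIndex, int taskAttempt) throws Exception {
        // Deterministic bucket in [0, 100): since 31 is coprime to 100, every run of
        // 100 consecutive task indexes covers each bucket exactly once.
        int bucket = Math.floorMod(taskIndex * 31 + 7, 100);
        if (vertexName.equals(flakyVertex) && taskAttempt == 0 && bucket < failurePercent) {
            throw new Exception("simulated failure of " + vertexName + " task " + taskIndex);
        }
        return Collections.singletonList("done:" + vertexName + "/" + taskIndex);
    }
}
```

Failing a fixed, hash-selected subset of first attempts keeps simulated runs reproducible, which a random number generator would not. Retries always succeed here, so the simulation measures how retries affect overall DAG progress.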
[jira] [Commented] (TEZ-3841) Proposal: Simulator mode
[ https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184706#comment-16184706 ] TezQA commented on TEZ-3841: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12889545/TEZ-3841.patch against master revision bc08b19. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:red}-1 release audit{color}. The applied patch generated 2 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.analyzer.TestAnalyzer Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2647//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2647//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2647//console This message is automatically generated.
Failed: TEZ-3841 PreCommit Build #2647
Jira: https://issues.apache.org/jira/browse/TEZ-3841 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2647/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 332.54 KB...] [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-job-analyzer [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12889545/TEZ-3841.patch against master revision bc08b19. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:red}-1 release audit{color}. The applied patch generated 2 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.analyzer.TestAnalyzer Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2647//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2647//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2647//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. afbd3f559b4d6dd5480e5f18767504be710ba0fe logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. 
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
### ## FAILED TESTS (if any) ##
1 tests failed.
FAILED: org.apache.tez.analyzer.TestAnalyzer.testWithATS
Error Message:
null
Stack Trace:
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertTrue(Assert.java:52)
	at org.apache.tez.analyzer.TestAnalyzer.getDagInfo(TestAnalyzer.java:264)
	at org.apache.tez.analyzer.TestAnalyzer.verify(TestAnalyzer.java:251)
	at org.apache.tez.analyzer.TestAnalyzer.runTests(TestAnalyzer.java:390)
	at org.apache.tez.analyzer.TestAnalyzer.testWithATS(TestAnalyzer.java:354)
[jira] [Updated] (TEZ-3252) [Umbrella] Enable support for Hadoop-3.x
[ https://issues.apache.org/jira/browse/TEZ-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-3252: - Fix Version/s: (was: 0.9.0) 0.9.1 > [Umbrella] Enable support for Hadoop-3.x > - > > Key: TEZ-3252 > URL: https://issues.apache.org/jira/browse/TEZ-3252 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah > Fix For: 0.9.1 > > Attachments: TEZ-3252.patch > > > Placeholder umbrella to track the various issues/tasks discovered to get full > stable functionality against hadoop-3.x once it is released in a stable form.
[jira] [Resolved] (TEZ-3843) Tez UI Vertex/Tasks log links for running tasks are missing
[ https://issues.apache.org/jira/browse/TEZ-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles resolved TEZ-3843. -- Resolution: Fixed Fix Version/s: 0.9.1 > Tez UI Vertex/Tasks log links for running tasks are missing > --- > > Key: TEZ-3843 > URL: https://issues.apache.org/jira/browse/TEZ-3843 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: relatedentities.png, TEZ-3843.001.patch > > > Task serialization was mistakenly getting the list of attempts from > otherinfo.relatedentities. relatedentities is a top-level property, and > serialization should reflect this.
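The TEZ-3843 fix concerns where the attempt list lives in a task's timeline entity. As a minimal sketch, assuming the entity is plain JSON modelled as nested Java maps and that attempts are listed under the entity type `TEZ_TASK_ATTEMPT_ID` (an assumption; the Tez UI's actual serializer is JavaScript and is not shown in the issue), reading `otherinfo.relatedentities` finds nothing, while reading the top-level `relatedentities` finds the attempts. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a timeline entity modelled as nested maps, with
// relatedentities at the top level next to otherinfo, as the issue describes.
class RelatedEntitiesSketch {

    // Assumed entity-type key under which task attempts are listed.
    static final String ATTEMPT_TYPE = "TEZ_TASK_ATTEMPT_ID";

    // Buggy lookup: expects relatedentities nested inside otherinfo.
    @SuppressWarnings("unchecked")
    static List<String> attemptsFromOtherInfo(Map<String, Object> entity) {
        Object otherInfo = entity.get("otherinfo");
        if (!(otherInfo instanceof Map)) {
            return Collections.emptyList();
        }
        return extractAttempts(((Map<String, Object>) otherInfo).get("relatedentities"));
    }

    // Fixed lookup: reads relatedentities as a top-level property of the entity.
    static List<String> attemptsFromTopLevel(Map<String, Object> entity) {
        return extractAttempts(entity.get("relatedentities"));
    }

    @SuppressWarnings("unchecked")
    private static List<String> extractAttempts(Object related) {
        if (!(related instanceof Map)) {
            return Collections.emptyList();
        }
        Object attempts = ((Map<String, Object>) related).get(ATTEMPT_TYPE);
        return attempts instanceof List ? (List<String>) attempts : Collections.<String>emptyList();
    }

    // Builds an illustrative task entity with two attempts; the IDs are made up.
    static Map<String, Object> sampleEntity() {
        Map<String, Object> related = new HashMap<>();
        related.put(ATTEMPT_TYPE, Arrays.asList("attempt_0", "attempt_1"));
        Map<String, Object> otherInfo = new HashMap<>();
        otherInfo.put("status", "RUNNING");
        Map<String, Object> entity = new HashMap<>();
        entity.put("otherinfo", otherInfo);
        entity.put("relatedentities", related);
        return entity;
    }

    public static void main(String[] args) {
        Map<String, Object> entity = sampleEntity();
        System.out.println(attemptsFromOtherInfo(entity)); // []
        System.out.println(attemptsFromTopLevel(entity));  // [attempt_0, attempt_1]
    }
}
```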
[jira] [Commented] (TEZ-3841) Proposal: Simulator mode
[ https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184638#comment-16184638 ] TezQA commented on TEZ-3841: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12889536/TEZ-3841.patch against master revision bc08b19. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:red}-1 release audit{color}. The applied patch generated 2 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2646//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2646//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2646//console This message is automatically generated. > Proposal: Simulator mode > > > Key: TEZ-3841 > URL: https://issues.apache.org/jira/browse/TEZ-3841 > Project: Apache Tez > Issue Type: New Feature >Reporter: Solal Pirelli > Attachments: TEZ-3841.patch > > > Early work on a new feature proposal: a "simulator" mode in which vertices > are not actually executed, but instead use a simplified "fake" processor > (which is configurable, and by default does nothing) to let a developer see > how certain workloads will be handled. > For instance, one might want to check what happens if a vertex has each of > its 1000 tasks send a bunch of events - does this scale? 
Or, what if a > specific vertex fails 2% of the time - how does this impact overall graph > execution? Are 2 nodes with 10 containers per node enough, or should one > invest in a third node? > My current implementation is pretty simple: mimic the "uber" stuff to add a > new "fake" mode with a custom task scheduler and container launcher. It adds > the following configuration values: > * Boolean to enable fake mode > * Number of nodes in fake mode > * Number of containers per node in fake mode > * Class to run in fake mode - must inherit a new class `FakeProcessor`, with > a single method `run` that takes the vertex name, task index and task > attempt, and returns a list of events. Throwing an exception causes the task > to fail. > I'm currently working on adding a "chaos monkey" kind of service which > randomly kills tasks, pre-empts containers, etc., but would appreciate some > feedback on what's already done first. :) > P.S.: I have zero experience with using JIRA or contributing to Apache > projects; if there is a more formal procedure for suggesting a new feature, > please point me to it.
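The proposal's `FakeProcessor` contract and its capacity question can be sketched in Java as follows. Everything here is hypothetical: `FakeProcessor` is a stand-in written from the description (vertex name, task index and task attempt in; a list of events out; an exception means the attempt failed), events are modelled as plain strings, and the simulation assumes each container runs one task at a time. Under that assumption a 1000-task vertex on 2 nodes with 10 containers per node needs 50 scheduling waves, and a third node reduces that to 34.

```java
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the proposed FakeProcessor base class. Events are
// plain strings here; the patch's real event type and signature may differ.
abstract class FakeProcessor {
    // Returns the events the task would emit; throwing marks the attempt as failed.
    abstract List<String> run(String vertexName, int taskIndex, int taskAttempt) throws Exception;
}

// A processor that does nothing, except fail a configurable fraction of attempts.
class FlakyProcessor extends FakeProcessor {
    private final Random random;
    private final double failureRate;

    FlakyProcessor(long seed, double failureRate) {
        this.random = new Random(seed); // seeded so a simulated run is reproducible
        this.failureRate = failureRate;
    }

    @Override
    List<String> run(String vertexName, int taskIndex, int taskAttempt) throws Exception {
        if (random.nextDouble() < failureRate) {
            throw new Exception("simulated failure: " + vertexName + " task " + taskIndex
                    + " attempt " + taskAttempt);
        }
        return Collections.emptyList();
    }
}

class FakeModeSimulation {
    // Scheduling waves needed when each container runs one task at a time.
    static int waves(int tasks, int nodes, int containersPerNode) {
        int slots = nodes * containersPerNode;
        return (tasks + slots - 1) / slots; // ceiling division
    }

    // Retries each task up to maxAttempts times; returns the total number of
    // attempts, or -1 if some task failed on every attempt (the vertex fails).
    static int runVertex(FakeProcessor processor, String vertex, int tasks, int maxAttempts) {
        int totalAttempts = 0;
        for (int task = 0; task < tasks; task++) {
            boolean succeeded = false;
            for (int attempt = 0; attempt < maxAttempts && !succeeded; attempt++) {
                totalAttempts++;
                try {
                    processor.run(vertex, task, attempt);
                    succeeded = true;
                } catch (Exception e) {
                    // The attempt failed; the loop retries it if attempts remain.
                }
            }
            if (!succeeded) {
                return -1;
            }
        }
        return totalAttempts;
    }

    public static void main(String[] args) {
        System.out.println(waves(1000, 2, 10)); // 50
        System.out.println(waves(1000, 3, 10)); // 34
        System.out.println(runVertex(new FlakyProcessor(42L, 0.02), "Map 1", 1000, 4));
    }
}
```

Seeding the `Random` keeps the failure pattern identical between runs, which makes it fairer to compare two cluster sizes against the same simulated workload.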
Failed: TEZ-3841 PreCommit Build #2646
Jira: https://issues.apache.org/jira/browse/TEZ-3841 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2646/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 332.76 KB...] [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-tests [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12889536/TEZ-3841.patch against master revision bc08b19. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:red}-1 release audit{color}. The applied patch generated 2 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2646//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2646//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2646//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 1b7bcaf96ee9b35cbbfb5ea6c0946bf7218278e7 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. 
Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 1 tests failed. FAILED: org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithoutExit Error Message: expected: but was: Stack Trace: java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:159) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:142) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:138) at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithoutExit(TestFaultTolerance.java:334)
[jira] [Updated] (TEZ-3841) Proposal: Simulator mode
[ https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Solal Pirelli updated TEZ-3841: --- Attachment: (was: TEZ-3841.patch) > Proposal: Simulator mode > > > Key: TEZ-3841 > URL: https://issues.apache.org/jira/browse/TEZ-3841 > Project: Apache Tez > Issue Type: New Feature >Reporter: Solal Pirelli > Attachments: TEZ-3841.patch
[jira] [Updated] (TEZ-3841) Proposal: Simulator mode
[ https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Solal Pirelli updated TEZ-3841: --- Attachment: TEZ-3841.patch > Proposal: Simulator mode > > > Key: TEZ-3841 > URL: https://issues.apache.org/jira/browse/TEZ-3841 > Project: Apache Tez > Issue Type: New Feature >Reporter: Solal Pirelli > Attachments: TEZ-3841.patch
[jira] [Updated] (TEZ-3841) Proposal: Simulator mode
[ https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Solal Pirelli updated TEZ-3841: --- Attachment: TEZ-3841.patch > Proposal: Simulator mode > > > Key: TEZ-3841 > URL: https://issues.apache.org/jira/browse/TEZ-3841 > Project: Apache Tez > Issue Type: New Feature >Reporter: Solal Pirelli > Attachments: TEZ-3841.patch
[jira] [Updated] (TEZ-3841) Proposal: Simulator mode
[ https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Solal Pirelli updated TEZ-3841: --- Attachment: (was: tez-fake-mode.v2.patch) > Proposal: Simulator mode > > > Key: TEZ-3841 > URL: https://issues.apache.org/jira/browse/TEZ-3841 > Project: Apache Tez > Issue Type: New Feature >Reporter: Solal Pirelli
[jira] [Updated] (TEZ-3841) Proposal: Simulator mode
[ https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Solal Pirelli updated TEZ-3841: --- Attachment: (was: tez-fake-mode.patch) > Proposal: Simulator mode > > > Key: TEZ-3841 > URL: https://issues.apache.org/jira/browse/TEZ-3841 > Project: Apache Tez > Issue Type: New Feature >Reporter: Solal Pirelli
[jira] [Commented] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures
[ https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184473#comment-16184473 ] TezQA commented on TEZ-3833: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12889517/TEZ-3833.004.patch against master revision 8f61c51. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2645//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2645//console This message is automatically generated. > Tasks should report codec errors during shuffle as fetch failures > - > > Key: TEZ-3833 > URL: https://issues.apache.org/jira/browse/TEZ-3833 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: TEZ-3833.001.patch, TEZ-3833.002.patch, > TEZ-3833.003.patch, TEZ-3833.004.patch > > > Do the equivalent of https://issues.apache.org/jira/browse/MAPREDUCE-6633 so > that compression errors do not prove fatal for the DAG/tasks.
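MAPREDUCE-6633 is only referenced here, so the following is a sketch of the idea in the TEZ-3833 summary rather than of either patch. It uses the JDK's `java.util.zip.Inflater` as a stand-in for a shuffle codec: a decompression error on fetched data is classified as a fetch failure instead of escaping as a task error. `ShuffleCodecSketch` and `FetchOutcome` are hypothetical names, not Tez APIs. The sketch also catches `InternalError`, on the assumption that some native codec implementations report corrupt input that way.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: classify a decompression error on fetched shuffle data
// as a fetch failure (blaming the source output) rather than a fatal task error.
class ShuffleCodecSketch {

    enum FetchOutcome { SUCCEEDED, FETCH_FAILURE }

    static FetchOutcome decompressFetched(byte[] compressed, int expectedLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            // One spare byte so that an over-long stream is detected too.
            byte[] out = new byte[expectedLength + 1];
            int n = inflater.inflate(out);
            if (n != expectedLength || !inflater.finished()) {
                return FetchOutcome.FETCH_FAILURE; // truncated or mis-sized data
            }
            return FetchOutcome.SUCCEEDED;
        } catch (DataFormatException | InternalError e) {
            // Corrupt data: report a fetch failure so the producing attempt is
            // blamed, instead of letting the error kill the consuming task.
            return FetchOutcome.FETCH_FAILURE;
        } finally {
            inflater.end();
        }
    }

    // Helper that produces valid zlib data for the example.
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            bos.write(buf, 0, n);
        }
        deflater.end();
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "map output".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println(decompressFetched(compress(data), data.length));             // SUCCEEDED
        System.out.println(decompressFetched(new byte[] {1, 2, 3, 4, 5}, data.length)); // FETCH_FAILURE
    }
}
```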
Failed: TEZ-3833 PreCommit Build #2645
Jira: https://issues.apache.org/jira/browse/TEZ-3833 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2645/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 339.75 KB...] [INFO] Total time: 55:49 min [INFO] Finished at: 2017-09-28T17:05:49Z [INFO] Final Memory: 90M/1315M [INFO] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12889517/TEZ-3833.004.patch against master revision 8f61c51. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2645//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2645//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 693baee550ba2908396f374a3b5c15f36872c1c8 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [Fast Archiver] Compressed 3.51 MB of artifacts by 13.3% relative to #2640 [description-setter] Could not determine description. Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## All tests passed
Failed: TEZ-3830 PreCommit Build #2644
Jira: https://issues.apache.org/jira/browse/TEZ-3830 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2644/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 339.75 KB...] [INFO] Total time: 56:36 min [INFO] Finished at: 2017-09-28T16:31:58Z [INFO] Final Memory: 83M/1412M [INFO] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12889504/TEZ-3830.001.patch against master revision 8f61c51. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2644//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2644//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 7cb51436852445e1d9d5486b16061a40e12e0e67 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [Fast Archiver] Compressed 3.51 MB of artifacts by 32.0% relative to #2640 [description-setter] Could not determine description. Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-3830) HistoryEventTimelineConversion should not hard code the Task state.
[ https://issues.apache.org/jira/browse/TEZ-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184402#comment-16184402 ] TezQA commented on TEZ-3830: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12889504/TEZ-3830.001.patch against master revision 8f61c51. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2644//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2644//console This message is automatically generated. > HistoryEventTimelineConversion should not hard code the Task state. > --- > > Key: TEZ-3830 > URL: https://issues.apache.org/jira/browse/TEZ-3830 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: TEZ-3830.001.patch > > > TaskStartedEvent can have the state of the task so that the HistoryConversion > does not require task state to be hardcoded to SCHEDULED.
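A minimal sketch of the TEZ-3830 idea: the started event carries the task's state, and the conversion copies it instead of writing a fixed `SCHEDULED`. The enum, the event class and the `status` key below are simplified, hypothetical stand-ins, not the real `TaskStartedEvent` or `HistoryEventTimelineConversion` code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified task states for this sketch; Tez's own TaskState enum may differ.
enum SketchTaskState { NEW, SCHEDULED, RUNNING, SUCCEEDED, FAILED, KILLED }

// Simplified stand-in for TaskStartedEvent that carries the task state with it.
final class TaskStartedEventSketch {
    final String taskId;
    final long startTime;
    final SketchTaskState state;

    TaskStartedEventSketch(String taskId, long startTime, SketchTaskState state) {
        this.taskId = taskId;
        this.startTime = startTime;
        this.state = state;
    }
}

class TimelineConversionSketch {
    // Builds an "otherinfo"-style map for a task-started entity. The "status"
    // key is an assumption for this sketch.
    static Map<String, Object> convertTaskStarted(TaskStartedEventSketch event) {
        Map<String, Object> otherInfo = new LinkedHashMap<>();
        otherInfo.put("startTime", event.startTime);
        // A hardcoded conversion would instead do: otherInfo.put("status", "SCHEDULED");
        otherInfo.put("status", event.state.name());
        return otherInfo;
    }

    public static void main(String[] args) {
        TaskStartedEventSketch event =
                new TaskStartedEventSketch("task_0", 1000L, SketchTaskState.RUNNING);
        System.out.println(convertTaskStarted(event)); // {startTime=1000, status=RUNNING}
    }
}
```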
[jira] [Updated] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures
[ https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated TEZ-3833: - Attachment: TEZ-3833.004.patch Thank you [~jlowe]! Updated patch. > Tasks should report codec errors during shuffle as fetch failures > - > > Key: TEZ-3833 > URL: https://issues.apache.org/jira/browse/TEZ-3833 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: TEZ-3833.001.patch, TEZ-3833.002.patch, > TEZ-3833.003.patch, TEZ-3833.004.patch
[jira] [Updated] (TEZ-3830) HistoryEventTimelineConversion should not hard code the Task state.
[ https://issues.apache.org/jira/browse/TEZ-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated TEZ-3830: - Attachment: TEZ-3830.001.patch Sorry about the delay [~jeagles]. Here is v1 patch that adds TaskState to TaskStartedEvent. > HistoryEventTimelineConversion should not hard code the Task state. > --- > > Key: TEZ-3830 > URL: https://issues.apache.org/jira/browse/TEZ-3830 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: TEZ-3830.001.patch
[jira] [Commented] (TEZ-3843) Tez UI Vertex/Tasks log links for running tasks are missing
[ https://issues.apache.org/jira/browse/TEZ-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184077#comment-16184077 ] Sreenath Somarajapuram commented on TEZ-3843: - Thanks [~jeagles]. +1 LGTM > Tez UI Vertex/Tasks log links for running tasks are missing > --- > > Key: TEZ-3843 > URL: https://issues.apache.org/jira/browse/TEZ-3843 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: relatedentities.png, TEZ-3843.001.patch