[jira] [Commented] (TEZ-3841) Proposal: Simulator mode

2017-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186730#comment-16186730
 ] 

TezQA commented on TEZ-3841:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12889785/TEZ-3841.patch
  against master revision 14cc282.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2650//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2650//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2650//console

This message is automatically generated.

> Proposal: Simulator mode
> 
>
> Key: TEZ-3841
> URL: https://issues.apache.org/jira/browse/TEZ-3841
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Solal Pirelli
> Attachments: TEZ-3841.patch
>
>
> Early work on a new feature proposal: a "simulator" mode in which vertices 
> are not actually executed, but instead use a simplified "fake" processor 
> (which is configurable, and by default does nothing) to let a developer see 
> how certain workloads will be handled.
> For instance, one might want to check what happens if a vertex has each of 
> its 1000 tasks send a bunch of events - does this scale? Or, what if a 
> specific vertex fails 2% of the time - how does this impact overall graph 
> execution? Are 2 nodes with 10 containers per node enough, or should one 
> invest in a third node?
> My current implementation is pretty simple: mimic the "uber" stuff to add a 
> new "fake" mode with a custom task scheduler and container launcher. It adds 
> the following configuration values:
> * Boolean to enable fake mode
> * Number of nodes in fake mode
> * Number of containers per node in fake mode
> * Class to run in fake mode - must inherit a new class `FakeProcessor`, with 
> a single method `run` that takes the vertex name, task index and task 
> attempt, and returns a list of events. Throwing an exception causes the task 
> to fail.
> I'm currently working on adding a "chaos monkey" kind of service which 
> randomly kills tasks, pre-empts containers, etc., but would appreciate some 
> feedback on what's already done first. :)
> P.S.: I have zero experience with using JIRA or contributing to Apache 
> projects; if there is a more formal procedure for suggesting a new feature, 
> please point me to it.
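
To make the `FakeProcessor` contract described above more concrete, here is a minimal sketch, assuming a shape for the class that the description only hints at. Only the name `FakeProcessor`, the `run` method, and its three arguments come from the proposal; `SimEvent`, `FlakyFakeProcessor`, the constructor parameters, and the seed are hypothetical and are not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for whatever event type the patch actually returns.
final class SimEvent {
    final String description;

    SimEvent(String description) {
        this.description = description;
    }
}

// Assumed shape of the base class: a single run method, as in the proposal.
abstract class FakeProcessor {
    // Returns the events the simulated task sends; throwing fails the task.
    abstract List<SimEvent> run(String vertexName, int taskIndex, int taskAttempt)
            throws Exception;
}

// Hypothetical processor: sends a fixed number of events per task and
// fails a configurable fraction of attempts (e.g. 0.02 for "2% of the time").
final class FlakyFakeProcessor extends FakeProcessor {
    private final double failureRate;
    private final int eventsPerTask;
    private final Random random;

    FlakyFakeProcessor(double failureRate, int eventsPerTask, long seed) {
        this.failureRate = failureRate;
        this.eventsPerTask = eventsPerTask;
        this.random = new Random(seed);
    }

    @Override
    List<SimEvent> run(String vertexName, int taskIndex, int taskAttempt) {
        // nextDouble() is in [0, 1), so a rate of 0.0 never fails and 1.0 always fails.
        if (random.nextDouble() < failureRate) {
            throw new IllegalStateException("simulated failure of " + vertexName
                    + " task " + taskIndex + " attempt " + taskAttempt);
        }
        List<SimEvent> events = new ArrayList<>();
        for (int i = 0; i < eventsPerTask; i++) {
            events.add(new SimEvent(vertexName + "/task " + taskIndex + "/event " + i));
        }
        return events;
    }
}

final class FakeProcessorDemo {
    public static void main(String[] args) {
        // 1000 tasks of one vertex, each failing about 2% of the time.
        FakeProcessor processor = new FlakyFakeProcessor(0.02, 5, 42L);
        int failed = 0;
        for (int task = 0; task < 1000; task++) {
            try {
                processor.run("Map 1", task, 0);
            } catch (Exception e) {
                failed++;
            }
        }
        System.out.println("failed first attempts out of 1000: " + failed);
    }
}
```

In the proposed mode the custom task scheduler and container launcher would presumably call `run` instead of this loop; the loop only shows the contract in isolation.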



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Failed: TEZ-3841 PreCommit Build #2650

2017-09-29 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3841
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2650/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 340.92 KB...]
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 55:16 min
[INFO] Finished at: 2017-09-30T00:03:29Z
[INFO] Final Memory: 90M/1380M
[INFO] 





==
==
Adding comment to Jira.
==
==


Comment added.
453950d9384d63daa4511888316f3037b1881ef1 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-3841) Proposal: Simulator mode

2017-09-29 Thread Solal Pirelli (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Solal Pirelli updated TEZ-3841:
---
Attachment: TEZ-3841.patch


[jira] [Updated] (TEZ-3841) Proposal: Simulator mode

2017-09-29 Thread Solal Pirelli (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Solal Pirelli updated TEZ-3841:
---
Attachment: (was: TEZ-3841.patch)


[jira] [Updated] (TEZ-3847) AM web controller task counters are empty sometimes

2017-09-29 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3847:
-
Attachment: TEZ-3847.001.patch

> AM web controller task counters are empty sometimes
> ---
>
> Key: TEZ-3847
> URL: https://issues.apache.org/jira/browse/TEZ-3847
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Attachments: TEZ-3847.001.patch
>
>
> Statistics and counters are sent at longer intervals, and the TaskAttemptImpl 
> blindly overwrites its stats and counters with null.
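
The failure mode described above can be illustrated with a small sketch. This is not the TEZ-3847 patch and does not use Tez classes: `AttemptStatusHolder` and its methods are hypothetical, and the assumption is simply that a status update carries `null` counters whenever the counters were not due to be sent. Under that assumption, keeping the last known value instead of overwriting avoids the empty counters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the latest counters reported by a task attempt.
final class AttemptStatusHolder {
    // Last counters actually received; null until the first report that carries them.
    private Map<String, Long> counters;

    // Called for every status update; reportedCounters is null when the
    // update did not include counters (they are sent at longer intervals).
    void onStatusUpdate(Map<String, Long> reportedCounters) {
        if (reportedCounters != null) {
            // Copy so later changes by the caller do not leak in.
            counters = new HashMap<>(reportedCounters);
        }
        // Otherwise keep the previous counters rather than overwriting with null.
    }

    Map<String, Long> getCounters() {
        return counters;
    }
}
```

With the guard in place, a counters-free update that arrives after a full one leaves the earlier counters readable.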





[jira] [Created] (TEZ-3847) AM web controller task counters are empty sometimes

2017-09-29 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-3847:


 Summary: AM web controller task counters are empty sometimes
 Key: TEZ-3847
 URL: https://issues.apache.org/jira/browse/TEZ-3847
 Project: Apache Tez
 Issue Type: Bug
 Reporter: Jonathan Eagles


Statistics and counters are sent at longer intervals, and the TaskAttemptImpl 
blindly overwrites its stats and counters with null.





[jira] [Commented] (TEZ-3846) Tez AM may not clean up properly on an internal error

2017-09-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186309#comment-16186309
 ] 

Sergey Shelukhin commented on TEZ-3846:
---

Tez version was 0.9.0 (the one Hive is using on master). Unfortunately I don't 
have vertex logs.

> Tez AM may not clean up properly on an internal error
> -
>
> Key: TEZ-3846
> URL: https://issues.apache.org/jira/browse/TEZ-3846
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Zhiyuan Yang
>
> Normally, in Hive we blindly reopen the session on any submit error; however, 
> I accidentally broke that, and while investigating I noticed a new error before 
> the reopen, claiming that a session in which a DAG has failed is still running a 
> DAG. It looks like the AM should either clean up or, if we assume OOM is not 
> clean-up-able, die completely.
> {noformat}
> 2017-09-28T01:07:12,352  INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> client.TezClient: Submitted dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, 
> dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM (
> ...
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Status: Failed
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Vertex failed, vertexName=Map 61, 
> vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex 
> vertex_1506585924598_0001_53_01 [Map 61] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
> vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: 
> GC overhead limit exceeded
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Invalid event V_INTERNAL_ERROR on Vertex 
> vertex_1506585924598_0001_53_00 [Map 60]
> 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> log.PerfLogger:  end=1506586045787 duration=13435 
> from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor>
> ... [reuse]
> 2017-09-28T01:07:28,459  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> client.TezClient: Submitting dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, dagName=insert overwrite table 
> orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, 
> callerType=HIVE_QUERY_ID, 
> callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 }
> 2017-09-28T01:07:35,259  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> exec.Task: Dag submit failed due to App master already running a DAG
> {noformat}
> Session continues living and failing like that multiple times.
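
The "reopen the session on any submit error" behavior mentioned in the report can be sketched as follows. This is an illustration only, not Hive's or Tez's code: `DagSession`, `ReopeningSubmitter`, and the single-retry policy are hypothetical, and the real `TezClient` API is deliberately not used here.

```java
import java.util.function.Supplier;

// Hypothetical minimal session abstraction; not the TezClient API.
interface DagSession {
    void submit(String dagName) throws Exception;

    void close();
}

// Submits a DAG and, on any submit error, replaces the session and retries once.
final class ReopeningSubmitter {
    private final Supplier<DagSession> sessionFactory;
    private DagSession session;
    private int reopenCount = 0;

    ReopeningSubmitter(Supplier<DagSession> sessionFactory) {
        this.sessionFactory = sessionFactory;
        this.session = sessionFactory.get();
    }

    void submit(String dagName) throws Exception {
        try {
            session.submit(dagName);
        } catch (Exception firstFailure) {
            // Blindly discard the possibly wedged session and open a fresh one.
            session.close();
            session = sessionFactory.get();
            reopenCount++;
            // A second failure propagates to the caller.
            session.submit(dagName);
        }
    }

    int reopenCount() {
        return reopenCount;
    }
}
```

A client-side workaround like this hides the symptom in the log above ("App master already running a DAG") but does not address the question raised in the report, namely whether the AM itself should clean up or exit.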





[jira] [Commented] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures

2017-09-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186138#comment-16186138
 ] 

Jason Lowe commented on TEZ-3833:
-

+1 lgtm.  Committing this.


> Tasks should report codec errors during shuffle as fetch failures
> -
>
> Key: TEZ-3833
> URL: https://issues.apache.org/jira/browse/TEZ-3833
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3833.001.patch, TEZ-3833.002.patch, 
> TEZ-3833.003.patch, TEZ-3833.004.patch, TEZ-3833.005.patch
>
>
> Do the equivalent of https://issues.apache.org/jira/browse/MAPREDUCE-6633 so 
> that compression errors do not prove fatal for the DAG/tasks.





[jira] [Commented] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures

2017-09-29 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186133#comment-16186133
 ] 

TezQA commented on TEZ-3833:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12889589/TEZ-3833.005.patch
  against master revision a4a3c6d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2649//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2649//console

This message is automatically generated.



[jira] [Commented] (TEZ-3846) Tez AM may not clean up properly on an internal error

2017-09-29 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186024#comment-16186024
 ] 

Kuhu Shukla commented on TEZ-3846:
--

[~sershe], thank you for reporting the issue. On what version of Tez was this 
issue seen? I am wondering whether any of the recent fixes and/or JIRAs might be 
related here, e.g. TEZ-3817.
