[jira] [Commented] (TEZ-2196) Consider reusing UnorderedPartitionedKVWriter with single output in UnorderedKVOutput
[ https://issues.apache.org/jira/browse/TEZ-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381458#comment-14381458 ] Hadoop QA commented on TEZ-2196: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707434/TEZ-2196.4.patch against master revision 2fe2d63. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/349//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/349//console This message is automatically generated. > Consider reusing UnorderedPartitionedKVWriter with single output in > UnorderedKVOutput > - > > Key: TEZ-2196 > URL: https://issues.apache.org/jira/browse/TEZ-2196 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-2196.1.patch, TEZ-2196.2.patch, TEZ-2196.3.patch, > TEZ-2196.4.patch > > > Can possibly get rid of FileBasedKVWriter and reuse > UnorderedPartitionedKVWriter with single partition in UnorderedKVOutput. > This can also benefit from pipelined shuffle changes done in > UnorderedPartitionedKVWriter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-2196 PreCommit Build #349
Jira: https://issues.apache.org/jira/browse/TEZ-2196 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/349/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2762 lines...] [INFO] Final Memory: 73M/970M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707434/TEZ-2196.4.patch against master revision 2fe2d63. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/349//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/349//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 3a56c319caec4fc6845e2093c40600ce5df14ef3 logged out == == Finished build. == == Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #343 Archived 44 artifacts Archive block size is 32768 Received 2 blocks and 2656212 bytes Compression is 2.4% Took 2 sec Description set: TEZ-2196 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (TEZ-2213) For the ordered case, enabling pipelined shuffle should automatically disable final merge
[ https://issues.apache.org/jira/browse/TEZ-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2213: -- Attachment: TEZ-2213.1.patch [~sseth] - Please review when you find some time. > For the ordered case, enabling pipelined shuffle should automatically disable > final merge > - > > Key: TEZ-2213 > URL: https://issues.apache.org/jira/browse/TEZ-2213 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Rajesh Balamohan > Attachments: TEZ-2213.1.patch > > > Currently, it ends up throwing an exception. Given the defaults - enabling > pipelined shuffle requires two parameters to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2213) For the ordered case, enabling pipelined shuffle should automatically disable final merge
[ https://issues.apache.org/jira/browse/TEZ-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned TEZ-2213: - Assignee: Rajesh Balamohan > For the ordered case, enabling pipelined shuffle should automatically disable > final merge > - > > Key: TEZ-2213 > URL: https://issues.apache.org/jira/browse/TEZ-2213 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Rajesh Balamohan > > Currently, it ends up throwing an exception. Given the defaults - enabling > pipelined shuffle requires two parameters to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
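The behaviour requested in TEZ-2213 could look roughly like the sketch below. It is not taken from the attached patch: it works on a plain `Map` rather than a Hadoop `Configuration`, `PipelinedShuffleDefaults` and `normalize` are made-up names, and the two property keys are what I believe the Tez runtime uses, so please check them against `TezRuntimeConfiguration` before relying on them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Tez.
final class PipelinedShuffleDefaults {
  // Assumed property names; verify against TezRuntimeConfiguration.
  static final String PIPELINED_SHUFFLE = "tez.runtime.pipelined-shuffle.enabled";
  static final String FINAL_MERGE = "tez.runtime.enable.final-merge.in.output";

  private PipelinedShuffleDefaults() {}

  /** Returns a copy of conf in which enabling pipelined shuffle implies no final merge. */
  static Map<String, String> normalize(Map<String, String> conf) {
    Map<String, String> result = new HashMap<>(conf);
    boolean pipelined =
        Boolean.parseBoolean(result.getOrDefault(PIPELINED_SHUFFLE, "false"));
    if (pipelined) {
      // Override the setting instead of throwing when only one parameter was set.
      result.put(FINAL_MERGE, "false");
    }
    return result;
  }
}
```

With something like this, a user would only need to set the pipelined shuffle flag; the final merge setting would follow from it.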
[jira] [Updated] (TEZ-2196) Consider reusing UnorderedPartitionedKVWriter with single output in UnorderedKVOutput
[ https://issues.apache.org/jira/browse/TEZ-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2196: -- Attachment: TEZ-2196.4.patch Thanks [~sseth]. Addressed it in the latest patch. Will commit it shortly after pre-commit build passes. > Consider reusing UnorderedPartitionedKVWriter with single output in > UnorderedKVOutput > - > > Key: TEZ-2196 > URL: https://issues.apache.org/jira/browse/TEZ-2196 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-2196.1.patch, TEZ-2196.2.patch, TEZ-2196.3.patch, > TEZ-2196.4.patch > > > Can possibly get rid of FileBasedKVWriter and reuse > UnorderedPartitionedKVWriter with single partition in UnorderedKVOutput. > This can also benefit from pipelined shuffle changes done in > UnorderedPartitionedKVWriter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
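To illustrate the idea in the issue description, here is a toy sketch in which the unpartitioned writer is just a partitioned writer constructed with one partition. `ToyPartitionedWriter` and `ToyUnorderedWriter` are invented names and keep records in memory; the real `UnorderedPartitionedKVWriter` deals with buffers, spills and events that are not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a partitioned key-value writer (not the Tez class).
class ToyPartitionedWriter {
  private final List<List<String>> partitions = new ArrayList<>();

  ToyPartitionedWriter(int numPartitions) {
    if (numPartitions < 1) {
      throw new IllegalArgumentException("need at least one partition");
    }
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new ArrayList<>());
    }
  }

  void write(String key, String value) {
    // floorMod keeps the index non-negative even for negative hash codes.
    int p = Math.floorMod(key.hashCode(), partitions.size());
    partitions.get(p).add(key + "=" + value);
  }

  List<String> partition(int index) {
    return partitions.get(index);
  }
}

// The single-output case reuses the partitioned writer with one partition.
class ToyUnorderedWriter {
  private final ToyPartitionedWriter delegate = new ToyPartitionedWriter(1);

  void write(String key, String value) {
    delegate.write(key, value);
  }

  List<String> records() {
    return delegate.partition(0);
  }
}
```

The point of the delegation is that any improvement made to the partitioned writer is picked up by the single-output path without a second implementation to maintain.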
[jira] [Commented] (TEZ-2235) Tasks throwing OOM before reaching memory limits
[ https://issues.apache.org/jira/browse/TEZ-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381382#comment-14381382 ] Rajesh Balamohan commented on TEZ-2235: --- Looks more like a Hive issue. With Hive "commit 4e185e7f8a760444aac8117d0088bbd8baa65a6a", it works fine. Will try to track down the issue and move this to the Hive JIRA. > Tasks throwing OOM before reaching memory limits > > > Key: TEZ-2235 > URL: https://issues.apache.org/jira/browse/TEZ-2235 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: Screen Shot 2015-03-26 at 5.04.46 AM.png, Screen Shot > 2015-03-26 at 5.05.06 AM.png > > > - Ran query13 in tpcds with hive (1.2.0-SNAPSHOT) at 10 TB scale with Tez > (0.7 master) > - tez.runtime.io.sort.mb=1800 on 4 GB container. > - OOM was thrown in lots of tasks when allocating memory to sorter. > - Heapdump reveals memory allocated to sorter. And other objects do not take > up that much space. > Need more investigation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime
[ https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381332#comment-14381332 ] Gopal V commented on TEZ-2217: -- [~bikassaha]: The patch keeps containers alive, which works better with this patch. There's a lot of log-spew with {{LOG.info("Holding onto idle container with no work. CId: "}} in the _post log files. I might take a couple of days to review this, so if [~rajesh.balamohan] can spare some time to review this, we can get this in quickly. > The min-held-containers constraint is not enforced during query runtime > > > Key: TEZ-2217 > URL: https://issues.apache.org/jira/browse/TEZ-2217 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Gopal V >Assignee: Bikas Saha > Attachments: TEZ-2217-debug.txt.bz2, TEZ-2217.1.patch, > TEZ-2217.2.patch, TEZ-2217.3.patch, TEZ-2217.txt.bz2 > > > The min-held containers constraint is respected during query idle times, but > is not respected when a query is actually in motion. > The AM releases unused containers during dag execution without checking for > min-held containers. > {code} > 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing > container, containerId=container_1424502260528_1348_01_13, > containerExpiryTime=1426891313264, idleTimeoutMin=5000 > 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Releasing unused container: > container_1424502260528_1348_01_13 > {code} > This is actually useful only after the AM has received a soft pre-emption > message, doing it on an idle cluster slows down one of the most common query > patterns in BI systems. > {code} > create temporary table smalltable as ...; > select ... bigtable JOIN smalltable ON ...; > {code} > The smaller query in the beginning throws away the pre-warmed capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
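The check that the description says is missing can be sketched as below. This only illustrates the rule "do not release an idle container if that would drop the AM below the min-held count"; `IdleContainerPolicy` and its parameters are hypothetical and do not correspond to the code in `YarnTaskSchedulerService` or to the attached patches. Releasing on pre-emption is an assumption drawn from the remark that releasing is useful after a soft pre-emption message.

```java
// Hypothetical policy object for illustration; not part of Tez.
final class IdleContainerPolicy {
  private final int minHeldContainers;

  IdleContainerPolicy(int minHeldContainers) {
    this.minHeldContainers = minHeldContainers;
  }

  /**
   * Decides whether an idle container may be released.
   *
   * @param heldContainers containers currently held by the AM, including this one
   * @param idleTimeoutExpired whether this container's idle timeout has passed
   * @param preemptionRequested whether the AM has been asked to give capacity back
   */
  boolean shouldRelease(int heldContainers, boolean idleTimeoutExpired,
      boolean preemptionRequested) {
    if (preemptionRequested) {
      return true; // assumed: hand capacity back when the cluster needs it
    }
    if (!idleTimeoutExpired) {
      return false;
    }
    // Keep the pre-warmed pool: only release what exceeds the minimum.
    return heldContainers > minHeldContainers;
  }
}
```

Under this rule the small query in the `smalltable` example would finish with the pre-warmed containers still held, so the following join could start on them.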
[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread
[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381238#comment-14381238 ]

Bikas Saha commented on TEZ-714:

Typo:
{code}
+ private AtomicBoolean commitCancled = new AtomicBoolean(false);
boolean commitAllOutputsOnSuccess = true;
{code}
Most of these should not be ignored because there is a bug if any of these events actually come in during commit. Maybe except vertex manager user code error, which can be ignored.
{code}
.addTransition(VertexState.COMMITTING, VertexState.COMMITTING,
    EnumSet.of(
        VertexEventType.V_MANAGER_USER_CODE_ERROR,
        VertexEventType.V_ROOT_INPUT_FAILED,
        VertexEventType.V_SOURCE_VERTEX_STARTED,
        VertexEventType.V_ROOT_INPUT_INITIALIZED,
        VertexEventType.V_NULL_EDGE_INITIALIZED,
        VertexEventType.V_SOURCE_TASK_ATTEMPT_COMPLETED,
        VertexEventType.V_TASK_ATTEMPT_COMPLETED))
{code}
Why is this now public?
{code}
public void abortVertex(final VertexStatus.State finalState) {
{code}
Where is abort being called on all outputs when the vertex/dag fails (failure could be in the commit operation or due to an external cause)? Should we wait for all outstanding commit operations to get cancelled or complete and then call abort on all outputs?

Why is this calling Vertex.abortVertex() instead of directly calling committer.abort() for the outputs?
{code}
if (commitAllOutputsOnSuccess) {
  for (Vertex vertex : vertices.values()) {
    ((VertexImpl)vertex).abortVertex(VertexStatus.State.FAILED);
  }
{code}
Why has calling commit operations moved from DAG.finished() to DAG.checkForCompletion()? finished() is expected to be called once but checkForCompletion() can be called any number of times. finished() may need to be broken into 2 methods though, to separate the parts which should happen after commits are done.

In OutputKey, vertexName and vertexGroupName can be merged to make the code paths similar. Where needed, indicating a group can be done via a boolean.
{code}
for (Map.Entry> entry : commitFutures.entrySet()) {
  OutputKey outputKey = entry.getKey();
  if (outputKey.vertexGroupName != null) {
    LOG.info("Canceling commit of output:" + outputKey.getOutputName()
        + " of vertex group:" + outputKey.vertexGroupName);
  } else {
    LOG.info("Canceling commit of output:" + outputKey.getOutputName()
        + " of vertex:" + outputKey.vertexName);
  }
{code}
Should this be private if it's accessed by derived classes? Is CommitCompletedTransition used in the state machine? If not, then it does not need to be a transition class.
{code}
// either commitFail or recoveryFail
private boolean isFail = false;
{code}
Why is this directly sending events instead of using a common method?
{code}
if (super.isFail) {
  for (Vertex vertex : dag.vertices.values()) {
    ((VertexImpl)vertex).handle(new VertexEventTermination(vertex.getVertexId(),
        VertexTerminationCause.OTHER_VERTEX_FAILURE));
  }
  return DAGState.TERMINATING;
{code}
Why is there no check for whether there are non-zero committers?
{code}
private synchronized DAGState commitOrFinish() {
  if (this.committed) {
    LOG.info("Ignoring multiple output commit/abort");
    if (commitFutures.isEmpty() && terminationCause == null) {
      return finished(DAGState.SUCCEEDED);
    } else {
      return getState();
    }
  }
  LOG.info("Calling DAG commit for dag: " + getID());
  this.committed = true;
  // commit all shared outputs
  try {
    appContext.getHistoryHandler().handleCriticalEvent(new DAGHistoryEvent(getID(),
        new DAGCommitStartedEvent(getID(), clock.getTime())));
  } catch (IOException e) {
    LOG.error("Failed to send commit event to history/recovery handler", e);
    trySetTerminationCause(DAGTerminationCause.RECOVERY_FAILURE);
    return DAGState.FAILED;
{code}
Thanks for incorporating the suggestions about the flow. The new code is much simpler, though there may be some issues that need ironing out if the above comments are valid. Haven't seen the tests yet.
> OutputCommitters should not run in the main AM dispatcher thread > > > Key: TEZ-714 > URL: https://issues.apache.org/jira/browse/TEZ-714 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jeff Zhang >Priority: Critical > Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, > TEZ-714-3.patch, TEZ-714-4.patch, Vertex_2.pdf > > > Follow up jira from TEZ-41. > 1) If there's multiple OutputCommitters on a Vertex, they can be run i
[jira] [Commented] (TEZ-2103) Implement a Partial completion VertexManagerPlugin
[ https://issues.apache.org/jira/browse/TEZ-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381196#comment-14381196 ] Alok Asok commented on TEZ-2103: Hi, I had a question regarding this short-circuit mechanism. Does the vertex manager keep checking the state of the application through heartbeats until the limit condition is met? If so, does it send some specially structured message to the scheduler to close the rest of the sibling tasks and set their flag as success? How is this ordering done exactly? I was going through the Tez native umbilical communication protocol and didn't know where to look for specifics. Thanks Alok Asok > Implement a Partial completion VertexManagerPlugin > -- > > Key: TEZ-2103 > URL: https://issues.apache.org/jira/browse/TEZ-2103 > Project: Apache Tez > Issue Type: New Feature >Reporter: Gopal V > Labels: gsoc, gsoc2015, hadoop, java, tez > > Currently, there is no sibling communication between tasks - this implies > that a task can be completed by the first vertex in a wave of tasks, but the > entire wave of tasks has to complete before success can be reported. > This occurs in limit + filter query patterns common between the data access > engines. > {code} > select * from data where x > 1 limit 10; > {code} > will run through a full-table scan worth of tasks to generate 10 rows per > task, to aggregate it to produce the final 10 row result. > The VertexManager receives counters/events early enough to short-circuit the > rest of the vertex tasks, to prevent the remainder of tasks from getting > scheduled when the limit condition has been satisfied by an initial sub-set > of the tasks. > This is a specialization of the VertexManagerPlugin for this common case > scheduling pattern. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
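As a rough picture of the short-circuit the description talks about, the sketch below tracks rows reported by finished tasks and stops handing out tasks once the limit is reached. It deliberately does not use the Tez `VertexManagerPlugin` API; `LimitShortCircuit`, `onTaskCompleted` and `nextTaskToSchedule` are invented for illustration, and how row counts actually reach the vertex manager (counters or events) is left open.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only; not a VertexManagerPlugin implementation.
final class LimitShortCircuit {
  private final long limit;
  private final Deque<Integer> pendingTasks = new ArrayDeque<>();
  private long rowsSeen = 0;

  LimitShortCircuit(long limit, int numTasks) {
    this.limit = limit;
    for (int i = 0; i < numTasks; i++) {
      pendingTasks.add(i);
    }
  }

  /** Records the number of rows a finished task produced. */
  void onTaskCompleted(long rowsProduced) {
    rowsSeen += rowsProduced;
  }

  boolean limitSatisfied() {
    return rowsSeen >= limit;
  }

  /** Returns the next task index to schedule, or -1 if nothing should be scheduled. */
  int nextTaskToSchedule() {
    if (limitSatisfied() || pendingTasks.isEmpty()) {
      return -1;
    }
    return pendingTasks.poll();
  }
}
```

For the `limit 10` query above, a vertex with many tasks would stop scheduling as soon as the completed tasks have reported 10 rows between them, instead of running the whole wave.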
[jira] [Commented] (TEZ-2229) bower ESUDO Cannot be run with sudo -- during build
[ https://issues.apache.org/jira/browse/TEZ-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381159#comment-14381159 ] Fengdong Yu commented on TEZ-2229: -- Hi [~pramachandran], can we add some text to tez-ui/README, or add some code to the maven plugin to recognize the current user? > bower ESUDO Cannot be run with sudo -- during build > --- > > Key: TEZ-2229 > URL: https://issues.apache.org/jira/browse/TEZ-2229 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0 > Environment: Linux x86_64 >Reporter: Fengdong Yu > > I built Tez as root, and I had never installed node/npm locally before the build. > There are exception messages while building the tez-ui module. Maven debug > logs: > {code} > [DEBUG] env: SSH_TTY=/dev/pts/0 > [DEBUG] env: TERM=xterm > [DEBUG] env: USER=root > [DEBUG] env: XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt > [DEBUG] Toolchains are ignored, 'executable' parameter is set to > /root/temp/apache-tez-0.6.0-src/tez-ui/src/main/webapp/node/node > [DEBUG] Executing command line: > [/root/temp/apache-tez-0.6.0-src/tez-ui/src/main/webapp/node/node, > node_modules/bower/bin/bower, install, --remove-unnecessary-resolutions=false] > bower ESUDO Cannot be run with sudo > Additional error details: > Since bower is a user command, there is no need to execute it with superuser > permissions. > If you're having permission errors when using bower without sudo, please > spend a few minutes learning more about how your system should work and make > any necessary repairs. > http://www.joyent.com/blog/installing-node-and-npm > https://gist.github.com/isaacs/579814 > You can however run a command with sudo using --allow-root option > {code} > {code} > [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec > (Bower install) on project tez-ui: Command execution failed. 
Process exited > with an error: 1 (Exit value: 1) -> [ > Help 1]org.apache.maven.lifecycle.LifecycleExecutionException: Failed to > execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec (Bower install) > on project tez-ui: Command execution failed. > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355) > at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155) > at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584) > at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216) > at org.apache.maven.cli.MavenCli.main(MavenCli.java:160) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) > at > org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) > at > org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) > Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution 
> failed. > at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:303) > at > org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) > ... 19 more > Caused by: org.apache.commons.exec.ExecuteException: Process exited with an > error: 1 (Exit value: 1) > at > org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:402) > at > org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:164) > at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:746) > at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:292) > ... 21 more > [ERROR] > [ERROR] > [ERROR] For more
[jira] [Commented] (TEZ-2235) Tasks throwing OOM before reaching memory limits
[ https://issues.apache.org/jira/browse/TEZ-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381110#comment-14381110 ] Rajesh Balamohan commented on TEZ-2235: --- Haven't checked on the other branches. Will check it soon. > Tasks throwing OOM before reaching memory limits > > > Key: TEZ-2235 > URL: https://issues.apache.org/jira/browse/TEZ-2235 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: Screen Shot 2015-03-26 at 5.04.46 AM.png, Screen Shot > 2015-03-26 at 5.05.06 AM.png > > > - Ran query13 in tpcds with hive (1.2.0-SNAPSHOT) at 10 TB scale with Tez > (0.7 master) > - tez.runtime.io.sort.mb=1800 on 4 GB container. > - OOM was thrown in lots of tasks when allocating memory to sorter. > - Heapdump reveals memory allocated to sorter. And other objects do not take > up that much space. > Need more investigation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2235) Tasks throwing OOM before reaching memory limits
[ https://issues.apache.org/jira/browse/TEZ-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381107#comment-14381107 ] Hitesh Shah commented on TEZ-2235: -- Is this only with master or also in 0.5/0.6 branches? > Tasks throwing OOM before reaching memory limits > > > Key: TEZ-2235 > URL: https://issues.apache.org/jira/browse/TEZ-2235 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: Screen Shot 2015-03-26 at 5.04.46 AM.png, Screen Shot > 2015-03-26 at 5.05.06 AM.png > > > - Ran query13 in tpcds with hive (1.2.0-SNAPSHOT) at 10 TB scale with Tez > (0.7 master) > - tez.runtime.io.sort.mb=1800 on 4 GB container. > - OOM was thrown in lots of tasks when allocating memory to sorter. > - Heapdump reveals memory allocated to sorter. And other objects do not take > up that much space. > Need more investigation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2235) Tasks throwing OOM before reaching memory limits
[ https://issues.apache.org/jira/browse/TEZ-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2235: -- Attachment: Screen Shot 2015-03-26 at 5.05.06 AM.png Screen Shot 2015-03-26 at 5.04.46 AM.png > Tasks throwing OOM before reaching memory limits > > > Key: TEZ-2235 > URL: https://issues.apache.org/jira/browse/TEZ-2235 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: Screen Shot 2015-03-26 at 5.04.46 AM.png, Screen Shot > 2015-03-26 at 5.05.06 AM.png > > > - Ran query13 in tpcds with hive (1.2.0-SNAPSHOT) at 10 TB scale with Tez > (0.7 master) > - tez.runtime.io.sort.mb=1800 on 4 GB container. > - OOM was thrown in lots of tasks when allocating memory to sorter. > - Heapdump reveals memory allocated to sorter. And other objects do not take > up that much space. > Need more investigation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2235) Tasks throwing OOM before reaching memory limits
Rajesh Balamohan created TEZ-2235: - Summary: Tasks throwing OOM before reaching memory limits Key: TEZ-2235 URL: https://issues.apache.org/jira/browse/TEZ-2235 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan - Ran query13 in tpcds with hive (1.2.0-SNAPSHOT) at 10 TB scale with Tez (0.7 master) - tez.runtime.io.sort.mb=1800 on 4 GB container. - OOM was thrown in lots of tasks when allocating memory to sorter. - Heapdump reveals memory allocated to sorter. And other objects do not take up that much space. Need more investigation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2214) FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging
[ https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381005#comment-14381005 ] Hitesh Shah commented on TEZ-2214: -- Updated fix versions. > FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses > memToDiskMerging > -- > > Key: TEZ-2214 > URL: https://issues.apache.org/jira/browse/TEZ-2214 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Fix For: 0.5.4, 0.6.1 > > Attachments: TEZ-2214.1.patch, TEZ-2214.2.patch, TEZ-2214.3.patch > > > Scenario: > - commitMemory & usedMemory are beyond their allowed threshold. > - InMemoryMerge kicks off and is in the process of flushing memory contents > to disk > - As it progresses, it releases memory segments as well (but not yet over). > - Fetchers who need memory < maxSingleShuffleLimit, get scheduled. > - If fetchers are fast, this quickly adds up to commitMemory & usedMemory. > Since InMemoryMerge is already in progress, this wouldn't trigger another > merge(). > - Pretty soon all fetchers would be stalled and get into the following state. > {noformat} > Thread 9351: (state = BLOCKED) > - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be > imprecise) > - java.lang.Object.wait() @bci=2, line=502 (Compiled frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory() > @bci=17, line=337 (Interpreted frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run() > @bci=34, line=157 (Interpreted frame) > {noformat} > - Even if InMemoryMerger completes, "commitedMem & usedMem" are beyond their > threshold and no other fetcher threads (all are in stalled state) are there > to release memory. This causes fetchers to wait indefinitely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
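The scenario in the description can be replayed with a small single-threaded model. Everything here is a simplification for discussion: `ToyMergeManager`, its fields and the `recheckAfterMerge` switch are made up, real fetchers block in `wait()` rather than getting `false` back, and the re-check shown is one possible way to avoid the stall, not necessarily what the committed patch does.

```java
// Toy model of the stall; not the Tez MergeManager.
final class ToyMergeManager {
  private final long memoryLimit;     // fetchers stall while usage is above this
  private final long mergeThreshold;  // a merge starts at or above this
  private final boolean recheckAfterMerge;
  private long usedMemory = 0;
  private long commitMemory = 0;
  private boolean mergeInProgress = false;

  ToyMergeManager(long memoryLimit, long mergeThreshold, boolean recheckAfterMerge) {
    this.memoryLimit = memoryLimit;
    this.mergeThreshold = mergeThreshold;
    this.recheckAfterMerge = recheckAfterMerge;
  }

  /** A fetcher may reserve memory only while usage is within the limit. */
  boolean reserve(long size) {
    if (usedMemory > memoryLimit) {
      return false; // a real fetcher would wait here
    }
    usedMemory += size;
    return true;
  }

  /** Called by a fetcher once its segment is fully in memory. */
  void commit(long size) {
    commitMemory += size;
    maybeStartMerge();
  }

  /** The running merge releases segments it has already flushed. */
  void onMergeProgress(long flushed) {
    usedMemory -= flushed;
    commitMemory -= flushed;
  }

  void onMergeDone() {
    mergeInProgress = false;
    if (recheckAfterMerge) {
      maybeStartMerge(); // do not rely on a (stalled) fetcher to trigger it
    }
  }

  private void maybeStartMerge() {
    if (!mergeInProgress && commitMemory >= mergeThreshold) {
      mergeInProgress = true;
    }
  }

  /** Stuck: fetchers cannot reserve and no merge is running to free memory. */
  boolean isStuck() {
    return usedMemory > memoryLimit && !mergeInProgress;
  }

  /** Replays the sequence of events from the issue description. */
  static boolean replay(boolean recheckAfterMerge) {
    ToyMergeManager m = new ToyMergeManager(100, 60, recheckAfterMerge);
    m.reserve(50); m.commit(50);
    m.reserve(50); m.commit(50);   // threshold crossed, merge starts
    m.onMergeProgress(40);         // merge frees some memory...
    m.reserve(60); m.commit(60);   // ...and fast fetchers fill it again
    m.onMergeProgress(40);
    m.reserve(60); m.commit(60);
    m.onMergeProgress(20);
    m.onMergeDone();
    return m.isStuck();
  }
}
```

In this model `replay(false)` ends with memory above the limit and no merge running, which is the indefinite wait from the thread dump, while `replay(true)` ends with a second merge in progress.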
[jira] [Updated] (TEZ-2214) FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging
[ https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2214: - Fix Version/s: 0.6.1 0.5.4 > FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses > memToDiskMerging > -- > > Key: TEZ-2214 > URL: https://issues.apache.org/jira/browse/TEZ-2214 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Fix For: 0.5.4, 0.6.1 > > Attachments: TEZ-2214.1.patch, TEZ-2214.2.patch, TEZ-2214.3.patch > > > Scenario: > - commitMemory & usedMemory are beyond their allowed threshold. > - InMemoryMerge kicks off and is in the process of flushing memory contents > to disk > - As it progresses, it releases memory segments as well (but not yet over). > - Fetchers who need memory < maxSingleShuffleLimit, get scheduled. > - If fetchers are fast, this quickly adds up to commitMemory & usedMemory. > Since InMemoryMerge is already in progress, this wouldn't trigger another > merge(). > - Pretty soon all fetchers would be stalled and get into the following state. > {noformat} > Thread 9351: (state = BLOCKED) > - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be > imprecise) > - java.lang.Object.wait() @bci=2, line=502 (Compiled frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory() > @bci=17, line=337 (Interpreted frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run() > @bci=34, line=157 (Interpreted frame) > {noformat} > - Even if InMemoryMerger completes, "commitedMem & usedMem" are beyond their > threshold and no other fetcher threads (all are in stalled state) are there > to release memory. This causes fetchers to wait indefinitely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2214) FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging
[ https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380980#comment-14380980 ] Rajesh Balamohan edited comment on TEZ-2214 at 3/25/15 11:01 PM: - Thanks [~sseth], [~hitesh]. Committed .3 version of patch to master, branch-0.6 and branch-0.5 >> commit 2fe2d63529b3fb420c15d4be6bbf50d501edb626 >> was (Author: rajesh.balamohan): Thanks [~sseth], [~hitesh]. Committed to master, branch-0.6 and branch-0.5 >> commit 2fe2d63529b3fb420c15d4be6bbf50d501edb626 >> > FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses > memToDiskMerging > -- > > Key: TEZ-2214 > URL: https://issues.apache.org/jira/browse/TEZ-2214 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-2214.1.patch, TEZ-2214.2.patch, TEZ-2214.3.patch > > > Scenario: > - commitMemory & usedMemory are beyond their allowed threshold. > - InMemoryMerge kicks off and is in the process of flushing memory contents > to disk > - As it progresses, it releases memory segments as well (but not yet over). > - Fetchers who need memory < maxSingleShuffleLimit, get scheduled. > - If fetchers are fast, this quickly adds up to commitMemory & usedMemory. > Since InMemoryMerge is already in progress, this wouldn't trigger another > merge(). > - Pretty soon all fetchers would be stalled and get into the following state. 
> {noformat} > Thread 9351: (state = BLOCKED) > - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be > imprecise) > - java.lang.Object.wait() @bci=2, line=502 (Compiled frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory() > @bci=17, line=337 (Interpreted frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run() > @bci=34, line=157 (Interpreted frame) > {noformat} > - Even if InMemoryMerger completes, "commitedMem & usedMem" are beyond their > threshold and no other fetcher threads (all are in stalled state) are there > to release memory. This causes fetchers to wait indefinitely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2214) FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging
[ https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380911#comment-14380911 ] Siddharth Seth commented on TEZ-2214: - +1 on either the .2 or .3 patch btw. > FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses > memToDiskMerging > -- > > Key: TEZ-2214 > URL: https://issues.apache.org/jira/browse/TEZ-2214 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-2214.1.patch, TEZ-2214.2.patch, TEZ-2214.3.patch > > > Scenario: > - commitMemory & usedMemory are beyond their allowed threshold. > - InMemoryMerge kicks off and is in the process of flushing memory contents > to disk > - As it progresses, it releases memory segments as well (but not yet over). > - Fetchers who need memory < maxSingleShuffleLimit, get scheduled. > - If fetchers are fast, this quickly adds up to commitMemory & usedMemory. > Since InMemoryMerge is already in progress, this wouldn't trigger another > merge(). > - Pretty soon all fetchers would be stalled and get into the following state. > {noformat} > Thread 9351: (state = BLOCKED) > - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be > imprecise) > - java.lang.Object.wait() @bci=2, line=502 (Compiled frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory() > @bci=17, line=337 (Interpreted frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run() > @bci=34, line=157 (Interpreted frame) > {noformat} > - Even if InMemoryMerger completes, "commitedMem & usedMem" are beyond their > threshold and no other fetcher threads (all are in stalled state) are there > to release memory. This causes fetchers to wait indefinitely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime
[ https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380829#comment-14380829 ] Hadoop QA commented on TEZ-2217: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707314/TEZ-2217.3.patch against master revision d1b4bd4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.app.rm.TestContainerReuse Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/348//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/348//console This message is automatically generated. > The min-held-containers constraint is not enforced during query runtime > > > Key: TEZ-2217 > URL: https://issues.apache.org/jira/browse/TEZ-2217 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Gopal V >Assignee: Bikas Saha > Attachments: TEZ-2217-debug.txt.bz2, TEZ-2217.1.patch, > TEZ-2217.2.patch, TEZ-2217.3.patch, TEZ-2217.txt.bz2 > > > The min-held containers constraint is respected during query idle times, but > is not respected when a query is actually in motion. > The AM releases unused containers during dag execution without checking for > min-held containers. 
> {code} > 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing > container, containerId=container_1424502260528_1348_01_13, > containerExpiryTime=1426891313264, idleTimeoutMin=5000 > 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Releasing unused container: > container_1424502260528_1348_01_13 > {code} > This is actually useful only after the AM has received a soft pre-emption > message, doing it on an idle cluster slows down one of the most common query > patterns in BI systems. > {code} > create temporary table smalltable as ...; > select ... bigtable JOIN smalltable ON ...; > {code} > The smaller query in the beginning throws away the pre-warmed capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
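The missing check described in TEZ-2217 can be illustrated with a minimal sketch. This is not the YarnTaskSchedulerService code; the class name, method name, and parameters below are hypothetical and only model the decision "release an idle container only if doing so keeps the AM at or above the min-held floor".

```java
// Hypothetical sketch of the min-held check; names are illustrative, not Tez APIs.
final class IdleReleasePolicySketch {
    private IdleReleasePolicySketch() {}

    /**
     * Decides whether an idle container may be released.
     *
     * @param idleTimeoutExpired whether the container's idle timeout has expired
     * @param heldContainers     containers currently held by the AM, including this one
     * @param minHeldContainers  configured floor of containers to keep
     * @return true only if the timeout expired and the release keeps the floor intact
     */
    static boolean mayRelease(boolean idleTimeoutExpired, int heldContainers, int minHeldContainers) {
        if (!idleTimeoutExpired) {
            return false;
        }
        // Releasing one container must not drop the held count below the floor.
        return heldContainers > minHeldContainers;
    }
}
```

With a floor of 5, `mayRelease(true, 5, 5)` is false and `mayRelease(true, 6, 5)` is true, so the pre-warmed capacity survives the small first query.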
Failed: TEZ-2217 PreCommit Build #348
Jira: https://issues.apache.org/jira/browse/TEZ-2217 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/348/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2717 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707314/TEZ-2217.3.patch against master revision d1b4bd4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.app.rm.TestContainerReuse Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/348//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/348//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 512fae301497059df6a11d72ea59315461b071e1 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #343 Archived 44 artifacts Archive block size is 32768 Received 2 blocks and 2653407 bytes Compression is 2.4% Took 1 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 2 tests failed. 
REGRESSION: org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse Error Message: Wanted but not invoked: aMRMClientAsyncForTest.releaseAssignedContainer( container_1_0001_01_02 ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:488) However, there were other interactions with this mock: aMRMClientAsyncForTest.init( Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:396) aMRMClientAsyncForTest.setConfig( Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:396) aMRMClientAsyncForTest.serviceInit( Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:396) aMRMClientAsyncForTest.setHeartbeatInterval( 1000 ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:396) aMRMClientAsyncForTest.start(); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:396) aMRMClientAsyncForTest.serviceStart(); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:396) aMRMClientAsyncForTest.registerApplicationMaster( "host", 0, "" ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:396) aMRMClientAsyncForTest.addContainerRequest( Capability[]Priority[1] ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:433) aMRMClientAsyncForTest.addContainerRequest( Capability[]Priority[1] ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:434) aMRMClientAsyncForTest.addContainerRequest( Capability[]Priority[1] ); -> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testSimpleReuse(TestContainerReuse.java:435) aMRMClientAsyncForTest.addContainerRequest( Capability[]Priority[1] ); -> at org.apache.tez.dag.app.rm.TestCon
[jira] [Created] (TEZ-2234) Allow vertex managers to get output size per source vertex
Bikas Saha created TEZ-2234: --- Summary: Allow vertex managers to get output size per source vertex Key: TEZ-2234 URL: https://issues.apache.org/jira/browse/TEZ-2234 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Vertex managers may need per source vertex output stats to make reconfiguration decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2233) setParallelism should allow setting built-in edge managers
Bikas Saha created TEZ-2233: --- Summary: setParallelism should allow setting built-in edge managers Key: TEZ-2233 URL: https://issues.apache.org/jira/browse/TEZ-2233 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Currently, all edge managers set during setParallelism end up becoming custom edges. However, just like during dag creation, it should be possible to specify standard edge types like scatter_gather if that is what the final user decision is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2196) Consider reusing UnorderedPartitionedKVWriter with single output in UnorderedKVOutput
[ https://issues.apache.org/jira/browse/TEZ-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380752#comment-14380752 ] Siddharth Seth commented on TEZ-2196: - Looks good. Minor: getInitialMemoryRequirement - this should return 0 when pipelining is disabled, and numPartitions=1 since buffers aren't being used. Otherwise we unnecessarily penalize other Inputs / Outputs which may exist on the vertex. > Consider reusing UnorderedPartitionedKVWriter with single output in > UnorderedKVOutput > - > > Key: TEZ-2196 > URL: https://issues.apache.org/jira/browse/TEZ-2196 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-2196.1.patch, TEZ-2196.2.patch, TEZ-2196.3.patch > > > Can possibly get rid of FileBasedKVWriter and reuse > UnorderedPartitionedKVWriter with single partition in UnorderedKVOutput. > This can also benefit from pipelined shuffle changes done in > UnorderedPartitionedKVWriter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
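The review comment on getInitialMemoryRequirement can be read as the following rule. This is a sketch under assumptions: the class, the static method, and the `configuredBufferBytes` parameter are hypothetical stand-ins, not the writer's actual signature.

```java
// Hypothetical sketch of the suggested rule; not the UnorderedPartitionedKVWriter code.
final class MemoryRequirementSketch {
    private MemoryRequirementSketch() {}

    /**
     * Returns the memory to request up front.
     * With pipelining disabled and a single partition, buffers are not used,
     * so nothing is requested and other Inputs/Outputs on the vertex are not penalized.
     */
    static long initialMemoryRequirement(boolean pipeliningEnabled, int numPartitions,
                                         long configuredBufferBytes) {
        if (!pipeliningEnabled && numPartitions == 1) {
            return 0L;
        }
        return configuredBufferBytes;
    }
}
```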
[jira] [Created] (TEZ-2232) Allow setParallelism to be called multiple times before tasks get scheduled
Bikas Saha created TEZ-2232: --- Summary: Allow setParallelism to be called multiple times before tasks get scheduled Key: TEZ-2232 URL: https://issues.apache.org/jira/browse/TEZ-2232 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Bikas Saha Currently, this is allowed only once. It is harder to support this after the vertex tasks have already started running, but allowing it before tasks start running is actually trivial. This just allows VertexManagers to change their minds multiple times before they start the vertex processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
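The behaviour requested in TEZ-2232 amounts to a simple guard. The sketch below is hypothetical (class and method names are invented for illustration and are not the Vertex implementation): parallelism may change any number of times until tasks are scheduled, and is rejected afterwards.

```java
// Hypothetical sketch; models only the "before tasks are scheduled" guard.
final class ParallelismGuardSketch {
    private int parallelism;
    private boolean tasksScheduled;

    ParallelismGuardSketch(int initialParallelism) {
        this.parallelism = initialParallelism;
    }

    /** Allowed repeatedly, but only while no task has been scheduled. */
    void setParallelism(int newParallelism) {
        if (tasksScheduled) {
            throw new IllegalStateException("tasks already scheduled; parallelism is fixed");
        }
        if (newParallelism < 0) {
            throw new IllegalArgumentException("parallelism must be non-negative");
        }
        this.parallelism = newParallelism;
    }

    /** Marks the point after which parallelism can no longer change. */
    void markTasksScheduled() {
        this.tasksScheduled = true;
    }

    int getParallelism() {
        return parallelism;
    }
}
```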
[jira] [Updated] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime
[ https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2217: Attachment: TEZ-2217.3.patch > The min-held-containers constraint is not enforced during query runtime > > > Key: TEZ-2217 > URL: https://issues.apache.org/jira/browse/TEZ-2217 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Gopal V >Assignee: Bikas Saha > Attachments: TEZ-2217-debug.txt.bz2, TEZ-2217.1.patch, > TEZ-2217.2.patch, TEZ-2217.3.patch, TEZ-2217.txt.bz2 > > > The min-held containers constraint is respected during query idle times, but > is not respected when a query is actually in motion. > The AM releases unused containers during dag execution without checking for > min-held containers. > {code} > 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing > container, containerId=container_1424502260528_1348_01_13, > containerExpiryTime=1426891313264, idleTimeoutMin=5000 > 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Releasing unused container: > container_1424502260528_1348_01_13 > {code} > This is actually useful only after the AM has received a soft pre-emption > message, doing it on an idle cluster slows down one of the most common query > patterns in BI systems. > {code} > create temporary table smalltable as ...; > select ... bigtable JOIN smalltable ON ...; > {code} > The smaller query in the beginning throws away the pre-warmed capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2231) Create project by-laws
[ https://issues.apache.org/jira/browse/TEZ-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2231: - Attachment: (was: by-laws.patch.2) > Create project by-laws > -- > > Key: TEZ-2231 > URL: https://issues.apache.org/jira/browse/TEZ-2231 > Project: Apache Tez > Issue Type: Task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: by-laws.2.patch, by-laws.patch > > > Define the Project by-laws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2231) Create project by-laws
[ https://issues.apache.org/jira/browse/TEZ-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2231: - Attachment: by-laws.2.patch > Create project by-laws > -- > > Key: TEZ-2231 > URL: https://issues.apache.org/jira/browse/TEZ-2231 > Project: Apache Tez > Issue Type: Task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: by-laws.2.patch, by-laws.patch > > > Define the Project by-laws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2231) Create project by-laws
[ https://issues.apache.org/jira/browse/TEZ-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2231: - Attachment: by-laws.patch.2 Minor modification to lazy approval to clarify that a minimum of 1 +1 vote is needed. > Create project by-laws > -- > > Key: TEZ-2231 > URL: https://issues.apache.org/jira/browse/TEZ-2231 > Project: Apache Tez > Issue Type: Task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: by-laws.patch, by-laws.patch.2 > > > Define the Project by-laws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime
[ https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2217: Attachment: (was: TEZ-2217.3.patch) > The min-held-containers constraint is not enforced during query runtime > > > Key: TEZ-2217 > URL: https://issues.apache.org/jira/browse/TEZ-2217 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Gopal V >Assignee: Bikas Saha > Attachments: TEZ-2217-debug.txt.bz2, TEZ-2217.1.patch, > TEZ-2217.2.patch, TEZ-2217.txt.bz2 > > > The min-held containers constraint is respected during query idle times, but > is not respected when a query is actually in motion. > The AM releases unused containers during dag execution without checking for > min-held containers. > {code} > 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing > container, containerId=container_1424502260528_1348_01_13, > containerExpiryTime=1426891313264, idleTimeoutMin=5000 > 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Releasing unused container: > container_1424502260528_1348_01_13 > {code} > This is actually useful only after the AM has received a soft pre-emption > message, doing it on an idle cluster slows down one of the most common query > patterns in BI systems. > {code} > create temporary table smalltable as ...; > select ... bigtable JOIN smalltable ON ...; > {code} > The smaller query in the beginning throws away the pre-warmed capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2217) The min-held-containers constraint is not enforced during query runtime
[ https://issues.apache.org/jira/browse/TEZ-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2217: Attachment: TEZ-2217.3.patch > The min-held-containers constraint is not enforced during query runtime > > > Key: TEZ-2217 > URL: https://issues.apache.org/jira/browse/TEZ-2217 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Gopal V >Assignee: Bikas Saha > Attachments: TEZ-2217-debug.txt.bz2, TEZ-2217.1.patch, > TEZ-2217.2.patch, TEZ-2217.3.patch, TEZ-2217.txt.bz2 > > > The min-held containers constraint is respected during query idle times, but > is not respected when a query is actually in motion. > The AM releases unused containers during dag execution without checking for > min-held containers. > {code} > 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Container's idle timeout expired. Releasing > container, containerId=container_1424502260528_1348_01_13, > containerExpiryTime=1426891313264, idleTimeoutMin=5000 > 2015-03-20 15:41:53,475 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Releasing unused container: > container_1424502260528_1348_01_13 > {code} > This is actually useful only after the AM has received a soft pre-emption > message, doing it on an idle cluster slows down one of the most common query > patterns in BI systems. > {code} > create temporary table smalltable as ...; > select ... bigtable JOIN smalltable ON ...; > {code} > The smaller query in the beginning throws away the pre-warmed capacity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2231) Create project by-laws
Hitesh Shah created TEZ-2231: Summary: Create project by-laws Key: TEZ-2231 URL: https://issues.apache.org/jira/browse/TEZ-2231 Project: Apache Tez Issue Type: Task Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: by-laws.patch Define the Project by-laws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2231) Create project by-laws
[ https://issues.apache.org/jira/browse/TEZ-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2231: - Attachment: by-laws.patch > Create project by-laws > -- > > Key: TEZ-2231 > URL: https://issues.apache.org/jira/browse/TEZ-2231 > Project: Apache Tez > Issue Type: Task >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: by-laws.patch > > > Define the Project by-laws. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2214) FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging
[ https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380398#comment-14380398 ] Siddharth Seth commented on TEZ-2214: - I think both the .2 and .3 patches are good, as long as there's no other entity which is reserving memory (i.e. the MemToMemMerger may just become a little more complicated, or if we ever support data via events). A fetcher will always trigger the MemToDiskMerger, and then go and wait on waitForInMemoryMerge, followed by waitForShuffleToMergeMemory. If the data fetched by this Fetcher triggered a merge, it'll always wait and re-check to see if another merge is required. If the data fetched did not trigger a merge (and a merge wasn't in progress), memory limits haven't been hit, and a future fetch would trigger this. > FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses > memToDiskMerging > -- > > Key: TEZ-2214 > URL: https://issues.apache.org/jira/browse/TEZ-2214 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-2214.1.patch, TEZ-2214.2.patch, TEZ-2214.3.patch > > > Scenario: > - commitMemory & usedMemory are beyond their allowed threshold. > - InMemoryMerge kicks off and is in the process of flushing memory contents > to disk > - As it progresses, it releases memory segments as well (but not yet over). > - Fetchers who need memory < maxSingleShuffleLimit, get scheduled. > - If fetchers are fast, this quickly adds up to commitMemory & usedMemory. > Since InMemoryMerge is already in progress, this wouldn't trigger another > merge(). > - Pretty soon all fetchers would be stalled and get into the following state. 
> {noformat} > Thread 9351: (state = BLOCKED) > - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be > imprecise) > - java.lang.Object.wait() @bci=2, line=502 (Compiled frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory() > @bci=17, line=337 (Interpreted frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run() > @bci=34, line=157 (Interpreted frame) > {noformat} > - Even if InMemoryMerger completes, "commitedMem & usedMem" are beyond their > threshold and no other fetcher threads (all are in stalled state) are there > to release memory. This causes fetchers to wait indefinitely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
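The missed-merge scenario in TEZ-2214 can be modelled with a small sketch. This is not the MergeManager code and not necessarily what the attached patches do; the class and its fields are hypothetical. It shows one way to avoid the stall: when a merge finishes, re-check the committed memory and start another merge if the threshold is still exceeded, instead of relying on a future fetch to trigger it.

```java
// Hypothetical model of the merge trigger; illustrative only.
final class MergeTriggerSketch {
    private final long mergeThreshold;
    private long commitMemory;
    private boolean mergeInProgress;
    private int mergesStarted;

    MergeTriggerSketch(long mergeThreshold) {
        this.mergeThreshold = mergeThreshold;
    }

    /** Called when a fetcher commits fetched data to memory. */
    synchronized void commit(long bytes) {
        commitMemory += bytes;
        maybeStartMerge();
    }

    /** Called when the in-memory merge has flushed the given bytes to disk. */
    synchronized void mergeFinished(long bytesFlushed) {
        commitMemory -= bytesFlushed;
        mergeInProgress = false;
        // Re-check: commits that arrived during the merge could not trigger one.
        maybeStartMerge();
        // Wake any fetchers waiting for memory.
        notifyAll();
    }

    private void maybeStartMerge() {
        if (!mergeInProgress && commitMemory >= mergeThreshold) {
            mergeInProgress = true;
            mergesStarted++;
        }
    }

    synchronized int mergesStarted() {
        return mergesStarted;
    }

    synchronized boolean isMergeInProgress() {
        return mergeInProgress;
    }
}
```

Without the re-check in `mergeFinished`, commits made while a merge is running leave memory above the threshold with no merge scheduled, which is the stalled state shown in the thread dump.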
[jira] [Comment Edited] (TEZ-2103) Implement a Partial completion VertexManagerPlugin
[ https://issues.apache.org/jira/browse/TEZ-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380376#comment-14380376 ] Hitesh Shah edited comment on TEZ-2103 at 3/25/15 6:06 PM: --- Hi [~alokasok], sorry for the late reply. The use-case is somewhat along these lines. Assume that there is a SQL query being run to do something like "select * from table where limit 10;". This can be broken down into a single stage DAG where each task runs on a partition of data and completes as soon as it has say 10 records that match the filter clauses. Now, if the overall data is huge i.e. multiple TBs and needs 100,000 tasks to scan all the data, running 100,000 tasks should not be necessary if the first task to complete returned 10 results. VertexManagers are sort of the vertex controllers. They control scheduling of tasks ( when to start running tasks, whether to enable slow start i.e. wait for upstream data to be ready before starting tasks ), locality management, vertex parallelism, etc. The VertexManagerPlugin is a user-provided plugin to modify the above behavior. If you have read up on Tez and take a more detailed look into the VertexManager code, the VertexManager could be made a bit more powerful to be able to trigger completion of a vertex sooner without needing to run all the tasks if certain conditions get matched earlier. The changes will likely also encompass the Vertex(Impl) state machine in terms of how you treat short-circuited tasks so that all the correct bookkeeping is done from a state management point of view. was (Author: hitesh): Hi [~alokasok], sorry for the late reply. The use-case is somewhat along these lines. Assume that there is a SQL query being run to do something like "select * from table where limit 10;". This can be broken down into a single stage DAG where each task runs on a partition of data and completes as soon as it has say 10 records that match the filter clauses. 
Now, if the overall data is huge i.e. multiple TBs and needs 100,000 tasks to scan all the data, running 100,000 tasks should not be necessary if the first task to complete returned 10 results. VertexManagers are sort of the vertex controllers. They control scheduling of tasks ( when to start running tasks, whether to enable slow start i.e. wait for upstream data to be ready before starting tasks ), locality management, vertex parallelism. The VertexManagerPlugin is a user-provided plugin to modify the above behavior. If you have read up on Tez and take a more detailed look into the VertexManager code, the VertexManager could be made a bit more powerful to be able to trigger completion of a vertex sooner without needing to run all the tasks if certain conditions get matched earlier. The changes will likely also encompass the Vertex(Impl) state machine in terms of how you treat short-circuited tasks so that all the correct bookkeeping is done from a state management point of view. > Implement a Partial completion VertexManagerPlugin > -- > > Key: TEZ-2103 > URL: https://issues.apache.org/jira/browse/TEZ-2103 > Project: Apache Tez > Issue Type: New Feature >Reporter: Gopal V > Labels: gsoc, gsoc2015, hadoop, java, tez > > Currently, there is no sibling communication between tasks - this implies > that a task can be completed by the first vertex in a wave of tasks, but the > entire wave of tasks has to complete before success can be reported. > This occurs in limit + filter query patterns common between the data access > engines. > {code} > select * from data where x > 1 limit 10; > {code} > will run through a full-table scan worth of tasks to generate 10 rows per > task, to aggregate it to produce the final 10 row result. > The VertexManager receives counters/events early enough to short-circuit the > rest of the vertex tasks, to prevent the remainder of tasks from getting > scheduled when the limit condition has been satisfied by an initial sub-set > of the tasks. 
> This is a specialization of the VertexManagerPlugin for this common case > scheduling pattern. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2103) Implement a Partial completion VertexManagerPlugin
[ https://issues.apache.org/jira/browse/TEZ-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380376#comment-14380376 ] Hitesh Shah commented on TEZ-2103: -- Hi [~alokasok], sorry for the late reply. The use-case is somewhat along these lines. Assume that there is a SQL query being run to do something like "select * from table where limit 10;". This can be broken down into a single stage DAG where each task runs on a partition of data and completes as soon as it has say 10 records that match the filter clauses. Now, if the overall data is huge i.e. multiple TBs and needs 100,000 tasks to scan all the data, running 100,000 tasks should not be necessary if the first task to complete returned 10 results. VertexManagers are sort of the vertex controllers. They control scheduling of tasks ( when to start running tasks, whether to enable slow start i.e. wait for upstream data to be ready before starting tasks ), locality management, vertex parallelism. The VertexManagerPlugin is a user-provided plugin to modify the above behavior. If you have read up on Tez and take a more detailed look into the VertexManager code, the VertexManager could be made a bit more powerful to be able to trigger completion of a vertex sooner without needing to run all the tasks if certain conditions get matched earlier. The changes will likely also encompass the Vertex(Impl) state machine in terms of how you treat short-circuited tasks so that all the correct bookkeeping is done from a state management point of view. 
> Implement a Partial completion VertexManagerPlugin > -- > > Key: TEZ-2103 > URL: https://issues.apache.org/jira/browse/TEZ-2103 > Project: Apache Tez > Issue Type: New Feature >Reporter: Gopal V > Labels: gsoc, gsoc2015, hadoop, java, tez > > Currently, there is no sibling communication between tasks - this implies > that a task can be completed by the first vertex in a wave of tasks, but the > entire wave of tasks has to complete before success can be reported. > This occurs in limit + filter query patterns common between the data access > engines. > {code} > select * from data where x > 1 limit 10; > {code} > will run through a full-table scan worth of tasks to generate 10 rows per > task, to aggregate it to produce the final 10 row result. > The VertexManager receives counters/events early enough to short-circuit the > rest of the vertex tasks, to prevent the remainder of tasks from getting > scheduled when the limit condition has been satisfied by an initial sub-set > of the tasks. > This is a specialization of the VertexManagerPlugin for this common case > scheduling pattern. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
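The short-circuit idea in TEZ-2103 can be sketched as a counter kept by the vertex controller. The class below is hypothetical and does not use the VertexManagerPlugin API; it only models the decision "enough rows have been produced, so the remaining tasks need not be scheduled".

```java
// Hypothetical sketch of the limit check; not a VertexManagerPlugin implementation.
final class LimitShortCircuitSketch {
    private final long limit;
    private long rowsSeen;

    LimitShortCircuitSketch(long limit) {
        this.limit = limit;
    }

    /**
     * Records the rows reported by a completed task.
     *
     * @return true once the limit is satisfied, meaning unscheduled tasks can be skipped
     */
    boolean onTaskCompleted(long rowsProduced) {
        rowsSeen += rowsProduced;
        return rowsSeen >= limit;
    }
}
```

For `limit 10`, a first task reporting 4 rows returns false and a second reporting 6 rows returns true, at which point the remaining tasks of the 100,000 would not need to run.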
[jira] [Comment Edited] (TEZ-2205) Tez still tries to post to ATS when yarn.timeline-service.enabled=false
[ https://issues.apache.org/jira/browse/TEZ-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380270#comment-14380270 ] Hitesh Shah edited comment on TEZ-2205 at 3/25/15 5:57 PM: --- Adding a test for this would be useful for both the acl policy manager and the logging service. was (Author: hitesh): Adding a test for this would be useful. > Tez still tries to post to ATS when yarn.timeline-service.enabled=false > --- > > Key: TEZ-2205 > URL: https://issues.apache.org/jira/browse/TEZ-2205 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.6.1 >Reporter: Chang Li >Assignee: Chang Li > Attachments: TEZ-2205.1.patch, TEZ-2205.wip.patch > > > when set yarn.timeline-service.enabled=false, Tez still tries posting to ATS, > but hits error as token is not found. Does not fail the job because of the > fix to not fail job when there is error posting to ATS. But it should not be > trying to post to ATS in the first place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2230) Speculative attempt should not have the original attempts machine in its preferred locations
Bikas Saha created TEZ-2230: --- Summary: Speculative attempt should not have the original attempts machine in its preferred locations Key: TEZ-2230 URL: https://issues.apache.org/jira/browse/TEZ-2230 Project: Apache Tez Issue Type: Sub-task Affects Versions: 0.6.0 Reporter: Bikas Saha Assignee: Bikas Saha -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2205) Tez still tries to post to ATS when yarn.timeline-service.enabled=false
[ https://issues.apache.org/jira/browse/TEZ-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380270#comment-14380270 ] Hitesh Shah commented on TEZ-2205: -- Adding a test for this would be useful. > Tez still tries to post to ATS when yarn.timeline-service.enabled=false > --- > > Key: TEZ-2205 > URL: https://issues.apache.org/jira/browse/TEZ-2205 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.6.1 >Reporter: Chang Li >Assignee: Chang Li > Attachments: TEZ-2205.1.patch, TEZ-2205.wip.patch > > > when set yarn.timeline-service.enabled=false, Tez still tries posting to ATS, > but hits error as token is not found. Does not fail the job because of the > fix to not fail job when there is error posting to ATS. But it should not be > trying to post to ATS in the first place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2205) Tez still tries to post to ATS when yarn.timeline-service.enabled=false
[ https://issues.apache.org/jira/browse/TEZ-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380267#comment-14380267 ] Hitesh Shah commented on TEZ-2205: -- {code} LOG.warn("Timeline service is not enabled"); {code} - the log should state more clearly that the ATSLogging service/acl manager is being disabled because the timeline service is disabled. The changes in ATSHistoryLoggingService could be more optimal. If it is disabled, why even bother queueing up events? > Tez still tries to post to ATS when yarn.timeline-service.enabled=false > --- > > Key: TEZ-2205 > URL: https://issues.apache.org/jira/browse/TEZ-2205 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.6.1 >Reporter: Chang Li >Assignee: Chang Li > Attachments: TEZ-2205.1.patch, TEZ-2205.wip.patch > > > when set yarn.timeline-service.enabled=false, Tez still tries posting to ATS, > but hits error as token is not found. Does not fail the job because of the > fix to not fail job when there is error posting to ATS. But it should not be > trying to post to ATS in the first place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
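The point about not queueing events when the service is disabled can be sketched as follows. This is not the ATSHistoryLoggingService code; the class, the String event type, and the method names are hypothetical and only show the early return.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch; events are plain strings for illustration.
final class HistoryLoggerSketch {
    private final boolean timelineEnabled;
    private final Queue<String> pending = new ArrayDeque<>();

    HistoryLoggerSketch(boolean timelineEnabled) {
        this.timelineEnabled = timelineEnabled;
    }

    /** Drops the event immediately when the timeline service is disabled. */
    void handle(String event) {
        if (!timelineEnabled) {
            return; // nothing will ever be posted, so do not queue
        }
        pending.add(event);
    }

    int pendingCount() {
        return pending.size();
    }
}
```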
[jira] [Commented] (TEZ-2205) Tez still tries to post to ATS when yarn.timeline-service.enabled=false
[ https://issues.apache.org/jira/browse/TEZ-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380268#comment-14380268 ] Hadoop QA commented on TEZ-2205: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707237/TEZ-2205.1.patch against master revision d1b4bd4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/347//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/347//console This message is automatically generated. > Tez still tries to post to ATS when yarn.timeline-service.enabled=false > --- > > Key: TEZ-2205 > URL: https://issues.apache.org/jira/browse/TEZ-2205 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.6.1 >Reporter: Chang Li >Assignee: Chang Li > Attachments: TEZ-2205.1.patch, TEZ-2205.wip.patch > > > when set yarn.timeline-service.enabled=false, Tez still tries posting to ATS, > but hits error as token is not found. Does not fail the job because of the > fix to not fail job when there is error posting to ATS. But it should not be > trying to post to ATS in the first place. 
Failed: TEZ-2205 PreCommit Build #347
Jira: https://issues.apache.org/jira/browse/TEZ-2205 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/347/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2752 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707237/TEZ-2205.1.patch against master revision d1b4bd4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/347//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/347//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. cb01236764c0e9affe953e1793bb1fec9ce79c15 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #343 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2597651 bytes Compression is 4.8% Took 1.3 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2214) FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging
[ https://issues.apache.org/jira/browse/TEZ-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380255#comment-14380255 ] Hitesh Shah commented on TEZ-2214: -- Question for patch 3 with respect to waitForInMemoryMerge(). Does this need to have only 2 runs of the merger? Or should this be a loop? Will there ever be a case where the same situation comes about when the second merge is in progress? > FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses > memToDiskMerging > -- > > Key: TEZ-2214 > URL: https://issues.apache.org/jira/browse/TEZ-2214 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-2214.1.patch, TEZ-2214.2.patch, TEZ-2214.3.patch > > > Scenario: > - commitMemory & usedMemory are beyond their allowed threshold. > - InMemoryMerge kicks off and is in the process of flushing memory contents > to disk > - As it progresses, it releases memory segments as well (but not yet over). > - Fetchers who need memory < maxSingleShuffleLimit, get scheduled. > - If fetchers are fast, this quickly adds up to commitMemory & usedMemory. > Since InMemoryMerge is already in progress, this wouldn't trigger another > merge(). > - Pretty soon all fetchers would be stalled and get into the following state. 
> {noformat} > Thread 9351: (state = BLOCKED) > - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be > imprecise) > - java.lang.Object.wait() @bci=2, line=502 (Compiled frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.waitForShuffleToMergeMemory() > @bci=17, line=337 (Interpreted frame) > - > org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run() > @bci=34, line=157 (Interpreted frame) > {noformat} > - Even if InMemoryMerger completes, "commitedMem & usedMem" are beyond their > threshold and no other fetcher threads (all are in stalled state) are there > to release memory. This causes fetchers to wait indefinitely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
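A note on the loop question raised above for waitForInMemoryMerge(). The following is only a minimal single-threaded model of the scenario in the description, not the MergeManager code: the class MergeLoopSketch, its fields, and the callback are all hypothetical names introduced for illustration. It assumes a merge pass frees only the segments it picked up when it started, while fetchers may commit more data during the pass.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical single-threaded model; not the MergeManager implementation.
final class MergeLoopSketch {
    private final long mergeThreshold;
    private long commitMemory;

    MergeLoopSketch(long mergeThreshold) {
        this.mergeThreshold = mergeThreshold;
    }

    /** A fetcher commits bytes to memory. */
    void commit(long bytes) {
        commitMemory += bytes;
    }

    long committed() {
        return commitMemory;
    }

    /**
     * Repeats merge passes until committed memory is below the threshold.
     * The callback models fetchers that commit more data while a pass runs.
     */
    int mergeUntilBelowThreshold(Runnable fetchersDuringMerge) {
        int passes = 0;
        while (commitMemory >= mergeThreshold) {
            long pickedUp = commitMemory;   // segments handed to this pass
            fetchersDuringMerge.run();      // new commits arrive mid-merge
            commitMemory -= pickedUp;       // the pass frees only what it picked up
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        MergeLoopSketch manager = new MergeLoopSketch(100);
        manager.commit(150);
        // Assumed arrival pattern: fast fetchers refill memory during two passes.
        Deque<Long> arrivals = new ArrayDeque<>(Arrays.asList(120L, 110L));
        int passes = manager.mergeUntilBelowThreshold(() -> {
            Long next = arrivals.poll();
            if (next != null) {
                manager.commit(next);
            }
        });
        System.out.println("merge passes needed: " + passes);
    }
}
```

With this assumed arrival pattern the model needs three passes, so a fixed count of two runs would leave memory above the threshold. Whether the real fetchers can refill memory that quickly is exactly the open question in the comment.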
[jira] [Updated] (TEZ-2205) Tez still tries to post to ATS when yarn.timeline-service.enabled=false
[ https://issues.apache.org/jira/browse/TEZ-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated TEZ-2205: -- Attachment: TEZ-2205.1.patch > Tez still tries to post to ATS when yarn.timeline-service.enabled=false > --- > > Key: TEZ-2205 > URL: https://issues.apache.org/jira/browse/TEZ-2205 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.6.1 >Reporter: Chang Li >Assignee: Chang Li > Attachments: TEZ-2205.1.patch, TEZ-2205.wip.patch > > > when set yarn.timeline-service.enabled=false, Tez still tries posting to ATS, > but hits error as token is not found. Does not fail the job because of the > fix to not fail job when there is error posting to ATS. But it should not be > trying to post to ATS in the first place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
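For illustration only, the behaviour requested in the description (do not try to post to ATS at all when the flag is off) could be sketched as below. This is not the attached patch. Only the property name yarn.timeline-service.enabled comes from the issue; TimelineGuardSketch, HistoryLogger, NoOpLogger and CountingAtsLogger are hypothetical names, a plain Map stands in for the Hadoop Configuration, and treating a missing key as false is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these class names come from Tez or YARN.
final class TimelineGuardSketch {
    // Property name as quoted in the issue title.
    static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";

    /** Minimal stand-in for a history logging service. */
    interface HistoryLogger {
        void log(String event);
        int postedCount();
    }

    /** Used when the timeline service is disabled: never posts anything. */
    static final class NoOpLogger implements HistoryLogger {
        public void log(String event) { /* intentionally dropped */ }
        public int postedCount() { return 0; }
    }

    /** Stand-in for a logger that would post to ATS; here it only counts. */
    static final class CountingAtsLogger implements HistoryLogger {
        private int posted;
        public void log(String event) { posted++; }
        public int postedCount() { return posted; }
    }

    /** Picks the logger once, up front, based on the configuration flag. */
    static HistoryLogger create(Map<String, String> conf) {
        // Assumption: a missing key is treated as "false".
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(TIMELINE_ENABLED, "false"));
        return enabled ? new CountingAtsLogger() : new NoOpLogger();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TIMELINE_ENABLED, "false");
        HistoryLogger logger = create(conf);
        logger.log("DAG_SUBMITTED");
        System.out.println("events posted: " + logger.postedCount());
    }
}
```

Choosing the logger once at startup, rather than checking the flag on every event, keeps the posting path free of token lookups when the service is disabled.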
[jira] [Commented] (TEZ-2227) Tez UI shows empty page under IE11
[ https://issues.apache.org/jira/browse/TEZ-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380143#comment-14380143 ] Prakash Ramachandran commented on TEZ-2227: --- Will look into IE support issues. > Tez UI shows empty page under IE11 > -- > > Key: TEZ-2227 > URL: https://issues.apache.org/jira/browse/TEZ-2227 > Project: Apache Tez > Issue Type: Bug > Components: UI >Affects Versions: 0.6.0 >Reporter: Fengdong Yu >Assignee: Prakash Ramachandran >Priority: Minor > Attachments: IE11.PNG, chrome.PNG > > > Tez UI works well under Chrome and Firefox, but shows empty page under IE11. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2227) Tez UI shows empty page under IE11
[ https://issues.apache.org/jira/browse/TEZ-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran reassigned TEZ-2227: - Assignee: Prakash Ramachandran > Tez UI shows empty page under IE11 > -- > > Key: TEZ-2227 > URL: https://issues.apache.org/jira/browse/TEZ-2227 > Project: Apache Tez > Issue Type: Bug > Components: UI >Affects Versions: 0.6.0 >Reporter: Fengdong Yu >Assignee: Prakash Ramachandran >Priority: Minor > Attachments: IE11.PNG, chrome.PNG > > > Tez UI works well under Chrome and Firefox, but shows empty page under IE11. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2229) bower ESUDO Cannot be run with sudo -- during build
[ https://issues.apache.org/jira/browse/TEZ-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380141#comment-14380141 ] Prakash Ramachandran commented on TEZ-2229: --- The --allow-root option was removed because running bower as root is generally not recommended, as the error message itself says (mixing sudo runs with non-root runs often causes permission issues). It would be required if the build is done as root. > bower ESUDO Cannot be run with sudo -- during build > --- > > Key: TEZ-2229 > URL: https://issues.apache.org/jira/browse/TEZ-2229 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0 > Environment: Linux x86_64 >Reporter: Fengdong Yu > > I build Tez using root, I never install node/npm locally before my build. > then there are exception messages during build tez-ui module. Maven debug > logs: > {code} > [DEBUG] env: SSH_TTY=/dev/pts/0 > [DEBUG] env: TERM=xterm > [DEBUG] env: USER=root > [DEBUG] env: XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt > [DEBUG] Toolchains are ignored, 'executable' parameter is set to > /root/temp/apache-tez-0.6.0-src/tez-ui/src/main/webapp/node/node > [DEBUG] Executing command line: > [/root/temp/apache-tez-0.6.0-src/tez-ui/src/main/webapp/node/node, > node_modules/bower/bin/bower, install, --remove-unnecessary-resolutions=false] > bower ESUDO Cannot be run with sudo > Additional error details: > Since bower is a user command, there is no need to execute it with superuser > permissions. > If you're having permission errors when using bower without sudo, please > spend a few minutes learning more about how your system should work and make > any necessary repairs. > http://www.joyent.com/blog/installing-node-and-npm > https://gist.github.com/isaacs/579814 > You can however run a command with sudo using --allow-root option > {code} > {code} > [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec > (Bower install) on project tez-ui: Command execution failed. 
Process exited > with an error: 1 (Exit value: 1) -> [ > Help 1]org.apache.maven.lifecycle.LifecycleExecutionException: Failed to > execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec (Bower install) > on project tez-ui: Command execution failed. > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355) > at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155) > at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584) > at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216) > at org.apache.maven.cli.MavenCli.main(MavenCli.java:160) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) > at > org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) > at > org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) > Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution 
> failed. > at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:303) > at > org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) > ... 19 more > Caused by: org.apache.commons.exec.ExecuteException: Process exited with an > error: 1 (Exit value: 1) > at > org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:402) > at > org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:164) > at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:746) > a
[jira] [Commented] (TEZ-2047) Build fails against hadoop-2.2 post TEZ-2018
[ https://issues.apache.org/jira/browse/TEZ-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380052#comment-14380052 ] Hitesh Shah commented on TEZ-2047: -- Updated missing fix version. > Build fails against hadoop-2.2 post TEZ-2018 > > > Key: TEZ-2047 > URL: https://issues.apache.org/jira/browse/TEZ-2047 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Prakash Ramachandran >Priority: Blocker > Fix For: 0.6.1 > > Attachments: TEZ-2047.1.patch, TEZ-2047.1.patch, TEZ-2047.2.patch > > > Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) > on project tez-dag: Compilation failure: Compilation failure: > [ERROR] > /home/jenkins/jenkins-slave/workspace/Tez-Build-Hadoop-2.2/tez-dag/src/main/java/org/apache/tez/dag/app/web/WebUIService.java:[85,13] > cannot find symbol > [ERROR] symbol : method > withHttpPolicy(org.apache.hadoop.conf.Configuration,org.apache.hadoop.http.HttpConfig.Policy) > [ERROR] location: class > org.apache.hadoop.yarn.webapp.WebApps.Builder > [ERROR] > /home/jenkins/jenkins-slave/workspace/Tez-Build-Hadoop-2.2/tez-dag/src/main/java/org/apache/tez/dag/app/web/WebUIService.java:[87,45] > cannot find symbol > [ERROR] symbol : method getConnectorAddress(int) > [ERROR] location: class org.apache.hadoop.http.HttpServer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2047) Build fails against hadoop-2.2 post TEZ-2018
[ https://issues.apache.org/jira/browse/TEZ-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2047: - Fix Version/s: 0.6.1 > Build fails against hadoop-2.2 post TEZ-2018 > > > Key: TEZ-2047 > URL: https://issues.apache.org/jira/browse/TEZ-2047 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Prakash Ramachandran >Priority: Blocker > Fix For: 0.6.1 > > Attachments: TEZ-2047.1.patch, TEZ-2047.1.patch, TEZ-2047.2.patch > > > Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) > on project tez-dag: Compilation failure: Compilation failure: > [ERROR] > /home/jenkins/jenkins-slave/workspace/Tez-Build-Hadoop-2.2/tez-dag/src/main/java/org/apache/tez/dag/app/web/WebUIService.java:[85,13] > cannot find symbol > [ERROR] symbol : method > withHttpPolicy(org.apache.hadoop.conf.Configuration,org.apache.hadoop.http.HttpConfig.Policy) > [ERROR] location: class > org.apache.hadoop.yarn.webapp.WebApps.Builder > [ERROR] > /home/jenkins/jenkins-slave/workspace/Tez-Build-Hadoop-2.2/tez-dag/src/main/java/org/apache/tez/dag/app/web/WebUIService.java:[87,45] > cannot find symbol > [ERROR] symbol : method getConnectorAddress(int) > [ERROR] location: class org.apache.hadoop.http.HttpServer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2227) Tez UI shows empty page under IE11
[ https://issues.apache.org/jira/browse/TEZ-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380050#comment-14380050 ] Hitesh Shah commented on TEZ-2227: -- \cc [~Sreenath] [~pramachandran] in case they have seen this before > Tez UI shows empty page under IE11 > -- > > Key: TEZ-2227 > URL: https://issues.apache.org/jira/browse/TEZ-2227 > Project: Apache Tez > Issue Type: Bug > Components: UI >Affects Versions: 0.6.0 >Reporter: Fengdong Yu >Priority: Minor > Attachments: IE11.PNG, chrome.PNG > > > Tez UI works well under Chrome and Firefox, but shows empty page under IE11. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2229) bower ESUDO Cannot be run with sudo -- during build
[ https://issues.apache.org/jira/browse/TEZ-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380047#comment-14380047 ] Hitesh Shah commented on TEZ-2229: -- Adding a link to TEZ-1838 where allow-root was removed. \cc [~pramachandran] [~Sreenath] any comments on this? > bower ESUDO Cannot be run with sudo -- during build > --- > > Key: TEZ-2229 > URL: https://issues.apache.org/jira/browse/TEZ-2229 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.6.0 > Environment: Linux x86_64 >Reporter: Fengdong Yu > > I build Tez using root, I never install node/npm locally before my build. > then there are exception messages during build tez-ui module. Maven debug > logs: > {code} > [DEBUG] env: SSH_TTY=/dev/pts/0 > [DEBUG] env: TERM=xterm > [DEBUG] env: USER=root > [DEBUG] env: XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt > [DEBUG] Toolchains are ignored, 'executable' parameter is set to > /root/temp/apache-tez-0.6.0-src/tez-ui/src/main/webapp/node/node > [DEBUG] Executing command line: > [/root/temp/apache-tez-0.6.0-src/tez-ui/src/main/webapp/node/node, > node_modules/bower/bin/bower, install, --remove-unnecessary-resolutions=false] > bower ESUDO Cannot be run with sudo > Additional error details: > Since bower is a user command, there is no need to execute it with superuser > permissions. > If you're having permission errors when using bower without sudo, please > spend a few minutes learning more about how your system should work and make > any necessary repairs. > http://www.joyent.com/blog/installing-node-and-npm > https://gist.github.com/isaacs/579814 > You can however run a command with sudo using --allow-root option > {code} > {code} > [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec > (Bower install) on project tez-ui: Command execution failed. 
Process exited > with an error: 1 (Exit value: 1) -> [ > Help 1]org.apache.maven.lifecycle.LifecycleExecutionException: Failed to > execute goal org.codehaus.mojo:exec-maven-plugin:1.3.2:exec (Bower install) > on project tez-ui: Command execution failed. > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355) > at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155) > at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584) > at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216) > at org.apache.maven.cli.MavenCli.main(MavenCli.java:160) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) > at > org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) > at > org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) > Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution 
> failed. > at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:303) > at > org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) > ... 19 more > Caused by: org.apache.commons.exec.ExecuteException: Process exited with an > error: 1 (Exit value: 1) > at > org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:402) > at > org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:164) > at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:746) > at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:292) > ... 21 more > [ERROR] > [ERROR] > [ERROR] For more information abo
[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread
[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380043#comment-14380043 ] Hadoop QA commented on TEZ-714: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707217/TEZ-714-4.patch against master revision d1b4bd4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.app.dag.impl.TestCommit Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/346//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/346//console This message is automatically generated. > OutputCommitters should not run in the main AM dispatcher thread > > > Key: TEZ-714 > URL: https://issues.apache.org/jira/browse/TEZ-714 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jeff Zhang >Priority: Critical > Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, > TEZ-714-3.patch, TEZ-714-4.patch, Vertex_2.pdf > > > Follow up jira from TEZ-41. > 1) If there's multiple OutputCommitters on a Vertex, they can be run in > parallel. > 2) Running an OutputCommitter in the main thread blocks all other event > handling, w.r.t the DAG, and causes the event queue to back up. > 3) This should also cover shared commits that happen in the DAG. 
Failed: TEZ-714 PreCommit Build #346
Jira: https://issues.apache.org/jira/browse/TEZ-714 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/346/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2381 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707217/TEZ-714-4.patch against master revision d1b4bd4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.app.dag.impl.TestCommit Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/346//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/346//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. d097ae729e94fba50749fa203ae0f8b6d76f0a90 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #343 Archived 44 artifacts Archive block size is 32768 Received 2 blocks and 2638094 bytes Compression is 2.4% Took 1 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 1 tests failed. 
REGRESSION: org.apache.tez.dag.app.dag.impl.TestCommit.testVertexGroupCommitFinishedEventFail Error Message: expected:<0> but was:<1> Stack Trace: java.lang.AssertionError: expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.tez.dag.app.dag.impl.TestCommit.testVertexGroupCommitFinishedEventFail(TestCommit.java:1194)
[jira] [Comment Edited] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread
[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379904#comment-14379904 ] Jeff Zhang edited comment on TEZ-714 at 3/25/15 3:04 PM: - Uploaded a new patch to fix the test failure. was (Author: zjffdu): test failed, will check it > OutputCommitters should not run in the main AM dispatcher thread > > > Key: TEZ-714 > URL: https://issues.apache.org/jira/browse/TEZ-714 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jeff Zhang >Priority: Critical > Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, > TEZ-714-3.patch, TEZ-714-4.patch, Vertex_2.pdf > > > Follow up jira from TEZ-41. > 1) If there's multiple OutputCommitters on a Vertex, they can be run in > parallel. > 2) Running an OutputCommitter in the main thread blocks all other event > handling, w.r.t the DAG, and causes the event queue to back up. > 3) This should also cover shared commits that happen in the DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread
[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-714: --- Attachment: TEZ-714-4.patch > OutputCommitters should not run in the main AM dispatcher thread > > > Key: TEZ-714 > URL: https://issues.apache.org/jira/browse/TEZ-714 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jeff Zhang >Priority: Critical > Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, > TEZ-714-3.patch, TEZ-714-4.patch, Vertex_2.pdf > > > Follow up jira from TEZ-41. > 1) If there's multiple OutputCommitters on a Vertex, they can be run in > parallel. > 2) Running an OutputCommitter in the main thread blocks all other event > handling, w.r.t the DAG, and causes the event queue to back up. > 3) This should also cover shared commits that happen in the DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
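Points 1) and 2) of the description can be sketched roughly as follows. This is not the TEZ-714 patch: CommitOffloadSketch, the Committer interface, and the event strings are hypothetical, and a BlockingQueue stands in for the AM dispatcher's event queue. The idea shown is only that commits run on a separate pool, in parallel, and report back as events instead of blocking the thread that schedules them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the Tez OutputCommitter API.
final class CommitOffloadSketch {
    /** Stand-in for an output committer. */
    interface Committer {
        void commit() throws Exception;
    }

    /**
     * Schedules every committer on the pool and returns immediately.
     * Each result is reported as an event, so the caller is never blocked.
     */
    static void commitAsync(List<Committer> committers, ExecutorService pool,
                            BlockingQueue<String> eventQueue) {
        for (int i = 0; i < committers.size(); i++) {
            final int id = i;
            final Committer committer = committers.get(i);
            pool.execute(() -> {
                try {
                    committer.commit();
                    eventQueue.add("COMMIT_COMPLETED:" + id);
                } catch (Exception e) {
                    eventQueue.add("COMMIT_FAILED:" + id);
                }
            });
        }
    }

    /** Runs one succeeding and one failing committer; returns sorted events. */
    static List<String> runDemo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        List<Committer> committers = new ArrayList<>();
        committers.add(() -> { });
        committers.add(() -> { throw new IllegalStateException("simulated commit failure"); });
        commitAsync(committers, pool, events);
        pool.shutdown();
        try {
            // The demo waits here only so that it can print a complete result.
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<String> sorted = new ArrayList<>(events);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

A real dispatcher would consume the completion events from its own loop. The sketch does not address shared commits across vertices mentioned in point 3).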
[jira] [Commented] (TEZ-2205) Tez still tries to post to ATS when yarn.timeline-service.enabled=false
[ https://issues.apache.org/jira/browse/TEZ-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379951#comment-14379951 ] Chang Li commented on TEZ-2205: --- Thanks a lot for patient discussion and explanations [~hitesh], [~jeagles], [~zjshen]! Understand the implementation requirement now, will provide a patch soon > Tez still tries to post to ATS when yarn.timeline-service.enabled=false > --- > > Key: TEZ-2205 > URL: https://issues.apache.org/jira/browse/TEZ-2205 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.6.1 >Reporter: Chang Li >Assignee: Chang Li > Attachments: TEZ-2205.wip.patch > > > when set yarn.timeline-service.enabled=false, Tez still tries posting to ATS, > but hits error as token is not found. Does not fail the job because of the > fix to not fail job when there is error posting to ATS. But it should not be > trying to post to ATS in the first place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread
[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379904#comment-14379904 ] Jeff Zhang commented on TEZ-714: test failed, will check it > OutputCommitters should not run in the main AM dispatcher thread > > > Key: TEZ-714 > URL: https://issues.apache.org/jira/browse/TEZ-714 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jeff Zhang >Priority: Critical > Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, > TEZ-714-3.patch, Vertex_2.pdf > > > Follow up jira from TEZ-41. > 1) If there's multiple OutputCommitters on a Vertex, they can be run in > parallel. > 2) Running an OutputCommitter in the main thread blocks all other event > handling, w.r.t the DAG, and causes the event queue to back up. > 3) This should also cover shared commits that happen in the DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread
[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379901#comment-14379901 ] Hadoop QA commented on TEZ-714: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707199/TEZ-714-3.patch against master revision 60ddcba. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestTezJobs org.apache.tez.mapreduce.TestMRRJobsDAGApi Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/345//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/345//console This message is automatically generated. > OutputCommitters should not run in the main AM dispatcher thread > > > Key: TEZ-714 > URL: https://issues.apache.org/jira/browse/TEZ-714 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jeff Zhang >Priority: Critical > Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, > TEZ-714-3.patch, Vertex_2.pdf > > > Follow up jira from TEZ-41. > 1) If there's multiple OutputCommitters on a Vertex, they can be run in > parallel. > 2) Running an OutputCommitter in the main thread blocks all other event > handling, w.r.t the DAG, and causes the event queue to back up. > 3) This should also cover shared commits that happen in the DAG. 
Failed: TEZ-714 PreCommit Build #345
Jira: https://issues.apache.org/jira/browse/TEZ-714 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/345/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2532 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707199/TEZ-714-3.patch against master revision 60ddcba. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestTezJobs org.apache.tez.mapreduce.TestMRRJobsDAGApi Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/345//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/345//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. ebe71b2173110526859381e9bdcee0ec01d52f37 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #343 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2585916 bytes Compression is 4.8% Took 9.6 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 2 tests failed. 
REGRESSION: org.apache.tez.mapreduce.TestMRRJobsDAGApi.testVertexGroups

Error Message:
Unsupported value for DAGStatus.State : DAG_COMMITTING

Stack Trace:
org.apache.tez.dag.api.TezUncheckedException: Unsupported value for DAGStatus.State : DAG_COMMITTING
    at org.apache.tez.dag.api.client.DAGStatus.getState(DAGStatus.java:83)
    at org.apache.tez.dag.api.client.DAGStatus.isCompleted(DAGStatus.java:89)
    at org.apache.tez.dag.api.client.DAGClientImpl._waitForCompletionWithStatusUpdates(DAGClientImpl.java:436)
    at org.apache.tez.dag.api.client.DAGClientImpl.waitForCompletionWithStatusUpdates(DAGClientImpl.java:298)
    at org.apache.tez.mapreduce.examples.UnionExample.run(UnionExample.java:280)
    at org.apache.tez.mapreduce.TestMRRJobsDAGApi.testVertexGroups(TestMRRJobsDAGApi.java:850)

REGRESSION: org.apache.tez.test.TestTezJobs.testHashJoinExamplePipeline

Error Message:
Unsupported value for DAGStatus.State : DAG_COMMITTING

Stack Trace:
org.apache.tez.dag.api.TezUncheckedException: Unsupported value for DAGStatus.State : DAG_COMMITTING
    at org.apache.tez.dag.api.client.DAGStatus.getState(DAGStatus.java:83)
    at org.apache.tez.dag.api.client.DAGStatus.isCompleted(DAGStatus.java:89)
    at org.apache.tez.dag.api.client.DAGClientImpl._waitForCompletionWithStatusUpdates(DAGClientImpl.java:436)
    at org.apache.tez.dag.api.client.DAGClientImpl.waitForCompletionWithStatusUpdates(DAGClientImpl.java:298)
    at org.apache.tez.examples.TezExampleBase.runDag(TezExampleBase.java:134)
    at org.apache.tez.examples.HashJoinExample.runJob(HashJoinExample.java:130)
    at org.apache.tez.examples.TezExampleBase._execute(TezExampleBase.java:179)
    at org.apache.tez.examples.TezExampleBase.run(TezExampleBase.java:112)
    at org.apache.tez.test.TestTezJobs.testHashJoinExamplePipeline(TestTezJobs.java:404)
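Both failures come from the same place: `DAGStatus.getState` rejects the value `DAG_COMMITTING`. A common way to get this kind of error is a mapping from a wire-level enum to a client-facing enum whose default branch throws, combined with a new wire-level value that the mapping was never updated for. The Java sketch below illustrates that pattern under stated assumptions. Only the name `DAG_COMMITTING` and the message text come from the stack trace; the enums, the method names, and the choice to report a committing DAG as running are hypothetical and are not taken from the Tez source or from the patch.

```java
public class StateMappingSketch {
    /** Hypothetical wire-level states; only DAG_COMMITTING appears in the error message. */
    enum WireState { DAG_RUNNING, DAG_COMMITTING, DAG_SUCCEEDED, DAG_FAILED }

    /** Hypothetical client-facing states. */
    enum ClientState { RUNNING, SUCCEEDED, FAILED }

    /** Mapping written before the new state existed: DAG_COMMITTING hits the default branch. */
    static ClientState oldMapping(WireState s) {
        switch (s) {
            case DAG_RUNNING:
                return ClientState.RUNNING;
            case DAG_SUCCEEDED:
                return ClientState.SUCCEEDED;
            case DAG_FAILED:
                return ClientState.FAILED;
            default:
                throw new IllegalStateException("Unsupported value for DAGStatus.State : " + s);
        }
    }

    /** One possible fix (an assumption): treat a committing DAG as still running. */
    static ClientState newMapping(WireState s) {
        if (s == WireState.DAG_COMMITTING) {
            return ClientState.RUNNING;
        }
        return oldMapping(s);
    }

    public static void main(String[] args) {
        try {
            oldMapping(WireState.DAG_COMMITTING);
        } catch (IllegalStateException e) {
            System.out.println("old mapping: " + e.getMessage());
        }
        System.out.println("new mapping: " + newMapping(WireState.DAG_COMMITTING));
    }
}
```

Whether the new state should be folded into an existing client state or exposed as its own value is a design decision for the patch; the sketch only shows why an unmapped value fails on the client side while the DAG itself may be progressing normally.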
[jira] [Resolved] (TEZ-2047) Build fails against hadoop-2.2 post TEZ-2018
[ https://issues.apache.org/jira/browse/TEZ-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran resolved TEZ-2047. --- Resolution: Fixed Thanks, Hitesh. Committed to master and branch-0.6. > Build fails against hadoop-2.2 post TEZ-2018 > > > Key: TEZ-2047 > URL: https://issues.apache.org/jira/browse/TEZ-2047 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Prakash Ramachandran >Priority: Blocker > Attachments: TEZ-2047.1.patch, TEZ-2047.1.patch, TEZ-2047.2.patch > > > Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) > on project tez-dag: Compilation failure: Compilation failure: > [ERROR] > /home/jenkins/jenkins-slave/workspace/Tez-Build-Hadoop-2.2/tez-dag/src/main/java/org/apache/tez/dag/app/web/WebUIService.java:[85,13] > cannot find symbol > [ERROR] symbol : method > withHttpPolicy(org.apache.hadoop.conf.Configuration,org.apache.hadoop.http.HttpConfig.Policy) > [ERROR] location: class > org.apache.hadoop.yarn.webapp.WebApps.Builder > [ERROR] > /home/jenkins/jenkins-slave/workspace/Tez-Build-Hadoop-2.2/tez-dag/src/main/java/org/apache/tez/dag/app/web/WebUIService.java:[87,45] > cannot find symbol > [ERROR] symbol : method getConnectorAddress(int) > [ERROR] location: class org.apache.hadoop.http.HttpServer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread
[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379887#comment-14379887 ] Jeff Zhang commented on TEZ-714: [~bikassaha] Thanks for the suggestion. Uploaded a patch (a unit test is included, but the e2e test in MockDAGAppMaster has not been implemented yet). > OutputCommitters should not run in the main AM dispatcher thread > > > Key: TEZ-714 > URL: https://issues.apache.org/jira/browse/TEZ-714 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jeff Zhang >Priority: Critical > Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, > TEZ-714-3.patch, Vertex_2.pdf > > > Follow up jira from TEZ-41. > 1) If there are multiple OutputCommitters on a Vertex, they can be run in > parallel. > 2) Running an OutputCommitter in the main thread blocks all other event > handling, w.r.t the DAG, and causes the event queue to back up. > 3) This should also cover shared commits that happen in the DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread
[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-714: --- Attachment: TEZ-714-3.patch > OutputCommitters should not run in the main AM dispatcher thread > > > Key: TEZ-714 > URL: https://issues.apache.org/jira/browse/TEZ-714 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Jeff Zhang >Priority: Critical > Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, > TEZ-714-3.patch, Vertex_2.pdf > > > Follow up jira from TEZ-41. > 1) If there are multiple OutputCommitters on a Vertex, they can be run in > parallel. > 2) Running an OutputCommitter in the main thread blocks all other event > handling, w.r.t the DAG, and causes the event queue to back up. > 3) This should also cover shared commits that happen in the DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)