[jira] [Commented] (TEZ-2199) updateLocalResourcesForInputSplits assumes wrongly that split data is on same FS as the default FS
[ https://issues.apache.org/jira/browse/TEZ-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361454#comment-14361454 ]

Hadoop QA commented on TEZ-2199:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12704551/TEZ-2199.1.patch
against master revision b18552b.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/303//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/303//console

This message is automatically generated.

> updateLocalResourcesForInputSplits assumes wrongly that split data is on same FS as the default FS
> --------------------------------------------------------------------------------------------------
>
> Key: TEZ-2199
> URL: https://issues.apache.org/jira/browse/TEZ-2199
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Hitesh Shah
> Attachments: TEZ-2199.1.patch
>
> Seen in a Windows Azure scenario:
> Caused by: java.io.FileNotFoundException: hdfs://namenode:9000/hive/scratch/_tez_scratch_dir/split_Map_1/job.split: No such file or directory.
> at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1625)
> at org.apache.tez.mapreduce.hadoop.MRInputHelpers.updateLocalResourcesForInputSplits(MRInputHelpers.java:639)
> at org.apache.tez.mapreduce.hadoop.MRInputHelpers.configureMRInputWithLegacySplitGeneration(MRInputHelpers.java:115)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
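Editorial sketch for TEZ-2199. The stack trace shows an hdfs:// path being passed to NativeAzureFileSystem, which is consistent with the file system being taken from the cluster default rather than from the path. In Hadoop the usual remedy is to resolve the file system from the path itself (Path#getFileSystem(Configuration)) instead of FileSystem.get(conf); whether TEZ-2199.1.patch does exactly this is not stated in the thread. The stdlib-only example below only illustrates the idea. FsResolutionSketch, schemeFor, and the wasb:// default URI are hypothetical and are not Tez or Hadoop APIs.

```java
import java.net.URI;

/** Hypothetical illustration only; not a Tez or Hadoop class. */
final class FsResolutionSketch {

    /**
     * Returns the scheme of the file system that should serve the path:
     * the path's own scheme when it is fully qualified, otherwise the
     * scheme of the default file system.
     */
    static String schemeFor(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        return scheme != null ? scheme : defaultFs.getScheme();
    }

    public static void main(String[] args) {
        // Assumed default FS for an Azure-backed cluster (made-up value).
        URI defaultFs = URI.create("wasb://container@account.example/");
        // Path taken from the stack trace in the issue description.
        URI split = URI.create(
            "hdfs://namenode:9000/hive/scratch/_tez_scratch_dir/split_Map_1/job.split");

        // A qualified path keeps its own scheme.
        System.out.println(schemeFor(split, defaultFs));
        // An unqualified path falls back to the default FS scheme.
        System.out.println(schemeFor(URI.create("/tmp/job.split"), defaultFs));
    }
}
```

The point of the sketch is the order of precedence: the default file system is only a fallback for paths that carry no scheme of their own.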
Success: TEZ-2199 PreCommit Build #303
Jira: https://issues.apache.org/jira/browse/TEZ-2199 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/303/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2748 lines...] [INFO] Final Memory: 70M/967M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704551/TEZ-2199.1.patch against master revision b18552b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/303//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/303//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. ae43047303f0ec5a143958f9f93034a82f2a8dbd logged out == == Finished build. == == Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #297 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2592123 bytes Compression is 4.8% Took 0.91 sec Description set: TEZ-2199 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-1909) Remove need to copy over all events from attempt 1 to attempt 2 dir
[ https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361426#comment-14361426 ]

Hitesh Shah commented on TEZ-1909:
----------------------------------

Comments:
- Is there any reason why "Set getDagIDs()" is needed in the DAGAppMaster?
- The "if (skipAllOtherEvents) {" check is probably also needed at the top of the loop to prevent new files from being opened and read (in addition to short-circuiting the read of all events in the given file). Maybe just log a message that other files were present and skipped.
- I do not see TEZ_AM_RECOVERY_HANDLE_REMAINING_EVENT_WHEN_STOPPED being used anywhere apart from being set to true in one of the tests.
- Please replace "import com.sun.tools.javac.util.List;" with java.util.List.
- testCorruptedLastRecord should also verify that the dag submitted event was seen.
- Also, we should add a test for adding corrupt data to the summary stream and ensuring that its processing fails.
- There may not be a need to add "getDAGNames()". Instead, you can just use "dagAppMaster.dagNames.add(dagSummaryData.dagName);" as dagNames should be package-private.

> Remove need to copy over all events from attempt 1 to attempt 2 dir
> -------------------------------------------------------------------
>
> Key: TEZ-1909
> URL: https://issues.apache.org/jira/browse/TEZ-1909
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
> Attachments: TEZ-1909-1.patch, TEZ-1909-2.patch
>
> Use of file versions should prevent the need for copying over data into a second attempt dir. Care needs to be taken to handle "last corrupt record" handling.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
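Editorial sketch for the second review comment on TEZ-1909. It shows one way a skip flag can be checked at the top of the per-file loop so that later files are never opened, in addition to breaking out of the inner per-event loop. Everything here is hypothetical: RecoveryScanSketch, the CORRUPT marker that sets the flag, and the in-memory map standing in for recovery files are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the Tez recovery parser. */
final class RecoveryScanSketch {

    /** Assumed marker for a record that stops further processing. */
    static final String CORRUPT = "CORRUPT";

    /** Outcome of a scan: files opened, files skipped, events accepted. */
    static final class Result {
        final List<String> opened = new ArrayList<>();
        final List<String> skipped = new ArrayList<>();
        final List<String> events = new ArrayList<>();
    }

    /** Scans files in iteration order; each map value is that file's events. */
    static Result scan(Map<String, List<String>> files) {
        Result result = new Result();
        boolean skipAllOtherEvents = false;
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            if (skipAllOtherEvents) {
                // Checked before the file is "opened": record it and move on.
                result.skipped.add(file.getKey());
                continue;
            }
            result.opened.add(file.getKey());
            for (String event : file.getValue()) {
                if (CORRUPT.equals(event)) {
                    // Stop reading this file and all remaining files.
                    skipAllOtherEvents = true;
                    break;
                }
                result.events.add(event);
            }
        }
        return result;
    }
}
```

Collecting the skipped file names makes it straightforward to log that other files were present and skipped, as the comment suggests.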
[jira] [Commented] (TEZ-2193) Check returned value from EdgeManagerPlugin before using it
[ https://issues.apache.org/jira/browse/TEZ-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361402#comment-14361402 ]

Bikas Saha commented on TEZ-2193:
---------------------------------

Perhaps add a test similar to TestEdge#testOneToOneEdgeManager() for the scatter gather edge change.

getPhysicalOutput/Input... methods may be called multiple times when creating the tasks of a large vertex. It would help if the Preconditions message was not created with string + every time (even though it's going to pass almost always). Perhaps we can use a pre-assembled string here if we don't print the actual invalid value.
{code}
+ Preconditions.checkArgument(physicalOutputCount >= 0,
+ "PhysicalOutputCount should not be negative,"
+ + "physicalOutputCount=" + physicalOutputCount
+ + ", srcVertex=" + sourceVertex.getLogIdentifier()
+ + ", destVertex=" + destinationVertex.getLogIdentifier()
+ + ", EdgeManager=" + edgeManager.getClass().getName());
{code}

Consumer task num can be 0 because a task in the source may not have any consumers in this edge but may have consumers on a different edge.
{code}
srcTaskIndex);
+ Preconditions.checkArgument(numConsumers > 0,
+ "ConsumerTaskNum must be positive,"
+ + "numConsumers=" + numConsumers
{code}

> Check returned value from EdgeManagerPlugin before using it
> -----------------------------------------------------------
>
> Key: TEZ-2193
> URL: https://issues.apache.org/jira/browse/TEZ-2193
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-2193-1.patch, TEZ-2193-2.patch, TEZ-2193-3.patch
>
> For example, a DAG has vertices v1 and v2 with a shuffle edge between them, and v2 has a custom vertex manager and -1 parallelism. In this case v1's output spec may have -1 physical edge, which will cause the task to hang in TezChild.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
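Editorial sketch for the two review points on TEZ-2193. One way to avoid building the message on every call while still printing the invalid value is to defer message construction until the check fails. Guava's Preconditions.checkArgument(boolean, String, Object...) overload with %s placeholders does this; the stdlib-only example below uses a Supplier instead. LazyPreconditionSketch and its method names are hypothetical, and the second check uses >= 0 only to reflect the comment that a consumer count of 0 can be legitimate.

```java
import java.util.function.Supplier;

/** Hypothetical illustration only; not the code from the patch. */
final class LazyPreconditionSketch {

    /** Fails with the supplied message; the supplier runs only on failure. */
    static void checkArgument(boolean condition, Supplier<String> message) {
        if (!condition) {
            throw new IllegalArgumentException(message.get());
        }
    }

    /** Rejects negative output counts; vertex names are passed in for the message. */
    static int validateOutputCount(int physicalOutputCount, String srcVertex, String destVertex) {
        checkArgument(physicalOutputCount >= 0,
            () -> "PhysicalOutputCount should not be negative, physicalOutputCount="
                + physicalOutputCount
                + ", srcVertex=" + srcVertex
                + ", destVertex=" + destVertex);
        return physicalOutputCount;
    }

    /** Accepts 0: a source task may have no consumers on this particular edge. */
    static int validateConsumerCount(int numConsumers) {
        checkArgument(numConsumers >= 0,
            () -> "ConsumerTaskNum should not be negative, numConsumers=" + numConsumers);
        return numConsumers;
    }
}
```

On the common passing path the lambda body never runs, so no string concatenation takes place.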
[jira] [Updated] (TEZ-2199) updateLocalResourcesForInputSplits assumes wrongly that split data is on same FS as the default FS
[ https://issues.apache.org/jira/browse/TEZ-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2199: - Attachment: TEZ-2199.1.patch [~sseth] review please. > updateLocalResourcesForInputSplits assumes wrongly that split data is on same > FS as the default FS > -- > > Key: TEZ-2199 > URL: https://issues.apache.org/jira/browse/TEZ-2199 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > Attachments: TEZ-2199.1.patch > > > Seen in a Windows Azure scenario: > Caused by: java.io.FileNotFoundException: > hdfs://namenode:9000/hive/scratch/_tez_scratch_dir/split_Map_1/job.split: No > such file or directory. > at > org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1625) > at > org.apache.tez.mapreduce.hadoop.MRInputHelpers.updateLocalResourcesForInputSplits(MRInputHelpers.java:639) > at > org.apache.tez.mapreduce.hadoop.MRInputHelpers.configureMRInputWithLegacySplitGeneration(MRInputHelpers.java:115) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2021) Tez tool to analyze shuffle performance in large clusters by mining task logs
[ https://issues.apache.org/jira/browse/TEZ-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361379#comment-14361379 ]

Rajesh Balamohan edited comment on TEZ-2021 at 3/14/15 12:19 AM:
-----------------------------------------------------------------

[~jeagles] This has been tested on a small cluster with 20 nodes. It would be really helpful if you would like to try it out and provide your comments.

* Apply this patch.
* Build tez-tfile-parser in $TEZ/tez-tools/tez-tfile-parser/
** "mvn clean package"
* Populate env.sh in $TEZ/tez-tools/perf-analyzer/shuffle/
** PIG_HOME, TEZ_HOME
** YARN_APP_LOGS_LOCATION
*** "yarn.log-aggregation-enable" is set to true in the cluster
*** Note down "yarn.nodemanager.remote-app-log-dir & yarn.nodemanager.remote-app-log-dir-suffix" parameters in your cluster and set up YARN_APP_LOGS_LOCATION in env.sh appropriately
* This requires "gnuplot" on the machine where you are planning to run.
* Run "sh gnuplot.sh " (In case you would like to parse some other user's job, you might want to set "export APP_USER=appUserWhoRanTheJob" before running this)

was (Author: rajesh.balamohan):
This has been tested on a small cluster with 20 nodes. It would be really helpful if you would like to try it out and provide your comments.

* Apply this patch.
* Build tez-tfile-parser in $TEZ/tez-tools/tez-tfile-parser/
** "mvn clean package"
* Populate env.sh in $TEZ/tez-tools/perf-analyzer/shuffle/
** PIG_HOME, TEZ_HOME
** YARN_APP_LOGS_LOCATION
*** "yarn.log-aggregation-enable" is set to true in the cluster
*** Note down "yarn.nodemanager.remote-app-log-dir & yarn.nodemanager.remote-app-log-dir-suffix" parameters in your cluster and set up YARN_APP_LOGS_LOCATION in env.sh appropriately
* This requires "gnuplot" on the machine where you are planning to run.
* Run "sh gnuplot.sh " (In case you would like to parse some other user's job, you might want to set "export APP_USER=appUserWhoRanTheJob" before running this)

> Tez tool to analyze shuffle performance in large clusters by mining task logs
> -----------------------------------------------------------------------------
>
> Key: TEZ-2021
> URL: https://issues.apache.org/jira/browse/TEZ-2021
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2021.1.patch, TEZ-2021.2.patch, avg_time_Taken_after_fix.png, avg_time_taken_to_download.png, no_of_times_contacted.png, total_data_transferred.png
>
> Tez tool to analyze shuffle performance in large clusters by mining task logs. Provide an easier way to visualize (heat chart) and identify bad nodes in large cluster.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2021) Tez tool to analyze shuffle performance in large clusters by mining task logs
[ https://issues.apache.org/jira/browse/TEZ-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361379#comment-14361379 ]

Rajesh Balamohan commented on TEZ-2021:
---------------------------------------

This has been tested on a small cluster with 20 nodes. It would be really helpful if you would like to try it out and provide your comments.

* Apply this patch.
* Build tez-tfile-parser in $TEZ/tez-tools/tez-tfile-parser/
** "mvn clean package"
* Populate env.sh in $TEZ/tez-tools/perf-analyzer/shuffle/
** PIG_HOME, TEZ_HOME
** YARN_APP_LOGS_LOCATION
*** "yarn.log-aggregation-enable" is set to true in the cluster
*** Note down "yarn.nodemanager.remote-app-log-dir & yarn.nodemanager.remote-app-log-dir-suffix" parameters in your cluster and set up YARN_APP_LOGS_LOCATION in env.sh appropriately
* This requires "gnuplot" on the machine where you are planning to run.
* Run "sh gnuplot.sh " (In case you would like to parse some other user's job, you might want to set "export APP_USER=appUserWhoRanTheJob" before running this)

> Tez tool to analyze shuffle performance in large clusters by mining task logs
> -----------------------------------------------------------------------------
>
> Key: TEZ-2021
> URL: https://issues.apache.org/jira/browse/TEZ-2021
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2021.1.patch, TEZ-2021.2.patch, avg_time_Taken_after_fix.png, avg_time_taken_to_download.png, no_of_times_contacted.png, total_data_transferred.png
>
> Tez tool to analyze shuffle performance in large clusters by mining task logs. Provide an easier way to visualize (heat chart) and identify bad nodes in large cluster.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361375#comment-14361375 ] Bikas Saha edited comment on TEZ-160 at 3/14/15 12:17 AM: -- This should affect you if your tests are not using session mode and running 1 dag per AM. Is that the case? was (Author: bikassaha): This should affect you if your tests are not using session mode. Is that the case? > Remove 5 second sleep at the end of AM completion. > -- > > Key: TEZ-160 > URL: https://issues.apache.org/jira/browse/TEZ-160 > Project: Apache Tez > Issue Type: Bug >Reporter: Siddharth Seth > Labels: TEZ-0.2.0 > > ClientServiceDelegate/DAGClient doesn't seem to be getting job completion > status from the AM after job completion. It, instead, always relies on the RM > for this information. The information returned by the AM should be used while > it's available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361375#comment-14361375 ] Bikas Saha edited comment on TEZ-160 at 3/14/15 12:18 AM: -- This only happens at AM shutdown, not DAG completion. This should affect you if your tests are not using session mode and running 1 dag per AM. Is that the case? was (Author: bikassaha): This should affect you if your tests are not using session mode and running 1 dag per AM. Is that the case? > Remove 5 second sleep at the end of AM completion. > -- > > Key: TEZ-160 > URL: https://issues.apache.org/jira/browse/TEZ-160 > Project: Apache Tez > Issue Type: Bug >Reporter: Siddharth Seth > Labels: TEZ-0.2.0 > > ClientServiceDelegate/DAGClient doesn't seem to be getting job completion > status from the AM after job completion. It, instead, always relies on the RM > for this information. The information returned by the AM should be used while > it's available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361375#comment-14361375 ] Bikas Saha commented on TEZ-160: This should affect you if your tests are not using session mode. Is that the case? > Remove 5 second sleep at the end of AM completion. > -- > > Key: TEZ-160 > URL: https://issues.apache.org/jira/browse/TEZ-160 > Project: Apache Tez > Issue Type: Bug >Reporter: Siddharth Seth > Labels: TEZ-0.2.0 > > ClientServiceDelegate/DAGClient doesn't seem to be getting job completion > status from the AM after job completion. It, instead, always relies on the RM > for this information. The information returned by the AM should be used while > it's available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361273#comment-14361273 ]

Bikas Saha commented on TEZ-776:
--------------------------------

All options have been sufficiently discussed on this thread and offline. The option of moving all of event handling to edge plugins is a much larger change that shifts a lot of framework responsibility to the user. Secondly, it's not clear how future changes or feature additions around dynamic graph reconfiguration, like changing edges and vertices at runtime, may or may not be affected by having given control of event management to user code. Things like event obsoletion, which can be done easily by the framework for all edges and IOs, would need to be done by every plugin. Every plugin would need to have additional metadata tracking objects which are currently provided by the framework. Each plugin would have to handle versioning of events and speculation-like conditions which break the time-sequential nature of version numbers. And probably other stuff.

Firstly, that is a much larger change that is related but orthogonal to the memory issue and must be discussed separately in its own right. Secondly, while at a high level it may seem likely that in some cases edge plugins might do better at CPU, I suspect that after handling event versioning, obsoletion, etc., the argument that plugins can avoid iterating over events may turn out to be specious for CPU efficiency.

My suggestion to follow up on that approach separately is based on the above arguments. It has not been effectively established that moving essential framework responsibilities to the user is the right approach long term. Neither is it clear that the CPU efficiency of the final implementation, which does more than the sunny day scenario, is going to be significantly better at the cost of adding complexity in user code. That can only be measured. Hence, I suggested that it be evaluated before including that change in the project. This is the case with any change or feature, right? My only objection was to tying the progress on this jira to pre-accepting the other changes without going through due process, especially when this jira does not mandate any user code changes.

In the meanwhile, the current patch does not mandate any API changes for users. Unless users want to make the API change, they can continue to use the existing API, even across releases. If they do want to make the change, it's much simpler because it follows the existing pattern. But users who are running large jobs and using framework built-in components can be unblocked on their scalability issues. Hence my suggestion to complete the reviews of this patch and resolve it, so that there is forward progress without requiring any user to make any code changes.

In order to make progress, what I can try to do is limit the on-demand routing to only composite event expansion and not change the flow for any other event: add a new optional API for composite event expansion that will be implemented by the internal scatter-gather edge and is optional, so that users don't need to change their code. This will solve the memory scalability issue without increasing any CPU cost compared to any scenario as it exists today.

I hope that clarifies and we can make progress on this jira.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks that can be processed.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
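Editorial sketch for the "on demand routing" idea in the TEZ-776 comment. It contrasts with eager expansion by storing each composite event once and deriving a per-destination event only when a destination task asks for it. This is a simplified illustration and not the patch: OnDemandRoutingSketch, CompositeEvent, RoutedEvent, and the rule that partition i of every source goes to destination task i (a scatter-gather style pattern) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the Tez event routing code. */
final class OnDemandRoutingSketch {

    /** One event from a source task that covers all of its partitions. */
    static final class CompositeEvent {
        final int sourceTask;
        final int partitionCount;
        final String payload;

        CompositeEvent(int sourceTask, int partitionCount, String payload) {
            this.sourceTask = sourceTask;
            this.partitionCount = partitionCount;
            this.payload = payload;
        }
    }

    /** The event a single destination task sees after expansion. */
    static final class RoutedEvent {
        final int sourceTask;
        final int targetIndex;
        final String payload;

        RoutedEvent(int sourceTask, int targetIndex, String payload) {
            this.sourceTask = sourceTask;
            this.targetIndex = targetIndex;
            this.payload = payload;
        }
    }

    private final List<CompositeEvent> stored = new ArrayList<>();

    /** Stores the composite event as-is; nothing is expanded here. */
    void add(CompositeEvent event) {
        stored.add(event);
    }

    /** Number of event objects held in memory between requests. */
    int storedObjectCount() {
        return stored.size();
    }

    /** Expands stored composite events for one destination task on request. */
    List<RoutedEvent> eventsFor(int destTask) {
        List<RoutedEvent> routed = new ArrayList<>();
        for (CompositeEvent event : stored) {
            // Assumed rule: partition i of each source feeds destination task i.
            if (destTask >= 0 && destTask < event.partitionCount) {
                routed.add(new RoutedEvent(event.sourceTask, destTask, event.payload));
            }
        }
        return routed;
    }
}
```

With S source tasks and P partitions each, eager expansion would hold S x P per-destination events, while this arrangement holds S composite events and creates the rest transiently per request. Versioning, obsoletion, and speculation, which the comment names as framework responsibilities, are deliberately left out.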
[jira] [Commented] (TEZ-2064) SessionNotRunning Exception not thrown is all cases
[ https://issues.apache.org/jira/browse/TEZ-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361170#comment-14361170 ] Hadoop QA commented on TEZ-2064: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12698474/TEZ-2064.2.patch against master revision a809f96. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/302//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/302//console This message is automatically generated. > SessionNotRunning Exception not thrown is all cases > --- > > Key: TEZ-2064 > URL: https://issues.apache.org/jira/browse/TEZ-2064 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles >Priority: Critical > Attachments: TEZ-2064.1.patch, TEZ-2064.2.patch > > > Hive handles SessionNotRunning during submitDAG() and restarts the tez-session > if it receives one. In YHIVE-15, we did not receive that and the query > failed. In some scenarios the Application will fall out of the RM's knowledge > and a ApplicationNotFound exception is received instead. > Here are my asks. > 1. 
TezClient.submitDAG()/stop() should return SessionNotRunning exception if > application is expired. Basically any API which currently returns > SessionNotRunning should handle the app-not-found scenario. > 2. It would help if TezClient.getAppMasterStatus() can return > TezAppMasterStatus.SHUTDOWN if tez-session-application does not exist in RM. > That way, as a precaution, applications could check before submitting DAG's. > 3. I think it might be better if verifySessionStateForSubmission() checks the > app Status every time instead of checking sessionStarted. I am not sure about > side-effects, but will leave that to your decision. > If 3 takes time, we can pursue that later. It would really help to get 1 & 2 > in > the next tez release, especially for busy grids. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
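Editorial sketch for asks 1 and 2 in the TEZ-2064 description. It shows one way an "application not found" answer from the resource manager could be reported as a shutdown status and surfaced to callers as a session-not-running error. All types below are local stand-ins declared for the example (SessionStateSketch, StatusSource, and the nested exception and enum types); they only mirror names used in the issue and are not the Tez or YARN classes, and the attached patches may take a different approach.

```java
/** Hypothetical illustration only; none of these are Tez or YARN types. */
final class SessionStateSketch {

    /** Stand-in for the RM reporting that it no longer knows the application. */
    static final class ApplicationNotFound extends Exception {
        ApplicationNotFound(String message) {
            super(message);
        }
    }

    /** Stand-in for the error callers already handle by restarting the session. */
    static final class SessionNotRunning extends Exception {
        SessionNotRunning(String message) {
            super(message);
        }
    }

    /** Reduced status set for the example. */
    enum AppStatus { RUNNING, SHUTDOWN }

    /** Assumed lookup against the resource manager. */
    interface StatusSource {
        AppStatus fetch() throws ApplicationNotFound;
    }

    /** Ask 2: an unknown application is reported as SHUTDOWN. */
    static AppStatus getAppMasterStatus(StatusSource rm) {
        try {
            return rm.fetch();
        } catch (ApplicationNotFound e) {
            return AppStatus.SHUTDOWN;
        }
    }

    /** Ask 1: submission paths raise SessionNotRunning in the not-found case too. */
    static void verifyBeforeSubmit(StatusSource rm) throws SessionNotRunning {
        if (getAppMasterStatus(rm) != AppStatus.RUNNING) {
            throw new SessionNotRunning("Session application is not running");
        }
    }
}
```

A caller that already restarts its session on SessionNotRunning would then need no separate branch for the expired-application case.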
Failed: TEZ-2064 PreCommit Build #302
Jira: https://issues.apache.org/jira/browse/TEZ-2064 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/302/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2750 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12698474/TEZ-2064.2.patch against master revision a809f96. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/302//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/302//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 159038a8b30e33d20ce11bc080b3ea5f7c3959b0 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #297 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2590348 bytes Compression is 4.8% Took 0.86 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Assigned] (TEZ-2199) updateLocalResourcesForInputSplits assumes wrongly that split data is on same FS as the default FS
[ https://issues.apache.org/jira/browse/TEZ-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned TEZ-2199: Assignee: Hitesh Shah > updateLocalResourcesForInputSplits assumes wrongly that split data is on same > FS as the default FS > -- > > Key: TEZ-2199 > URL: https://issues.apache.org/jira/browse/TEZ-2199 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Hitesh Shah > > Seen in a Windows Azure scenario: > Caused by: java.io.FileNotFoundException: > hdfs://namenode:9000/hive/scratch/_tez_scratch_dir/split_Map_1/job.split: No > such file or directory. > at > org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1625) > at > org.apache.tez.mapreduce.hadoop.MRInputHelpers.updateLocalResourcesForInputSplits(MRInputHelpers.java:639) > at > org.apache.tez.mapreduce.hadoop.MRInputHelpers.configureMRInputWithLegacySplitGeneration(MRInputHelpers.java:115) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2199) updateLocalResourcesForInputSplits assumes wrongly that split data is on same FS as the default FS
Hitesh Shah created TEZ-2199:

Summary: updateLocalResourcesForInputSplits assumes wrongly that split data is on same FS as the default FS
Key: TEZ-2199
URL: https://issues.apache.org/jira/browse/TEZ-2199
Project: Apache Tez
Issue Type: Bug
Reporter: Hitesh Shah

Seen in a Windows Azure scenario:

Caused by: java.io.FileNotFoundException: hdfs://namenode:9000/hive/scratch/_tez_scratch_dir/split_Map_1/job.split: No such file or directory.
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1625)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.updateLocalResourcesForInputSplits(MRInputHelpers.java:639)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.configureMRInputWithLegacySplitGeneration(MRInputHelpers.java:115)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2064) SessionNotRunning Exception not thrown is all cases
[ https://issues.apache.org/jira/browse/TEZ-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361102#comment-14361102 ] Hitesh Shah commented on TEZ-2064: -- Triggered pre-commit build. > SessionNotRunning Exception not thrown is all cases > --- > > Key: TEZ-2064 > URL: https://issues.apache.org/jira/browse/TEZ-2064 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles >Priority: Critical > Attachments: TEZ-2064.1.patch, TEZ-2064.2.patch > > > Hive handles SessionNotRunning during submitDAG() and restarts the tez-session > if it receives one. In YHIVE-15, we did not receive that and the query > failed. In some scenarios the Application will fall out of the RM's knowledge > and a ApplicationNotFound exception is received instead. > Here are my asks. > 1. TezClient.submitDAG()/stop() should return SessionNotRunning exception if > application is expired. Basically any API which currently returns > SessionNotRunning should handle the app-not-found scenario. > 2. It would help if TezClient.getAppMasterStatus() can return > TezAppMasterStatus.SHUTDOWN if tez-session-application does not exist in RM. > That way, as a precaution, applications could check before submitting DAG's. > 3. I think it might be better if verifySessionStateForSubmission() checks the > app Status every time instead of checking sessionStarted. I am not sure about > side-effects, but will leave that to your decision. > If 3 takes time, we can pursue that later. It would really help to get 1 & 2 > in > the next tez release, especially for busy grids. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2021) Tez tool to analyze shuffle performance in large clusters by mining task logs
[ https://issues.apache.org/jira/browse/TEZ-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361020#comment-14361020 ] Jonathan Eagles commented on TEZ-2021: -- haven't seen any recent updates to this ticket. Is this tool in good shape? > Tez tool to analyze shuffle performance in large clusters by mining task logs > - > > Key: TEZ-2021 > URL: https://issues.apache.org/jira/browse/TEZ-2021 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: TEZ-2021.1.patch, TEZ-2021.2.patch, > avg_time_Taken_after_fix.png, avg_time_taken_to_download.png, > no_of_times_contacted.png, total_data_transferred.png > > > Tez tool to analyze shuffle performance in large clusters by mining task > logs. Provide an easier way to visualize (heat chart) and identify bad nodes > in large cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2064) SessionNotRunning Exception not thrown is all cases
[ https://issues.apache.org/jira/browse/TEZ-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360861#comment-14360861 ] Hadoop QA commented on TEZ-2064: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12698474/TEZ-2064.2.patch against master revision a809f96. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestAMRecovery Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/301//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/301//console This message is automatically generated. > SessionNotRunning Exception not thrown is all cases > --- > > Key: TEZ-2064 > URL: https://issues.apache.org/jira/browse/TEZ-2064 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles >Priority: Critical > Attachments: TEZ-2064.1.patch, TEZ-2064.2.patch > > > Hive handles SessionNotRunning during submitDAG() and restarts the tez-session > if it receives one. In YHIVE-15, we did not receive that and the query > failed. 
In some scenarios the Application will fall out of the RM's knowledge > and an ApplicationNotFound exception is received instead. > Here are my asks. > 1. TezClient.submitDAG()/stop() should return SessionNotRunning exception if > application is expired. Basically any API which currently returns > SessionNotRunning should handle the app-not-found scenario. > 2. It would help if TezClient.getAppMasterStatus() can return > TezAppMasterStatus.SHUTDOWN if tez-session-application does not exist in RM. > That way, as a precaution, applications could check before submitting DAGs. > 3. I think it might be better if verifySessionStateForSubmission() checks the > app status every time instead of checking sessionStarted. I am not sure about > side-effects, but will leave that to your decision. > If 3 takes time, we can pursue that later. It would really help to get 1 & 2 in > the next tez release, especially for busy grids. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2064 PreCommit Build #301
Jira: https://issues.apache.org/jira/browse/TEZ-2064 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/301/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2536 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12698474/TEZ-2064.2.patch against master revision a809f96. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestAMRecovery Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/301//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/301//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. cda39fff49b136e145b567c990028cad4831b8fc logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #297 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2588594 bytes Compression is 4.8% Took 1.4 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 1 tests failed. 
FAILED: org.apache.tez.test.TestAMRecovery.testVertexCompletelyFinished_Broadcast Error Message: File does not exist: /user/jenkins/target/org.apache.tez.test.TestAMRecovery-tmpDir/14711/.tez/application_1426269594468_0007/recovery/2/dag_1426269594468_0007_1.recovery at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) Stack Trace: java.io.FileNotFoundException: File does not exist: 
/user/jenkins/target/org.apache.tez.test.TestAMRecovery-tmpDir/14711/.tez/application_1426269594468_0007/recovery/2/dag_1426269594468_0007_1.recovery at org.apache.hadoop.h
[jira] [Commented] (TEZ-2191) Simulation improvements to MockDAGAppMaster
[ https://issues.apache.org/jira/browse/TEZ-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360798#comment-14360798 ] Bikas Saha commented on TEZ-2191: - Thanks! Yes. They are right now there because I pulled the code from the memory events testing patch. When that test goes in then these will be used. Yes, the accuracy is intentional because storing it in ms often leads to 0 because the numbers are small per invocation. > Simulation improvements to MockDAGAppMaster > --- > > Key: TEZ-2191 > URL: https://issues.apache.org/jira/browse/TEZ-2191 > Project: Apache Tez > Issue Type: Task >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: TEZ-2191.1.patch, TEZ-2191.2.patch, TEZ-2191.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2064 PreCommit Build #300
Jira: https://issues.apache.org/jira/browse/TEZ-2064 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/300/ ### ## LAST 60 LINES OF THE CONSOLE ### Started by user jeagles Building remotely on H7 (Mapreduce Falcon Hadoop Pig Zookeeper Tez Hdfs) in workspace /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://git-wip-us.apache.org/repos/asf/tez.git > # timeout=10 Cleaning workspace > git rev-parse --verify HEAD # timeout=10 Resetting working tree > git reset --hard # timeout=10 > git clean -fdx # timeout=10 Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/tez.git > git --version # timeout=10 > git fetch --tags --progress https://git-wip-us.apache.org/repos/asf/tez.git > +refs/heads/*:refs/remotes/origin/* > git rev-parse refs/remotes/origin/master^{commit} # timeout=10 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10 Checking out Revision a809f96c6e6c7bfe8f683980713bff5bfe373419 (refs/remotes/origin/master) > git config core.sparsecheckout # timeout=10 > git checkout -f a809f96c6e6c7bfe8f683980713bff5bfe373419 > git rev-list 55d7fce0608506543eb6bbf53177b16c7f017e5b # timeout=10 No emails were triggered. [PreCommit-TEZ-Build] $ /bin/bash /tmp/hudson7683334766425930884.sh Running in Jenkins mode == == Testing patch for TEZ-2064. == == HEAD is now at a809f96 TEZ-2189. Tez UI live AM tracking url only works for localhost addresses (jeagles) Previous HEAD position was a809f96... TEZ-2189. Tez UI live AM tracking url only works for localhost addresses (jeagles) Switched to branch 'master' Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded. (use "git pull" to update your local branch) First, rewinding head to replay your work on top of it... Fast-forwarded master to a809f96c6e6c7bfe8f683980713bff5bfe373419. TEZ-2064 is not "Patch Available". Exiting. 
== == Finished build. == == Archiving artifacts ERROR: No artifacts found that match the file pattern "patchprocess/*.*". Configuration error? ERROR: 'patchprocess/*.*' doesn't match anything, but '*.*' does. Perhaps that's what you mean? Build step 'Archive the artifacts' changed build result to FAILURE [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Commented] (TEZ-2189) Tez UI live AM tracking url only works for localhost addresses
[ https://issues.apache.org/jira/browse/TEZ-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360639#comment-14360639 ] Hitesh Shah commented on TEZ-2189: -- +1. Test failure is unrelated. > Tez UI live AM tracking url only works for localhost addresses > -- > > Key: TEZ-2189 > URL: https://issues.apache.org/jira/browse/TEZ-2189 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-2189.1.patch, TEZ-2189.2.patch, TEZ-2189.3.patch, > TEZ-2189.4.patch, TEZ-2189.5.patch, TEZ-2189.6.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2189 PreCommit Build #299
Jira: https://issues.apache.org/jira/browse/TEZ-2189 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/299/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 1848 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704452/TEZ-2189.6.patch against master revision 55d7fce. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.client.TestTezClient Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/299//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/299//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. d407794a7911272faa4c39072feea1b54ea0d853 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #297 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2534881 bytes Compression is 4.9% Took 1.2 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 1 tests failed. 
REGRESSION: org.apache.tez.client.TestTezClient.testTezclientSession Error Message: test timed out after 5000 milliseconds Stack Trace: java.lang.Exception: test timed out after 5000 milliseconds at java.net.PlainDatagramSocketImpl.receive0(Native Method) at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:145) at java.net.DatagramSocket.receive(DatagramSocket.java:786) at com.sun.jndi.dns.DnsClient.doUdpQuery(DnsClient.java:416) at com.sun.jndi.dns.DnsClient.query(DnsClient.java:210) at com.sun.jndi.dns.Resolver.query(Resolver.java:81) at com.sun.jndi.dns.DnsContext.c_getAttributes(DnsContext.java:430) at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:231) at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:139) at com.sun.jndi.toolkit.url.GenericURLDirContext.getAttributes(GenericURLDirContext.java:103) at sun.security.krb5.KrbServiceLocator.getKerberosService(KrbServiceLocator.java:87) at sun.security.krb5.Config.checkRealm(Config.java:1295) at sun.security.krb5.Config.getRealmFromDNS(Config.java:1268) at sun.security.krb5.Config.getDefaultRealm(Config.java:1162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:84) at org.apache.hadoop.security.authentication.util.KerberosName.(KerberosName.java:86) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:261) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248) at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763) at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(
[jira] [Commented] (TEZ-2189) Tez UI live AM tracking url only works for localhost addresses
[ https://issues.apache.org/jira/browse/TEZ-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360598#comment-14360598 ] Hadoop QA commented on TEZ-2189: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704452/TEZ-2189.6.patch against master revision 55d7fce. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.client.TestTezClient Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/299//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/299//console This message is automatically generated. > Tez UI live AM tracking url only works for localhost addresses > -- > > Key: TEZ-2189 > URL: https://issues.apache.org/jira/browse/TEZ-2189 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-2189.1.patch, TEZ-2189.2.patch, TEZ-2189.3.patch, > TEZ-2189.4.patch, TEZ-2189.5.patch, TEZ-2189.6.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2061) Tez UI: vertex id column and filter on tasks page should be changed to vertex name
[ https://issues.apache.org/jira/browse/TEZ-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360585#comment-14360585 ] Sreenath Somarajapuram commented on TEZ-2061: - +1 LGTM > Tez UI: vertex id column and filter on tasks page should be changed to vertex > name > -- > > Key: TEZ-2061 > URL: https://issues.apache.org/jira/browse/TEZ-2061 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Prakash Ramachandran >Assignee: Prakash Ramachandran > Attachments: TEZ-2061.1.patch, TEZ-2061.2.patch > > > VertexId search box is not really useful unless one types in the whole vertex > id. At some point later, vertex name might be a better option. May need > backend changes or could be done on the UI with an additional call to convert > name to id from the dag info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2189) Tez UI live AM tracking url only works for localhost addresses
[ https://issues.apache.org/jira/browse/TEZ-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-2189: - Attachment: TEZ-2189.6.patch [~hitesh], addressed the https issue and added a test case for missing scheme. > Tez UI live AM tracking url only works for localhost addresses > -- > > Key: TEZ-2189 > URL: https://issues.apache.org/jira/browse/TEZ-2189 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-2189.1.patch, TEZ-2189.2.patch, TEZ-2189.3.patch, > TEZ-2189.4.patch, TEZ-2189.5.patch, TEZ-2189.6.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2061 PreCommit Build #298
Jira: https://issues.apache.org/jira/browse/TEZ-2061 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/298/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2752 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704439/TEZ-2061.2.patch against master revision 55d7fce. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/298//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/298//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. d57312d264b5cf6343f9c29fc172c2782c539f91 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #297 Archived 44 artifacts Archive block size is 32768 Received 6 blocks and 2530589 bytes Compression is 7.2% Took 0.97 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2061) Tez UI: vertex id column and filter on tasks page should be changed to vertex name
[ https://issues.apache.org/jira/browse/TEZ-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360517#comment-14360517 ] Hadoop QA commented on TEZ-2061: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704439/TEZ-2061.2.patch against master revision 55d7fce. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/298//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/298//console This message is automatically generated. > Tez UI: vertex id column and filter on tasks page should be changed to vertex > name > -- > > Key: TEZ-2061 > URL: https://issues.apache.org/jira/browse/TEZ-2061 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Prakash Ramachandran >Assignee: Prakash Ramachandran > Attachments: TEZ-2061.1.patch, TEZ-2061.2.patch > > > VertexId search box is not really useful unless one types in the whole vertex > id. At some point later, vertex name might be a better option. May need > backend changes or could be done on the UI with an additional call to convert > name to id from the dag info. 
[jira] [Updated] (TEZ-2061) Tez UI: vertex id column and filter on tasks page should be changed to vertex name
[ https://issues.apache.org/jira/browse/TEZ-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-2061: -- Attachment: TEZ-2061.2.patch addressed comments > Tez UI: vertex id column and filter on tasks page should be changed to vertex > name > -- > > Key: TEZ-2061 > URL: https://issues.apache.org/jira/browse/TEZ-2061 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Prakash Ramachandran >Assignee: Prakash Ramachandran > Attachments: TEZ-2061.1.patch, TEZ-2061.2.patch > > > VertexId search box is not really useful unless one types in the whole vertex > id. At some point later, vertex name might be a better option. May need > backend changes or could be done on the UI with an additional call to convert > name to id from the dag info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-160) Remove 5 second sleep at the end of AM completion.
[ https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360423#comment-14360423 ] André Kelpe commented on TEZ-160: - Could the sleep period be made configurable until this is fixed correctly? We have a test suite with a few thousand dags and waiting 5 extra seconds for every one of them adds a lot of wall-clock time. > Remove 5 second sleep at the end of AM completion. > -- > > Key: TEZ-160 > URL: https://issues.apache.org/jira/browse/TEZ-160 > Project: Apache Tez > Issue Type: Bug >Reporter: Siddharth Seth > Labels: TEZ-0.2.0 > > ClientServiceDelegate/DAGClient doesn't seem to be getting job completion > status from the AM after job completion. It, instead, always relies on the RM > for this information. The information returned by the AM should be used while > it's available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2061) Tez UI: vertex id column and filter on tasks page should be changed to vertex name
[ https://issues.apache.org/jira/browse/TEZ-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360418#comment-14360418 ] Sreenath Somarajapuram commented on TEZ-2061: - Please add vertex name to tasks_controller.js also. > Tez UI: vertex id column and filter on tasks page should be changed to vertex > name > -- > > Key: TEZ-2061 > URL: https://issues.apache.org/jira/browse/TEZ-2061 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Prakash Ramachandran >Assignee: Prakash Ramachandran > Attachments: TEZ-2061.1.patch > > > VertexId search box is not really useful unless one types in the whole vertex > id. At some point later, vertex name might be a better option. May need > backend changes or could be done on the UI with an additional call to convert > name to id from the dag info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2189) Tez UI live AM tracking url only works for localhost addresses
[ https://issues.apache.org/jira/browse/TEZ-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360295#comment-14360295 ] Hitesh Shah commented on TEZ-2189: -- Minor nit: {code} if (!historyUrl.isEmpty() && !historyUrl.startsWith("http://")) { {code} - above doesn't handle https - we should either just check startsWith "http" instead of "http://" or convert to URI, check for presence/absence of a scheme before prefixing http as a default? Future jira: - AM webapp tracking url does not account for running with https enabled. We hardcode the tracking url to use http. > Tez UI live AM tracking url only works for localhost addresses > -- > > Key: TEZ-2189 > URL: https://issues.apache.org/jira/browse/TEZ-2189 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: TEZ-2189.1.patch, TEZ-2189.2.patch, TEZ-2189.3.patch, > TEZ-2189.4.patch, TEZ-2189.5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
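The second option in the nit above (convert to URI and check for a scheme before prefixing http) could look roughly like the following. This is only an illustrative sketch, not the code from any TEZ-2189 patch: the class name HistoryUrlSketch and the method withDefaultScheme are hypothetical, and only java.net.URI from the JDK is used. One assumption worth noting is that a bare host:port string such as "localhost:8080" parses as an opaque URI whose "scheme" is the host name, so the sketch only trusts a scheme when the URI also has a host.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Tez: adds "http://" only when no usable scheme is present.
final class HistoryUrlSketch {

    static String withDefaultScheme(String historyUrl) {
        if (historyUrl == null || historyUrl.isEmpty()) {
            return historyUrl; // nothing to normalize
        }
        try {
            URI uri = new URI(historyUrl);
            // "host:port" parses as scheme=host with an opaque remainder and no host,
            // so require both a scheme and a host before leaving the value untouched.
            if (uri.getScheme() != null && uri.getHost() != null) {
                return historyUrl; // already http://..., https://..., etc.
            }
        } catch (URISyntaxException e) {
            // Unparseable as a URI; fall through and apply the default scheme.
        }
        return "http://" + historyUrl;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultScheme("localhost:8080"));
        System.out.println(withDefaultScheme("https://example.org:8443/ui"));
    }
}
```

Compared with a plain startsWith("http") check, this keeps https URLs intact without treating a host whose name happens to begin with "http" as already having a scheme.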
[jira] [Updated] (TEZ-2061) Tez UI: vertex id column and filter on tasks page should be changed to vertex name
[ https://issues.apache.org/jira/browse/TEZ-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-2061: -- Attachment: TEZ-2061.1.patch * changed the column for task and task attempts on dag page to show vertex name instead of vertex id. * changed search to search by name instead of id. [~Sreenath] can you review? > Tez UI: vertex id column and filter on tasks page should be changed to vertex > name > -- > > Key: TEZ-2061 > URL: https://issues.apache.org/jira/browse/TEZ-2061 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Prakash Ramachandran >Assignee: Prakash Ramachandran > Attachments: TEZ-2061.1.patch > > > VertexId search box is not really useful unless one types in the whole vertex > id. At some point later, vertex name might be a better option. May need > backend changes or could be done on the UI with an additional call to convert > name to id from the dag info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2198) Fix sorter spill counts
[ https://issues.apache.org/jira/browse/TEZ-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360173#comment-14360173 ] Gopal V commented on TEZ-2198: -- That exact update means that a clear recommendation can be made on whether to use this optimization or not by simply checking the ADDITIONAL_SPILL_COUNT & once it is active ADDITIONAL_SPILL_COUNT will always be zero. That makes it easy to check whether pipelined-shuffle is active & to predict whether it adds any benefit for a given case. > Fix sorter spill counts > --- > > Key: TEZ-2198 > URL: https://issues.apache.org/jira/browse/TEZ-2198 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan > > Prior to pipelined shuffle, tez merged all spilled data into a single file. > This ended up creating one index file and one output file. In this context, > TaskCounter.ADDITIONAL_SPILL_COUNT was referred as the number of additional > spills and there was no counter needed to track the number of merges. > With pipelined shuffle, there is no final merge and ADDITIONAL_SPILL_COUNT > would be misleading, as these spills are direct output files which are > consumed by the consumers. > It would be good to have the following > - ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task > to generate the final merged output > - TOTAL_SPILLS: represents the total number of shuffle directories (index + > output files) that got created at the end of processing. > For e.g, Assume sorter generated 5 spills in an attempt > Without pipelining: > == > ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting > TOTAL_SPILLS = 1 <-- Final merged output > With pipelining: > > ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting > TOTAL_SPILLS = 0 <--- No final output -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2198) Fix sorter spill counts
[ https://issues.apache.org/jira/browse/TEZ-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360170#comment-14360170 ] Gopal V commented on TEZ-2198: -- [~rajesh.balamohan]: The example does not seem to match the description. It should be With pipelining : === ADDITIONAL_SPILL_COUNT = 0 <-- Additional spills involved in sorting TOTAL_SPILL_COUNT = 5 <--- All spills in the task are final The easier thing to remember is that ADDITIONAL_SPILL_COUNT includes only spills which are read by the same task that produced the spill, because they are additional read IO in the output phase. The TOTAL_SPILL_COUNT is the number of files being offered via shuffle-handler (indirectly related to the number of DME events & shuffle fetcher requests). > Fix sorter spill counts > --- > > Key: TEZ-2198 > URL: https://issues.apache.org/jira/browse/TEZ-2198 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan > > Prior to pipelined shuffle, tez merged all spilled data into a single file. > This ended up creating one index file and one output file. In this context, > TaskCounter.ADDITIONAL_SPILL_COUNT was referred as the number of additional > spills and there was no counter needed to track the number of merges. > With pipelined shuffle, there is no final merge and ADDITIONAL_SPILL_COUNT > would be misleading, as these spills are direct output files which are > consumed by the consumers. > It would be good to have the following > - ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task > to generate the final merged output > - TOTAL_SPILLS: represents the total number of shuffle directories (index + > output files) that got created at the end of processing. 
> For e.g, Assume sorter generated 5 spills in an attempt > Without pipelining: > == > ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting > TOTAL_SPILLS = 1 <-- Final merged output > With pipelining: > > ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting > TOTAL_SPILLS = 0 <--- No final output -- This message was sent by Atlassian JIRA (v6.3.4#6332)
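The counter semantics proposed in Gopal V's comment on this issue (as opposed to the example in the quoted description) can be illustrated with a small model. This is a sketch under stated assumptions, not Tez code: SpillCounterSketch, Counts and countsFor are hypothetical names, and the model only covers the case discussed in the thread, a sorter that produced several spills in one attempt.

```java
// Hypothetical model of the two counters discussed in TEZ-2198; not the Tez implementation.
final class SpillCounterSketch {

    /** Simple holder for the two counter values. */
    static final class Counts {
        final int additionalSpillCount; // spills re-read by the task that produced them
        final int totalSpillCount;      // files offered to consumers via the shuffle handler

        Counts(int additionalSpillCount, int totalSpillCount) {
            this.additionalSpillCount = additionalSpillCount;
            this.totalSpillCount = totalSpillCount;
        }
    }

    static Counts countsFor(int sorterSpills, boolean pipelinedShuffle) {
        if (sorterSpills <= 0) {
            throw new IllegalArgumentException("sorterSpills must be positive");
        }
        if (pipelinedShuffle) {
            // No final merge: every spill is served directly, none is re-read by the producer.
            return new Counts(0, sorterSpills);
        }
        // Final merge: all spills are re-read by the producer and one merged output is served.
        return new Counts(sorterSpills, 1);
    }

    public static void main(String[] args) {
        Counts merged = countsFor(5, false);
        Counts pipelined = countsFor(5, true);
        System.out.println("without pipelining: additional=" + merged.additionalSpillCount
                + " total=" + merged.totalSpillCount);
        System.out.println("with pipelining:    additional=" + pipelined.additionalSpillCount
                + " total=" + pipelined.totalSpillCount);
    }
}
```

With these definitions, a zero additional count together with a total count greater than one is what indicates that pipelined shuffle is active.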
[jira] [Commented] (TEZ-1421) MRCombiner throws NPE in MapredWordCount on master branch
[ https://issues.apache.org/jira/browse/TEZ-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360111#comment-14360111 ] Tsuyoshi Ozawa commented on TEZ-1421: - I've investigated this deeply: this bug happens when TEZ_RUNTIME_COMBINER_CLASS is set, but MRJobConfig.COMBINE_CLASS_ATTR or "mapred.combiner.class" is null. I'll check the code of MRHelpers. > MRCombiner throws NPE in MapredWordCount on master branch > - > > Key: TEZ-1421 > URL: https://issues.apache.org/jira/browse/TEZ-1421 > Project: Apache Tez > Issue Type: Bug >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > > I tested MapredWordCount against 70GB generated by RandomTextWriter. When a > Combiner runs, it throws NPE. It looks like setCombinerClass doesn't work > correctly. > {quote} > Caused by: java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) > at > org.apache.tez.mapreduce.combine.MRCombiner.runOldCombiner(MRCombiner.java:122) > at org.apache.tez.mapreduce.combine.MRCombiner.combine(MRCombiner.java:112) > at > org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.runCombineProcessor(MergeManager.java:472) > at > org.apache.tez.runtime.library.common.shuffle.impl.MergeManager$InMemoryMerger.merge(MergeManager.java:605) > at > org.apache.tez.runtime.library.common.shuffle.impl.MergeThread.run(MergeThread.java:89) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
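The failure condition described in the comment above (a Tez-level combiner is configured, but the old-API combiner class setting is null) can be shown with a guard that fails with a descriptive message instead of a NullPointerException inside reflection. This is a minimal sketch and not the eventual fix for TEZ-1421: CombinerConfigSketch and requireOldApiCombiner are hypothetical, a plain Map stands in for the Hadoop Configuration, and the key string used for TEZ_RUNTIME_COMBINER_CLASS is an assumption; only "mapred.combiner.class" is taken from the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard; a Map stands in for the real job configuration.
final class CombinerConfigSketch {

    // Assumed key for TEZ_RUNTIME_COMBINER_CLASS; the real constant's value may differ.
    static final String TEZ_COMBINER_KEY = "tez.runtime.combiner.class";
    // Old-API key quoted in the JIRA comment.
    static final String OLD_API_COMBINER_KEY = "mapred.combiner.class";

    /** Returns the old-API combiner class name, or null when no combiner is configured at all. */
    static String requireOldApiCombiner(Map<String, String> conf) {
        if (conf.get(TEZ_COMBINER_KEY) == null) {
            return null; // no combiner requested, nothing to instantiate
        }
        String combinerClass = conf.get(OLD_API_COMBINER_KEY);
        if (combinerClass == null) {
            // This is the state reported in the comment: fail early with a clear message.
            throw new IllegalStateException(TEZ_COMBINER_KEY + " is set but "
                    + OLD_API_COMBINER_KEY + " is missing");
        }
        return combinerClass;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TEZ_COMBINER_KEY, "org.apache.tez.mapreduce.combine.MRCombiner");
        try {
            requireOldApiCombiner(conf);
        } catch (IllegalStateException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```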
[jira] [Commented] (TEZ-2191) Simulation improvements to MockDAGAppMaster
[ https://issues.apache.org/jira/browse/TEZ-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360106#comment-14360106 ] Rajesh Balamohan commented on TEZ-2191: --- +1. lgtm. - heartbeatTime, heartbeatCPU times are not used in the testcases. Is the intention to make use of it on need basis later? Also, it is in microseconds accuracy. Is that intentional? > Simulation improvements to MockDAGAppMaster > --- > > Key: TEZ-2191 > URL: https://issues.apache.org/jira/browse/TEZ-2191 > Project: Apache Tez > Issue Type: Task >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: TEZ-2191.1.patch, TEZ-2191.2.patch, TEZ-2191.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360078#comment-14360078 ]

Siddharth Seth commented on TEZ-776:
------------------------------------

Please see my first comment on the document posted, questioning the CPU efficiency of the ODR approach. *This is converting what is primarily an MxN memory problem into an MxN CPU problem.* That's an approach which I wouldn't even consider, except for the fact that we already have an (unnecessary) MxN CPU issue for ScatterGather edges, which I didn't realize earlier, and that single case becomes better in terms of memory. Other edge types in fact move from a < MxN memory/CPU issue to a guaranteed MxN CPU issue. This forces CPU inefficiency on ALL edge types.

Introducing an N^2 algorithm (where N is non-trivial) when a more optimal approach exists is not the right way to go. The fact that routing is a fraction of AM CPU, to me, says that we have other avenues to improve CPU utilization along with memory, rather than using this as justification to put in an inefficient algorithm. Numbers posted previously show CPU efficiency improving marginally or remaining roughly the same for ScatterGather, but degrading quite a bit for OneToOne.

If there were no API changes involved, this could be iterated upon more easily, since it does improve things for the most commonly used case and users wouldn't know the difference. However, API changes are involved here. They are avoidable, and they are also required in the approach of moving events into the edge. Hence my previous comment and suggestion.

bq. some yet to be built concept. Other approaches could be implemented in full, tested, profiled and verified

I'm at a loss here. Are you suggesting that we discuss options based on patches? Surely we can reason about and discuss alternative approaches without code changes being in place?
I'm sure it makes sense for you to go ahead and iterate on the approach, test it, etc. However, if there are alternatives that have been on the table since day one and haven't been fully discussed, there is a chance that the final approach and patch will need to change.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
> Key: TEZ-776
> URL: https://issues.apache.org/jira/browse/TEZ-776
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Attachments: TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch,
> TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch,
> TEZ-776.ondemand.6.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png,
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png,
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png,
> without_patch_jmc_output_of_AM.png
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.
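To give a rough sense of scale for the memory concern in the issue description, the sketch below multiplies out the quoted 64 bytes per DataMovementEvent. It assumes, purely for illustration, that a ScatterGather edge between M producer tasks and N consumer tasks holds one event per (producer, consumer) pair and that a OneToOne edge holds one event per task; the class name, method names and task counts are hypothetical, and the real AM bookkeeping may differ.

```java
/**
 * Back-of-the-envelope estimate only. The per-event size comes from the issue
 * description; the "one event per producer/consumer pair" model and all names
 * here are assumptions made for illustration.
 */
final class EventMemoryEstimate {
    // Size of one DataMovementEvent as quoted in the issue description.
    static final long BYTES_PER_EVENT = 64L;

    /** Assumed ScatterGather model: one stored event per (producer, consumer) pair. */
    static long scatterGatherBytes(long producers, long consumers) {
        return Math.multiplyExact(Math.multiplyExact(producers, consumers), BYTES_PER_EVENT);
    }

    /** Assumed OneToOne model: one stored event per task. */
    static long oneToOneBytes(long tasks) {
        return Math.multiplyExact(tasks, BYTES_PER_EVENT);
    }

    public static void main(String[] args) {
        long m = 10_000; // hypothetical number of producer tasks
        long n = 1_000;  // hypothetical number of consumer tasks
        System.out.println("ScatterGather: " + scatterGatherBytes(m, n) + " bytes");
        System.out.println("OneToOne:      " + oneToOneBytes(m) + " bytes");
    }
}
```

Under these assumptions, 10,000 producers and 1,000 consumers come to 640,000,000 bytes for ScatterGather versus 640,000 bytes for a 10,000-task OneToOne edge, which is why the connection pattern dominates the estimate.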
[jira] [Created] (TEZ-2198) Fix sorter spill counts
Rajesh Balamohan created TEZ-2198:
----------------------------------

Summary: Fix sorter spill counts
Key: TEZ-2198
URL: https://issues.apache.org/jira/browse/TEZ-2198
Project: Apache Tez
Issue Type: Improvement
Reporter: Rajesh Balamohan

Prior to pipelined shuffle, Tez merged all spilled data into a single file. This ended up creating one index file and one output file. In this context, TaskCounter.ADDITIONAL_SPILL_COUNT referred to the number of additional spills, and no counter was needed to track the number of merges.

With pipelined shuffle, there is no final merge, and ADDITIONAL_SPILL_COUNT would be misleading, as these spills are direct output files which are consumed by the consumers.

It would be good to have the following:
- ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task to generate the final merged output
- TOTAL_SPILLS: represents the total number of shuffle directories (index + output files) that got created at the end of processing.

For example, assume the sorter generated 5 spills in an attempt.

Without pipelining:
===================
ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting
TOTAL_SPILLS = 1 <-- Final merged output

With pipelining:
================
ADDITIONAL_SPILL_COUNT = 5 <-- Additional spills involved in sorting
TOTAL_SPILLS = 0 <-- No final output
Success: TEZ-1909 PreCommit Build #297
Jira: https://issues.apache.org/jira/browse/TEZ-1909
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/297/

###
## LAST 60 LINES OF THE CONSOLE
###
[...truncated 2755 lines...]

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12704361/TEZ-1909-2.patch
against master revision 55d7fce.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/297//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/297//console

This message is automatically generated.

==
Adding comment to Jira.
==
Comment added.
53b683416b271c75c5cf70cc6d1cb7b38a777a16 logged out
==
Finished build.
==
Archiving artifacts
Sending artifact delta relative to PreCommit-TEZ-Build #296
Archived 44 artifacts
Archive block size is 32768
Received 6 blocks and 2544544 bytes
Compression is 7.2%
Took 10 min
Description set: TEZ-1909
Recording test results
Email was triggered for: Success
Sending email for trigger: Success
ERROR: H0 is offline; cannot locate JDK 1.7 (latest)
ERROR: H0 is offline; cannot locate JDK 1.7 (latest)
ERROR: H0 is offline; cannot locate JDK 1.7 (latest)
ERROR: H0 is offline; cannot locate JDK 1.7 (latest)

###
## FAILED TESTS (if any)
##
All tests passed