[jira] [Created] (TEZ-2948) Stop using dagName in the dagComplete notification to TaskCommunicators
Siddharth Seth created TEZ-2948:
---
Summary: Stop using dagName in the dagComplete notification to TaskCommunicators
Key: TEZ-2948
URL: https://issues.apache.org/jira/browse/TEZ-2948
Project: Apache Tez
Issue Type: Task
Affects Versions: 0.8.0-alpha
Reporter: Siddharth Seth
Assignee: Siddharth Seth

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2480) TEZ-2003: exception when closing output (ignored)
[ https://issues.apache.org/jira/browse/TEZ-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth reassigned TEZ-2480:
---
Assignee: Siddharth Seth

> TEZ-2003: exception when closing output (ignored)
> -
>
> Key: TEZ-2480
> URL: https://issues.apache.org/jira/browse/TEZ-2480
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: TEZ-2003
> Reporter: Sergey Shelukhin
> Assignee: Siddharth Seth
> Attachments: TEZ-2480.1.txt
>
> Happens a lot in some queries:
> {noformat}
> sershe_20150522112029_d0863b33-8d2f-4b4c-b013-9ef70a2bc586:1_Map 1_8_0)] WARN org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when closing output Reducer 2(cleanup). Exception class=java.lang.NullPointerException, message=null
> java.lang.NullPointerException
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:618)
>     at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:81)
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:613)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:831)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:608)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1425)
>     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:198)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
>     at org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:64)
>     at org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:56)
>     at org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:51)
>     at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.generateEvents(OrderedPartitionedKVOutput.java:209)
>     at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:186)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.cleanup(LogicalIOProcessorRuntimeTask.java:849)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:104)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Can this be fixed or not logged?
[jira] [Commented] (TEZ-2480) TEZ-2003: exception when closing output (ignored)
[ https://issues.apache.org/jira/browse/TEZ-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011768#comment-15011768 ]

Sergey Shelukhin commented on TEZ-2480:
---
+1
[jira] [Commented] (TEZ-2948) Stop using dagName in the dagComplete notification to TaskCommunicators
[ https://issues.apache.org/jira/browse/TEZ-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011797#comment-15011797 ]

Hitesh Shah commented on TEZ-2948:
--
+1 pending pre-commit

> Stop using dagName in the dagComplete notification to TaskCommunicators
> ---
>
> Key: TEZ-2948
> URL: https://issues.apache.org/jira/browse/TEZ-2948
> Project: Apache Tez
> Issue Type: Task
> Affects Versions: 0.8.0-alpha
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-2948.1.txt
[jira] [Created] (TEZ-2949) Allow duplicate dag names within session for Tez
Hitesh Shah created TEZ-2949:

Summary: Allow duplicate dag names within session for Tez
Key: TEZ-2949
URL: https://issues.apache.org/jira/browse/TEZ-2949
Project: Apache Tez
Issue Type: Bug
Reporter: Hitesh Shah

Hive would like to support setting hive.query.name ( HIVE-12357 ) by users. Hence this will create dag name clashes. This jira is to relax the dag name uniqueness requirement.
Failed: TEZ-2480 PreCommit Build #1323
Jira: https://issues.apache.org/jira/browse/TEZ-2480
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1323/

## LAST 60 LINES OF THE CONSOLE ##
[...truncated 2693 lines...]
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :tez-runtime-internals

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12773051/TEZ-2480.1.txt against master revision e5e4fc7.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.api.impl.TestProcessorContext

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1323//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1323//console

This message is automatically generated.

== Adding comment to Jira. ==
Comment added.
3d02b8709a5867f7165e36f23da23441d4e3df47 logged out
== Finished build. ==
Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

## FAILED TESTS (if any) ##
1 tests failed.
FAILED: org.apache.tez.runtime.api.impl.TestProcessorContext.testDagNumber

Error Message:
null

Stack Trace:
java.lang.NullPointerException: null
    at org.apache.tez.runtime.RuntimeTask.notifyProgressInvocation(RuntimeTask.java:109)
    at org.apache.tez.runtime.api.impl.TezTaskContextImpl.notifyProgress(TezTaskContextImpl.java:178)
    at org.apache.tez.runtime.api.impl.TezProcessorContextImpl.setProgress(TezProcessorContextImpl.java:97)
    at org.apache.tez.runtime.api.impl.TestProcessorContext.testDagNumber(TestProcessorContext.java:101)
[jira] [Created] (TEZ-2946) Tez UI: At times RM return a huge error message making the yellow error bar to fill the whole screen.
Sreenath Somarajapuram created TEZ-2946:
---
Summary: Tez UI: At times RM return a huge error message making the yellow error bar to fill the whole screen.
Key: TEZ-2946
URL: https://issues.apache.org/jira/browse/TEZ-2946
Project: Apache Tez
Issue Type: Bug
Reporter: Sreenath Somarajapuram
[jira] [Created] (TEZ-2947) Tez UI: Timeline, RM & AM requests gets into a consecutive loop in counters page without any delay
Sreenath Somarajapuram created TEZ-2947:
---
Summary: Tez UI: Timeline, RM & AM requests gets into a consecutive loop in counters page without any delay
Key: TEZ-2947
URL: https://issues.apache.org/jira/browse/TEZ-2947
Project: Apache Tez
Issue Type: Bug
Reporter: Sreenath Somarajapuram
[jira] [Updated] (TEZ-2948) Stop using dagName in the dagComplete notification to TaskCommunicators
[ https://issues.apache.org/jira/browse/TEZ-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2948:

Attachment: TEZ-2948.1.txt

Moves to using the dag index - which will be unique within an application.
[~hitesh] - please review.
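The motivation for keying the completion notification on an index rather than a name can be sketched as follows. This is only an illustration under stated assumptions: `DagRegistry`, `register`, `dagComplete`, and `runningCount` are hypothetical names invented for this example and are not part of the Tez API or of the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Tez.
final class DagRegistry {
    // Running DAGs keyed by a per-application index instead of by name.
    private final Map<Integer, String> runningByIndex = new HashMap<>();
    private int nextIndex = 1;

    // Assigns the next index. Names may repeat; indices never do.
    int register(String dagName) {
        int index = nextIndex++;
        runningByIndex.put(index, dagName);
        return index;
    }

    // A completion keyed by index removes exactly one DAG,
    // even when another running DAG carries the same name.
    String dagComplete(int dagIndex) {
        return runningByIndex.remove(dagIndex);
    }

    int runningCount() {
        return runningByIndex.size();
    }

    public static void main(String[] args) {
        DagRegistry registry = new DagRegistry();
        int first = registry.register("daily-report");
        int second = registry.register("daily-report"); // same name, different index
        registry.dagComplete(first);
        System.out.println("still running: " + registry.runningCount()
            + " (index " + second + ")");
    }
}
```

With a name-keyed map, the second registration would have overwritten the first, and a single completion would have cleared both.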
[jira] [Commented] (TEZ-2480) TEZ-2003: exception when closing output (ignored)
[ https://issues.apache.org/jira/browse/TEZ-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011861#comment-15011861 ]

TezQA commented on TEZ-2480:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12773051/TEZ-2480.1.txt against master revision e5e4fc7.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.api.impl.TestProcessorContext

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1323//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1323//console

This message is automatically generated.
[jira] [Updated] (TEZ-2949) Allow duplicate dag names within session for Tez
[ https://issues.apache.org/jira/browse/TEZ-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-2949:
-
Hadoop Flags: Incompatible change
[jira] [Updated] (TEZ-2949) Allow duplicate dag names within session for Tez
[ https://issues.apache.org/jira/browse/TEZ-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-2949:
-
Attachment: TEZ-2949.1.patch
Failed: TEZ-2948 PreCommit Build #1324
Jira: https://issues.apache.org/jira/browse/TEZ-2948
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1324/

## LAST 60 LINES OF THE CONSOLE ##
[...truncated 2511 lines...]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :tez-api

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12773050/TEZ-2948.1.txt against master revision e5e4fc7.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.client.TestTezClient

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1324//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1324//console

This message is automatically generated.

== Adding comment to Jira. ==
Comment added.
fc9809f0f7925865f7aafa952cb9741e97c1043e logged out
== Finished build. ==
Build step 'Execute shell' marked build as failure
Archiving artifacts
Compressed 3.09 MB of artifacts by 21.3% relative to #1306
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

## FAILED TESTS (if any) ##
1 tests failed.
FAILED: org.apache.tez.client.TestTezClient.testStopRetriesUntilTimeout

Error Message:
test timed out after 5000 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 5000 milliseconds
    at java.lang.Thread.sleep(Native Method)
    at org.apache.tez.client.TezClient.stop(TezClient.java:589)
    at org.apache.tez.client.TestTezClient.testStopRetriesUntilTimeout(TestTezClient.java:557)
[jira] [Commented] (TEZ-2948) Stop using dagName in the dagComplete notification to TaskCommunicators
[ https://issues.apache.org/jira/browse/TEZ-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011886#comment-15011886 ]

TezQA commented on TEZ-2948:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12773050/TEZ-2948.1.txt against master revision e5e4fc7.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.client.TestTezClient

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1324//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1324//console

This message is automatically generated.
[jira] [Assigned] (TEZ-2877) Tez UI: Remove duplicate error handling code
[ https://issues.apache.org/jira/browse/TEZ-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreenath Somarajapuram reassigned TEZ-2877:
---
Assignee: Sreenath Somarajapuram

> Tez UI: Remove duplicate error handling code
>
> Key: TEZ-2877
> URL: https://issues.apache.org/jira/browse/TEZ-2877
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sreenath Somarajapuram
> Assignee: Sreenath Somarajapuram
> Priority: Minor
>
> 1. Code to display error log and error bar is duplicated everywhere. Create a helper function that accepts an error object and does the same. Also replace all the duplicate code with this function.
> 2. When ATS is down, I see the following message:
> "error code: Unknown, message: Error while loading tez-app.index. Could not retrieve expected data from Timeline Server @ http://localhost:8188/ws/v1/timeline/TEZ_APPLICATION/tez_application_1447798385040_0001;
> It seems wrong to print unknown error code.
> 3. "Info! Could not fetch application info from RM (yarn system metrics publishing might be disabled), some details might be missing"
> This message should be changed to "Info! Could not fetch application info from YARN RM/Timeline (yarn system metrics publishing might be disabled), some details might be missing"
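Items 1 and 2 of the description can be sketched as a single shared formatting helper. The Tez UI itself is JavaScript; Java is used here only to keep the examples in this archive in one language. Everything below is an assumption for illustration: `ErrorMessages` and `format` are hypothetical names, and omitting the code segment when no code is known is just one possible reading of "it seems wrong to print unknown error code".

```java
// Hypothetical helper, not taken from the Tez UI code base.
final class ErrorMessages {
    // Builds the text for the error bar from an error code and message.
    // When the code is missing or "Unknown", the code segment is left out
    // instead of printing "error code: Unknown".
    static String format(String code, String message) {
        boolean hasCode = code != null
            && !code.isEmpty()
            && !code.equalsIgnoreCase("unknown");
        return hasCode
            ? "error code: " + code + ", message: " + message
            : "message: " + message;
    }

    public static void main(String[] args) {
        System.out.println(format("500", "Internal error"));
        System.out.println(format("Unknown", "Error while loading tez-app.index."));
    }
}
```

Routing every call site through one function like this is what removes the duplication described in item 1.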
[jira] [Updated] (TEZ-2877) Tez UI: Remove duplicate error handling code
[ https://issues.apache.org/jira/browse/TEZ-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreenath Somarajapuram updated TEZ-2877:

Description:
1. Code to display error log and error bar is duplicated everywhere. Create a helper function that accepts an error object and does the same. Also replace all the duplicate code with this function.
2. When ATS is down, I see the following message:
"error code: Unknown, message: Error while loading tez-app.index. Could not retrieve expected data from Timeline Server @ http://localhost:8188/ws/v1/timeline/TEZ_APPLICATION/tez_application_1447798385040_0001;
It seems wrong to print unknown error code.
3. "Info! Could not fetch application info from RM (yarn system metrics publishing might be disabled), some details might be missing"
This message should be changed to "Info! Could not fetch application info from YARN RM/Timeline (yarn system metrics publishing might be disabled), some details might be missing"

was: Code to display error log and error bar is duplicated everywhere. Create a helper function that accepts an error object and does the same. Also replace all the duplicate code with this function.
[jira] [Created] (TEZ-2952) NPE in TestOnFileUnorderedKVOutput
Jeff Zhang created TEZ-2952:
---
Summary: NPE in TestOnFileUnorderedKVOutput
Key: TEZ-2952
URL: https://issues.apache.org/jira/browse/TEZ-2952
Project: Apache Tez
Issue Type: Bug
Reporter: Jeff Zhang

https://builds.apache.org/job/Tez-Build/1316/console
{code}
testWithPipelinedShuffle(org.apache.tez.runtime.library.output.TestOnFileUnorderedKVOutput) Time elapsed: 0.815 sec <<< ERROR!
java.lang.NullPointerException: null
    at org.apache.tez.runtime.RuntimeTask.notifyProgressInvocation(RuntimeTask.java:109)
    at org.apache.tez.runtime.api.impl.TezTaskContextImpl.notifyProgress(TezTaskContextImpl.java:178)
    at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:323)
    at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:256)
    at org.apache.tez.runtime.library.output.TestOnFileUnorderedKVOutput.testWithPipelinedShuffle(TestOnFileUnorderedKVOutput.java:179)

testGeneratedDataMovementEvent(org.apache.tez.runtime.library.output.TestOnFileUnorderedKVOutput) Time elapsed: 0.082 sec <<< ERROR!
java.lang.NullPointerException: null
    at org.apache.tez.runtime.RuntimeTask.notifyProgressInvocation(RuntimeTask.java:109)
    at org.apache.tez.runtime.api.impl.TezTaskContextImpl.notifyProgress(TezTaskContextImpl.java:178)
    at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:253)
    at org.apache.tez.runtime.library.output.TestOnFileUnorderedKVOutput.testGeneratedDataMovementEvent(TestOnFileUnorderedKVOutput.java:138)
{code}
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012492#comment-15012492 ]

Rajesh Balamohan commented on TEZ-2950:
---
sizePerBuffer seems to be low as per the log. Can you please check with
tez.task.scale.memory.ratios="PARTITIONED_UNSORTED_OUTPUT:12,UNSORTED_INPUT:1,UNSORTED_OUTPUT:1,SORTED_OUTPUT:12,SORTED_MERGED_INPUT:12,PROCESSOR:1,OTHER:4"

> Poor performance of UnorderedPartitionedKVWriter
>
> Key: TEZ-2950
> URL: https://issues.apache.org/jira/browse/TEZ-2950
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
>
> Came across a job which was taking a long time in UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data from spill files (8500 spills) and then writing the final compressed merge file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not just buffer and keep directly writing to the final file which will save a lot of time.
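The suggested value is a comma-separated list of COMPONENT:weight pairs. As a reading aid, the sketch below parses that string and prints each weight next to the total. It assumes the weights are relative to one another; how Tez actually turns them into buffer sizes is not shown in this thread and is not modelled here. `MemoryRatios`, `parse`, and `totalWeight` are hypothetical names for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reading aid; this is not Tez's own parsing or scaling code.
final class MemoryRatios {
    // Parses "NAME:weight,NAME:weight,..." into an insertion-ordered map.
    static Map<String, Integer> parse(String ratios) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        for (String entry : ratios.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected COMPONENT:weight, got: " + entry);
            }
            weights.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
        }
        return weights;
    }

    // Sums all weights so each one can be read as a share of the total.
    static int totalWeight(Map<String, Integer> weights) {
        int total = 0;
        for (int weight : weights.values()) {
            total += weight;
        }
        return total;
    }

    public static void main(String[] args) {
        // The value quoted in the comment above.
        String ratios = "PARTITIONED_UNSORTED_OUTPUT:12,UNSORTED_INPUT:1,UNSORTED_OUTPUT:1,"
            + "SORTED_OUTPUT:12,SORTED_MERGED_INPUT:12,PROCESSOR:1,OTHER:4";
        Map<String, Integer> weights = parse(ratios);
        int total = totalWeight(weights);
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            System.out.printf("%-28s %2d of %d%n", e.getKey(), e.getValue(), total);
        }
    }
}
```

Read this way, the suggestion gives PARTITIONED_UNSORTED_OUTPUT the same weight as the sorted output and sorted merged input components, which fits the remark that sizePerBuffer looked low.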
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012291#comment-15012291 ]

Gopal V commented on TEZ-2950:
--
[~rohini]: this is already implemented for UnorderedPartitioned, right?
set tez.runtime.enable.final-merge.in.output = false;
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012390#comment-15012390 ]

Rohini Palaniswamy commented on TEZ-2950:
-
I meant not possible to ask the user to set that and run. Need a new release of Pig for that.
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012435#comment-15012435 ]

Siddharth Seth commented on TEZ-2950:
-
bq. The downstream only starts receiving events if the source task completes successfully - this was done to allow for speculative execution.
Node failures. Destination receives all events, but processes only some of them before the source node dies. Smaller chance of hitting this compared to pipelined shuffle but it's still possible.
[jira] [Updated] (TEZ-2480) Exception when closing output (ignored)
[ https://issues.apache.org/jira/browse/TEZ-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-2480:
-
Summary: Exception when closing output (ignored) (was: TEZ-2003: exception when closing output (ignored))
[jira] [Commented] (TEZ-2480) Exception when closing output (ignored)
[ https://issues.apache.org/jira/browse/TEZ-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012284#comment-15012284 ] Hitesh Shah commented on TEZ-2480: -- Committing shortly. > Exception when closing output (ignored) > --- > > Key: TEZ-2480 > URL: https://issues.apache.org/jira/browse/TEZ-2480 > Project: Apache Tez > Issue Type: Bug >Affects Versions: TEZ-2003 >Reporter: Sergey Shelukhin >Assignee: Siddharth Seth > Attachments: TEZ-2480.1.txt > > > Happens a lot in some queries: > {noformat} > sershe_20150522112029_d0863b33-8d2f-4b4c-b013-9ef70a2bc586:1_Map 1_8_0)] WARN > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when > closing output Reducer 2(cleanup). Exception > class=java.lang.NullPointerException, message=null > java.lang.NullPointerException > at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:618) > at > org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:81) > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:613) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:831) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:608) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1425) > at > org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:198) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768) > at > org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:64) > at > org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:56) > at > org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:51) > at > org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.generateEvents(OrderedPartitionedKVOutput.java:209) > at > org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:186) 
> at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.cleanup(LogicalIOProcessorRuntimeTask.java:849) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:104) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Can this be fixed or not logged? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012429#comment-15012429 ] Gopal V commented on TEZ-2950: -- bq, enable final merge in output = false doesn't necessarily solve this. That has the same issues of partial failures which exists with pipelined shuffle. The fetcher can start serving out chunks of the data and then have the source fail, which will cause the task fetching the data to fail (chunks for the same input from different attempts of the source). The downstream only starts receiving events if the source task completes successfully - this was done to allow for speculative execution. > Poor performance of UnorderedPartitionedKVWriter > > > Key: TEZ-2950 > URL: https://issues.apache.org/jira/browse/TEZ-2950 > Project: Apache Tez > Issue Type: Bug >Reporter: Rohini Palaniswamy > > Came across a job which was taking a long time in > UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data > from spill files (8500 spills) and then writing the final compressed merge > file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not > just buffer and keep directly writing to the final file which will save a lot > of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012371#comment-15012371 ] Gopal V commented on TEZ-2950: -- That is a per-edge configuration. > Poor performance of UnorderedPartitionedKVWriter > > > Key: TEZ-2950 > URL: https://issues.apache.org/jira/browse/TEZ-2950 > Project: Apache Tez > Issue Type: Bug >Reporter: Rohini Palaniswamy > > Came across a job which was taking a long time in > UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data > from spill files (8500 spills) and then writing the final compressed merge > file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not > just buffer and keep directly writing to the final file which will save a lot > of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2480) TEZ-2003: exception when closing output (ignored)
[ https://issues.apache.org/jira/browse/TEZ-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012273#comment-15012273 ] Rajesh Balamohan commented on TEZ-2480: --- lgtm. +1 > TEZ-2003: exception when closing output (ignored) > - > > Key: TEZ-2480 > URL: https://issues.apache.org/jira/browse/TEZ-2480 > Project: Apache Tez > Issue Type: Bug >Affects Versions: TEZ-2003 >Reporter: Sergey Shelukhin >Assignee: Siddharth Seth > Attachments: TEZ-2480.1.txt > > > Happens a lot in some queries: > {noformat} > sershe_20150522112029_d0863b33-8d2f-4b4c-b013-9ef70a2bc586:1_Map 1_8_0)] WARN > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when > closing output Reducer 2(cleanup). Exception > class=java.lang.NullPointerException, message=null > java.lang.NullPointerException > at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:618) > at > org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:81) > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:613) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:831) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:608) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1425) > at > org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:198) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768) > at > org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:64) > at > org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:56) > at > org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:51) > at > org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.generateEvents(OrderedPartitionedKVOutput.java:209) > at > 
org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:186) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.cleanup(LogicalIOProcessorRuntimeTask.java:849) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:104) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Can this be fixed or not logged? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2951) Progressive allocation of buffers for Unordered Partitioned Output
Siddharth Seth created TEZ-2951: --- Summary: Progressive allocation of buffers for Unordered Partitioned Output Key: TEZ-2951 URL: https://issues.apache.org/jira/browse/TEZ-2951 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Similar to TEZ-2244. In the case of UnorderedPartitionOutput - the default is to use 2 buffers. This can be changed to a higher value when using pipelined shuffle - and have memory allocated only when required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
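The idea of allocating buffers only when required can be illustrated with a small pool that creates a buffer on first demand, up to a configured maximum. This is a minimal sketch, not the TEZ-2951 patch: the class name `LazyBufferPool` and its methods are hypothetical and do not correspond to any Tez API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration of progressive buffer allocation (not Tez code). */
class LazyBufferPool {
    private final int maxBuffers;     // upper bound, e.g. raised for pipelined shuffle
    private final int bufferSize;     // bytes per buffer
    private final Deque<ByteBuffer> free = new ArrayDeque<>();
    private int allocated;            // buffers created so far

    LazyBufferPool(int maxBuffers, int bufferSize) {
        this.maxBuffers = maxBuffers;
        this.bufferSize = bufferSize;
    }

    /** Reuses a released buffer if one exists, otherwise allocates until the cap is hit. */
    ByteBuffer acquire() {
        ByteBuffer buffer = free.poll();
        if (buffer != null) {
            buffer.clear();
            return buffer;
        }
        if (allocated < maxBuffers) {
            allocated++;
            return ByteBuffer.allocate(bufferSize);
        }
        return null; // caller must wait for a spill to release a buffer
    }

    /** Returns a buffer to the pool once its contents have been spilled. */
    void release(ByteBuffer buffer) {
        free.push(buffer);
    }

    int allocatedCount() {
        return allocated;
    }
}
```

With this shape, a task that never fills its first buffer pays for one buffer only, while a heavy writer can grow to the configured maximum.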
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012279#comment-15012279 ] Rohini Palaniswamy commented on TEZ-2950: - Copying response from [~sseth] on d...@tez.apache.org below. In case of UnorderedKVWriter (non-partitioned), a single file is used - to which new entries are appended. For the partitioned case - using a single file is not as straightforward, since the number of elements and size of each partition is not known up front. Generating a single file per partition can cause an explosion in the number of files, as well as the number of streams open in parallel (OOM). The current partitioned writer writes data into the in-memory buffer and then spills this into files with individual partitions consolidated together. Without pipelined shuffle - a single file needs to be generated for a single task, which is where the merge step comes in - in case the buffer is large. With pipelined shuffle - there's almost no extra cost, since there's no final merge - and each element is written out exactly once. That said, optimizations are possible depending upon the use case. For example, for a small number of partitions - it's reasonable to write out a file per partition. However, the ShuffleHandle and shuffle code will need to change to handle this. Pipelined Shuffle/avoiding a final merge has some limitations in case of failures and partial chunks being transferred over. It should be possible to work around these by modifying the receiving side to process each input only when all data for that source has been received. Either way, non-trivial changes are required to make this more efficient. In this particular case, how many partitions were generated, and what was the size of the unordered output buffer? Increasing the buffer size for this particular job can help mitigate the problem - maybe not with 8500 spills though.
> Poor performance of UnorderedPartitionedKVWriter > > > Key: TEZ-2950 > URL: https://issues.apache.org/jira/browse/TEZ-2950 > Project: Apache Tez > Issue Type: Bug >Reporter: Rohini Palaniswamy > > Came across a job which was taking a long time in > UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data > from spill files (8500 spills) and then writing the final compressed merge > file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not > just buffer and keep directly writing to the final file which will save a lot > of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
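The buffer-then-spill behaviour described in the comment above can be modelled with a toy writer. This is a sketch under stated assumptions, not Tez source: the class `SpillSketch` is hypothetical, records are reduced to counts, and the final merge is assumed to visit every spill once per partition.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a partitioned writer that spills a full buffer to a file (not Tez code). */
class SpillSketch {
    private final int numPartitions;
    private final int bufferCapacity;                 // records the in-memory buffer holds
    private int[] current;                            // per-partition record counts in the buffer
    private int buffered;
    final List<int[]> spills = new ArrayList<>();     // one entry per spill file

    SpillSketch(int numPartitions, int bufferCapacity) {
        this.numPartitions = numPartitions;
        this.bufferCapacity = bufferCapacity;
        this.current = new int[numPartitions];
    }

    /** Buffers one record for the given partition; spills when the buffer fills up. */
    void write(int partition) {
        current[partition]++;
        if (++buffered == bufferCapacity) {
            spill();
        }
    }

    /** Writes the buffer out as one spill, with partitions consolidated together. */
    void spill() {
        if (buffered == 0) {
            return;
        }
        spills.add(current);
        current = new int[numPartitions];
        buffered = 0;
    }

    /** Final merge: for each partition, read its segment from every spill. Returns segment reads. */
    long mergeAll() {
        spill(); // flush whatever is still buffered
        long segmentReads = 0;
        for (int p = 0; p < numPartitions; p++) {
            for (int[] spillCounts : spills) {
                if (spillCounts[p] > 0) {
                    segmentReads++;
                }
            }
        }
        return segmentReads;
    }
}
```

In this model the merge work grows with spills times partitions, so a small buffer hurts twice: it produces more spills, and each spill adds up to one segment read per partition. Skipping the final merge, as with pipelined shuffle, removes that second pass.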
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012280#comment-15012280 ] Rohini Palaniswamy commented on TEZ-2950: - This approach is performing very badly: decompressing and merging a large number of spills takes a long time with 999 partitions. We have a join followed by a UNION. The join vertex uses UnorderedKVWriter for its output, as no sort is required for the input to the union. Merging 8436 spills takes 30 minutes.
2015-11-18 21:01:25,904 [INFO] [main] |resources.MemoryDistributor|: InitialMemoryDistributor (isEnabled=true) invoked with: numInputs=2, numOutputs=1, JVM.maxFree=3102212096, allocatorClassName=org.apache.tez.runtime.library.resources.WeightedScalingMemoryDistributor
2015-11-18 21:01:26,295 [INFO] [TezChild] |resources.MemoryDistributor|: InitialRequests=[scope-6987:OUTPUT:104857600:org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput], [scope-6974:INPUT:1861327360:org.apache.tez.runtime.library.input.OrderedGroupedKVInput], [scope-6957:INPUT:1861327360:org.apache.tez.runtime.library.input.OrderedGroupedKVInput]
2015-11-18 21:01:26,303 [INFO] [TezChild] |resources.WeightedScalingMemoryDistributor|: ScaleRatiosUsed=[PARTITIONED_UNSORTED_OUTPUT:1][UNSORTED_OUTPUT:1][UNSORTED_INPUT:1][SORTED_OUTPUT:12][SORTED_MERGED_INPUT:12][PROCESSOR:1][OTHER:1]
2015-11-18 21:01:26,307 [INFO] [TezChild] |resources.WeightedScalingMemoryDistributor|: InitialReservationFraction=0.5, AdditionalReservationFractionForIOs=0.045, finalReserveFractionUsed=0.545
2015-11-18 21:01:26,308 [INFO] [TezChild] |resources.WeightedScalingMemoryDistributor|: Scaling Requests. NumRequests: 3, numScaledRequests: 25, TotalRequested: 3827512320, TotalRequestedScaled: 1.7910685696E9, TotalJVMHeap: 3102212096, TotalAvailable: 1411506503, TotalRequested/TotalJVMHeap:1.23
2015-11-18 21:01:26,308 [INFO] [TezChild] |resources.MemoryDistributor|: Allocations=[scope-6987:org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput:OUTPUT:104857600:3305449], [scope-6974:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:1861327360:704100526], [scope-6957:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:1861327360:704100526]
2015-11-18 21:02:49,010 [INFO] [TezChild] |writers.UnorderedPartitionedKVWriter|: scope_6987: numBuffers=2, sizePerBuffer=1652724, skipBuffers=false, pipelinedShuffle=false, numPartitions=999
..
2015-11-18 21:21:03,353 [INFO] [UnorderedOutSpiller {scope_6987}] |writers.UnorderedPartitionedKVWriter|: scope_6987: Finished spill 8436
2015-11-18 21:21:04,236 [INFO] [TezChild] |task.TezTaskRunner|: Closing task, taskAttemptId=attempt_1444575566264_610936_1_28_000475_0
..
2015-11-18 21:21:04,238 [INFO] [TezChild] |writers.UnorderedPartitionedKVWriter|: scope_6987: Waiting for all spills to complete : Pending : 0
2015-11-18 21:21:04,238 [INFO] [TezChild] |writers.UnorderedPartitionedKVWriter|: scope_6987: All spills complete
2015-11-18 21:54:44,047 [INFO] [TezChild] |writers.UnorderedPartitionedKVWriter|: scope_6987: Finished final spill after merging : 8438 spills
> Poor performance of UnorderedPartitionedKVWriter > > > Key: TEZ-2950 > URL: https://issues.apache.org/jira/browse/TEZ-2950 > Project: Apache Tez > Issue Type: Bug >Reporter: Rohini Palaniswamy > > Came across a job which was taking a long time in > UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data > from spill files (8500 spills) and then writing the final compressed merge > file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not > just buffer and keep directly writing to the final file which will save a lot > of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012338#comment-15012338 ] Rohini Palaniswamy commented on TEZ-2950: - bq. set tez.runtime.enable.final-merge.in.output = false; I doubt we can do that, as it will affect other parts of the DAG that use OrderedPartitioned output. > Poor performance of UnorderedPartitionedKVWriter > > > Key: TEZ-2950 > URL: https://issues.apache.org/jira/browse/TEZ-2950 > Project: Apache Tez > Issue Type: Bug >Reporter: Rohini Palaniswamy > > Came across a job which was taking a long time in > UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data > from spill files (8500 spills) and then writing the final compressed merge > file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not > just buffer and keep directly writing to the final file which will save a lot > of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012429#comment-15012429 ] Gopal V edited comment on TEZ-2950 at 11/19/15 12:09 AM: - bq. enable final merge in output = false doesn't necessarily solve this. That has the same issues of partial failures which exists with pipelined shuffle. The fetcher can start serving out chunks of the data and then have the source fail, which will cause the task fetching the data to fail (chunks for the same input from different attempts of the source). The downstream only starts receiving events if the source task completes successfully - this was done to allow for speculative execution. was (Author: gopalv): bq, enable final merge in output = false doesn't necessarily solve this. That has the same issues of partial failures which exists with pipelined shuffle. The fetcher can start serving out chunks of the data and then have the source fail, which will cause the task fetching the data to fail (chunks for the same input from different attempts of the source). The downstream only starts receiving events if the source task completes successfully - this was done to allow for speculative execution. > Poor performance of UnorderedPartitionedKVWriter > > > Key: TEZ-2950 > URL: https://issues.apache.org/jira/browse/TEZ-2950 > Project: Apache Tez > Issue Type: Bug >Reporter: Rohini Palaniswamy > > Came across a job which was taking a long time in > UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data > from spill files (8500 spills) and then writing the final compressed merge > file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not > just buffer and keep directly writing to the final file which will save a lot > of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2480) Exception when closing output is ignored
[ https://issues.apache.org/jira/browse/TEZ-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2480: - Summary: Exception when closing output is ignored (was: Exception when closing output (ignored)) > Exception when closing output is ignored > > > Key: TEZ-2480 > URL: https://issues.apache.org/jira/browse/TEZ-2480 > Project: Apache Tez > Issue Type: Bug >Affects Versions: TEZ-2003 >Reporter: Sergey Shelukhin >Assignee: Siddharth Seth > Attachments: TEZ-2480.1.txt > > > Happens a lot in some queries: > {noformat} > sershe_20150522112029_d0863b33-8d2f-4b4c-b013-9ef70a2bc586:1_Map 1_8_0)] WARN > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when > closing output Reducer 2(cleanup). Exception > class=java.lang.NullPointerException, message=null > java.lang.NullPointerException > at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:618) > at > org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:81) > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:613) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:831) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:608) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1425) > at > org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:198) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768) > at > org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:64) > at > org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:56) > at > org.apache.tez.runtime.library.common.sort.impl.TezSpillRecord.<init>(TezSpillRecord.java:51) > at > org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.generateEvents(OrderedPartitionedKVOutput.java:209) > at > 
org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:186) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.cleanup(LogicalIOProcessorRuntimeTask.java:849) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:104) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Can this be fixed or not logged? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012424#comment-15012424 ] Siddharth Seth commented on TEZ-2950: - [~gopalv] - enable final merge in output = false doesn't necessarily solve this. That has the same issues of partial failures which exists with pipelined shuffle. The fetcher can start serving out chunks of the data and then have the source fail, which will cause the task fetching the data to fail (chunks for the same input from different attempts of the source). [~rohini] - the total size of the unordered buffer is getting scaled down to ~3MB from the initially requested 100MB. The job has an initial configuration of io.sort.mb =~ 1800MB, unordered.buffer.size=100MB. With two OrderedOutputs - the unordered output gets scaled down. For this particular job: 1) Increase the size of the unordered buffer (1800 / 100 seems skewed anyway). 2) Change the scaling ratios. Currently: PARTITIONED_UNSORTED_OUTPUT:1, SORTED_OUTPUT:12. The PARTITIONED_UNSORTED ratio can be increased to prevent it from being scaled down so much. > Poor performance of UnorderedPartitionedKVWriter > > > Key: TEZ-2950 > URL: https://issues.apache.org/jira/browse/TEZ-2950 > Project: Apache Tez > Issue Type: Bug >Reporter: Rohini Palaniswamy > > Came across a job which was taking a long time in > UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data > from spill files (8500 spills) and then writing the final compressed merge > file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not > just buffer and keep directly writing to the final file which will save a lot > of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
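The scale-down from 100MB to ~3MB can be checked against the numbers in the task log posted earlier in this thread. The sketch below assumes each request receives a share of TotalAvailable proportional to request size times scale ratio, truncated to whole bytes. That formula is an assumption that happens to reproduce the logged allocations, not a copy of the WeightedScalingMemoryDistributor source, and the class and method names are hypothetical.

```java
/** Reproduces the allocation arithmetic visible in the TEZ-2950 log (a sketch, not Tez source). */
class ScalingSketch {
    /** Weighted proportional share of the available bytes, truncated to whole bytes. */
    static long scaledShare(long requested, int weight, double totalWeighted, long available) {
        return (long) (requested * (double) weight / totalWeighted * available);
    }

    public static void main(String[] args) {
        long available = 1_411_506_503L;   // TotalAvailable from the log
        long unorderedReq = 104_857_600L;  // UnorderedPartitionedKVOutput request (100MB)
        long orderedReq = 1_861_327_360L;  // each of the two OrderedGroupedKVInput requests
        int unorderedWeight = 1;           // PARTITIONED_UNSORTED_OUTPUT:1
        int orderedWeight = 12;            // SORTED_MERGED_INPUT:12

        // Sum of request * weight over all three requests.
        double totalWeighted = unorderedReq * (double) unorderedWeight
                + 2 * orderedReq * (double) orderedWeight;

        long unordered = scaledShare(unorderedReq, unorderedWeight, totalWeighted, available);
        long ordered = scaledShare(orderedReq, orderedWeight, totalWeighted, available);

        System.out.println("unordered output: " + unordered);      // log reports 3305449
        System.out.println("each ordered input: " + ordered);      // log reports 704100526
        System.out.println("per buffer (2 buffers): " + unordered / 2); // log reports 1652724
    }
}
```

Under this model, raising the PARTITIONED_UNSORTED_OUTPUT ratio or the requested unordered buffer size increases the numerator for the unordered output, which is the effect of the two suggestions above.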
[jira] [Assigned] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles reassigned TEZ-2950: Assignee: Jonathan Eagles > Poor performance of UnorderedPartitionedKVWriter > > > Key: TEZ-2950 > URL: https://issues.apache.org/jira/browse/TEZ-2950 > Project: Apache Tez > Issue Type: Bug >Reporter: Rohini Palaniswamy >Assignee: Jonathan Eagles > > Came across a job which was taking a long time in > UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data > from spill files (8500 spills) and then writing the final compressed merge > file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not > just buffer and keep directly writing to the final file which will save a lot > of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2950) Poor performance of UnorderedPartitionedKVWriter
[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-2950: - Assignee: (was: Jonathan Eagles) > Poor performance of UnorderedPartitionedKVWriter > > > Key: TEZ-2950 > URL: https://issues.apache.org/jira/browse/TEZ-2950 > Project: Apache Tez > Issue Type: Bug >Reporter: Rohini Palaniswamy > > Came across a job which was taking a long time in > UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data > from spill files (8500 spills) and then writing the final compressed merge > file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not > just buffer and keep directly writing to the final file which will save a lot > of time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2948) Stop using dagName in the dagComplete notification to TaskCommunicators
[ https://issues.apache.org/jira/browse/TEZ-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011977#comment-15011977 ] Siddharth Seth commented on TEZ-2948: - The test failure is unrelated. Committing. Thanks for the review. > Stop using dagName in the dagComplete notification to TaskCommunicators > --- > > Key: TEZ-2948 > URL: https://issues.apache.org/jira/browse/TEZ-2948 > Project: Apache Tez > Issue Type: Task >Affects Versions: 0.8.0-alpha >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-2948.1.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)