[jira] [Updated] (TEZ-2401) Tez UI: All-dag page has duration keep counting for KILLED dag.
[ https://issues.apache.org/jira/browse/TEZ-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-2401: -- Summary: Tez UI: All-dag page has duration keep counting for KILLED dag. (was: All-dag page has duration keep counting for KILLED dag.) Tez UI: All-dag page has duration keep counting for KILLED dag. --- Key: TEZ-2401 URL: https://issues.apache.org/jira/browse/TEZ-2401 Project: Apache Tez Issue Type: Bug Components: UI Affects Versions: 0.7.0 Reporter: Tassapol Athiapinya Assignee: Prakash Ramachandran Priority: Critical Attachments: TEZ-2401.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2401) All-dag page has duration keep counting for KILLED dag.
[ https://issues.apache.org/jira/browse/TEZ-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-2401: -- Attachment: TEZ-2401.1.patch trivial patch [~Sreenath] please review. All-dag page has duration keep counting for KILLED dag. --- Key: TEZ-2401 URL: https://issues.apache.org/jira/browse/TEZ-2401 Project: Apache Tez Issue Type: Bug Components: UI Affects Versions: 0.7.0 Reporter: Tassapol Athiapinya Assignee: Prakash Ramachandran Priority: Critical Attachments: TEZ-2401.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2406) TEZ-UI: Display per-io counter columns in task and attempt pages under vertex
Sreenath Somarajapuram created TEZ-2406: --- Summary: TEZ-UI: Display per-io counter columns in task and attempt pages under vertex Key: TEZ-2406 URL: https://issues.apache.org/jira/browse/TEZ-2406 Project: Apache Tez Issue Type: Bug Reporter: Sreenath Somarajapuram - We will auto-populate all the counter names including io counter names to the tasks (under a vertex) and task attempts (under task, vertex). - To enable navigation the counter names will be searchable in the dropdown for the counter selection. - Per-io counter names will not be stored in the personalization settings given they are dag / vertex specific. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2404) Handle DataMovementEvent before its TaskAttemptCompletedEvent
[ https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2404: Attachment: TEZ-2404-1.patch Handle DataMovementEvent before its TaskAttemptCompletedEvent - Key: TEZ-2404 URL: https://issues.apache.org/jira/browse/TEZ-2404 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: TEZ-2404-1.patch TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this causes a recovery issue. Recovery needs the DataMovementEvent to be handled before the TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be lost during recovery and cause its dependent tasks to hang. Two ways to fix this issue: 1. Still route TaskAttemptCompletedEvent in Vertex 2. Route DataMovementEvent before TaskAttemptCompletedEvent in TezTaskAttemptListener -- This message was sent by Atlassian JIRA (v6.3.4#6332)
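Option 2 above can be illustrated with a minimal sketch: within one heartbeat batch handed to the listener, data-movement events are dispatched before the completion event, so recovery never logs a completion ahead of the data-movement events it depends on. The type names below are simplified stand-ins (plain strings), not the actual Tez event classes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of option 2: reorder a batch so every DataMovementEvent precedes
// the TaskAttemptCompletedEvent. Event names are illustrative stand-ins.
class EventRouter {
    static List<String> route(List<String> batch) {
        List<String> dataMovement = new ArrayList<>();
        List<String> completed = new ArrayList<>();
        for (String e : batch) {
            if (e.equals("TASK_ATTEMPT_COMPLETED")) {
                completed.add(e);
            } else {
                dataMovement.add(e);
            }
        }
        // Dispatch data-movement events first; completion markers last.
        List<String> ordered = new ArrayList<>(dataMovement);
        ordered.addAll(completed);
        return ordered;
    }
}
```

With this ordering invariant, a recovered log can never contain a completion marker whose data-movement events are missing.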
[jira] [Commented] (TEZ-2405) PipelinedSorter can throw NPE with custom comparator
[ https://issues.apache.org/jira/browse/TEZ-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526314#comment-14526314 ] Gopal V commented on TEZ-2405: -- [~rajesh.balamohan]: the patch looks good - +1 But the code confusion remains - we have to investigate dropping the old MR InputBuffer impl, which we can't fix anymore. {code} public class InputBuffer extends FilterInputStream { ... public void reset(byte[] input, int start, int length) { this.buf = input; this.count = start+length; this.pos = start; ... } public int getPosition() { return pos; } public int getLength() { return count; } {code} This makes it obvious that InputBuffer.getLength() is not similar to any other getLength() calls; instead it is a capacity parameter of unknown clarity (i.e. the other areas of the byte[] array might be owned by other buffers). Post 0.7.x, we can rewrite this codepath to avoid this particular anti-pattern by dropping references to the old DataInputBuffer impl. PipelinedSorter can throw NPE with custom comparator --- Key: TEZ-2405 URL: https://issues.apache.org/jira/browse/TEZ-2405 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Priority: Critical Attachments: TEZ-2405.1.patch If custom comparators are used, PipelinedSorter can throw an NPE depending on the custom comparator implementation.
{noformat} ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.NullPointerException at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:837) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:767) at java.util.PriorityQueue.siftUpComparable(PriorityQueue.java:637) at java.util.PriorityQueue.siftUp(PriorityQueue.java:629) at java.util.PriorityQueue.offer(PriorityQueue.java:329) at java.util.PriorityQueue.add(PriorityQueue.java:306) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.add(PipelinedSorter.java:996) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.next(PipelinedSorter.java:1065) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$PartitionFilter.next(PipelinedSorter.java:936) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:366) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.flush(PipelinedSorter.java:406) at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:183) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:355) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at 
java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
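To make the confusion around InputBuffer.getLength() concrete, here is a minimal stand-in mirroring the fields quoted in the comment above (this is not the Hadoop class itself; SliceBuffer and remaining() are illustrative names): after reset(input, start, length), getLength() returns the end offset start + length, not the number of readable bytes.

```java
// Minimal stand-in for the quoted InputBuffer fields; names are illustrative.
class SliceBuffer {
    byte[] buf;
    int pos;    // current read offset
    int count;  // start + length: an END OFFSET, not a size

    void reset(byte[] input, int start, int length) {
        this.buf = input;
        this.pos = start;
        this.count = start + length;
    }

    int getPosition() { return pos; }

    // Matches the quoted impl: returns the slice's end offset, which callers
    // routinely mistake for the slice's byte count.
    int getLength() { return count; }

    // What most callers actually want from a "length":
    int remaining() { return count - pos; }
}
```

For a slice starting at offset 40 with 10 bytes, getLength() reports 50, which is exactly the kind of surprise the comment calls an anti-pattern.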
Failed: TEZ-2404 PreCommit Build #609
Jira: https://issues.apache.org/jira/browse/TEZ-2404 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/609/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 88 lines...] == == Determining number of patched javac warnings. == == /home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Ptest-patch /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/patchJavacWarnings.txt 21 {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730126/TEZ-2404-1.patch against master revision f6ea0fb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/609//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. fb5338efc023373eccdb268ccffb4b5e279534c9 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #608 Archived 3 artifacts Archive block size is 32768 Received 0 blocks and 760141 bytes Compression is 0.0% Took 0.67 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Created] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter
Rajesh Balamohan created TEZ-2407: - Summary: Drop references to the old DataInputBuffer impl in PipelinedSorter Key: TEZ-2407 URL: https://issues.apache.org/jira/browse/TEZ-2407 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2406) TEZ-UI: Display per-io counter columns in task and attempt pages under vertex
[ https://issues.apache.org/jira/browse/TEZ-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreenath Somarajapuram updated TEZ-2406: Attachment: TEZ-2406.1.patch [~pramachandran] Please help to get the patch in. TEZ-UI: Display per-io counter columns in task and attempt pages under vertex - Key: TEZ-2406 URL: https://issues.apache.org/jira/browse/TEZ-2406 Project: Apache Tez Issue Type: Bug Reporter: Sreenath Somarajapuram Assignee: Sreenath Somarajapuram Attachments: TEZ-2406.1.patch - We will auto-populate all the counter names including io counter names to the tasks (under a vertex) and task attempts (under task, vertex). - To enable navigation the counter names will be searchable in the dropdown for the counter selection. - Per-io counter names will not be stored in the personalization settings given they are dag / vertex specific. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2406) TEZ-UI: Display per-io counter columns in task and attempt pages under vertex
[ https://issues.apache.org/jira/browse/TEZ-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreenath Somarajapuram updated TEZ-2406: Description: - We will auto-populate all the counter names including io counter names to the tasks (under a vertex) and task attempts (under task, vertex). - To enable navigation the column names will be searchable in the pop-up for column selection. - Per-io counter names will not be stored in the personalization settings given they are dag / vertex specific. was: - We will auto-populate all the counter names including io counter names to the tasks (under a vertex) and task attempts (under task, vertex). - To enable navigation the counter names will be searchable in the dropdown for the counter selection. - Per-io counter names will not be stored in the personalization settings given they are dag / vertex specific. TEZ-UI: Display per-io counter columns in task and attempt pages under vertex - Key: TEZ-2406 URL: https://issues.apache.org/jira/browse/TEZ-2406 Project: Apache Tez Issue Type: Bug Reporter: Sreenath Somarajapuram Assignee: Sreenath Somarajapuram Attachments: TEZ-2406.1.patch - We will auto-populate all the counter names including io counter names to the tasks (under a vertex) and task attempts (under task, vertex). - To enable navigation the column names will be searchable in the pop-up for column selection. - Per-io counter names will not be stored in the personalization settings given they are dag / vertex specific. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2406) TEZ-UI: Display per-io counter columns in task and attempt pages under vertex
[ https://issues.apache.org/jira/browse/TEZ-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreenath Somarajapuram updated TEZ-2406: Affects Version/s: 0.7.0 TEZ-UI: Display per-io counter columns in task and attempt pages under vertex - Key: TEZ-2406 URL: https://issues.apache.org/jira/browse/TEZ-2406 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.0 Reporter: Sreenath Somarajapuram Assignee: Sreenath Somarajapuram Attachments: TEZ-2406.1.patch - We will auto-populate all the counter names including io counter names to the tasks (under a vertex) and task attempts (under task, vertex). - To enable navigation the column names will be searchable in the pop-up for column selection. - Per-io counter names will not be stored in the personalization settings given they are dag / vertex specific. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2401) Tez UI: All-dag page has duration keep counting for KILLED dag.
[ https://issues.apache.org/jira/browse/TEZ-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526336#comment-14526336 ] Sreenath Somarajapuram commented on TEZ-2401: - Do we have a purpose for
{code}
// unixtimestamp is in seconds. javascript expects milliseconds.
if (endTime < startTime || !!endTime) {
  end = new Date().getTime();
}
{code}
Tez UI: All-dag page has duration keep counting for KILLED dag. --- Key: TEZ-2401 URL: https://issues.apache.org/jira/browse/TEZ-2401 Project: Apache Tez Issue Type: Bug Components: UI Affects Versions: 0.7.0 Reporter: Tassapol Athiapinya Assignee: Prakash Ramachandran Priority: Critical Attachments: TEZ-2401.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2401) Tez UI: All-dag page has duration keep counting for KILLED dag.
[ https://issues.apache.org/jira/browse/TEZ-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526342#comment-14526342 ] Prakash Ramachandran commented on TEZ-2401: --- it has been changed to {code} if (endTime < startTime) { {code} in the patch. It was more for getting the current running time where applicable, e.g. formatDuration(startTime, -1) will give the time till now. Tez UI: All-dag page has duration keep counting for KILLED dag. --- Key: TEZ-2401 URL: https://issues.apache.org/jira/browse/TEZ-2401 Project: Apache Tez Issue Type: Bug Components: UI Affects Versions: 0.7.0 Reporter: Tassapol Athiapinya Assignee: Prakash Ramachandran Priority: Critical Attachments: TEZ-2401.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
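The duration rule discussed above can be sketched as follows. The real implementation is JavaScript in the Tez UI; this Java stand-in mirrors only the logic, and the class and method names are illustrative, not the actual helpers:

```java
// Sketch of the duration rule: endTime < startTime (e.g. a -1 sentinel)
// marks a still-running dag, so the duration counts against the current
// clock; once a real endTime is recorded (KILLED/SUCCEEDED/FAILED), the
// duration freezes. Names are illustrative, not the actual Tez UI helpers.
class DurationSketch {
    static long durationMillis(long startTime, long endTime, long now) {
        long end = (endTime < startTime) ? now : endTime;
        return end - startTime;
    }
}
```

Under this sketch, formatDuration(startTime, -1) in the comment above corresponds to durationMillis(startTime, -1, now): the -1 sentinel yields a running duration, while a valid endTime stops the count, which is the fix for the KILLED-dag symptom in this issue.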
[jira] [Commented] (TEZ-2406) TEZ-UI: Display per-io counter columns in task and attempt pages under vertex
[ https://issues.apache.org/jira/browse/TEZ-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526605#comment-14526605 ] Prakash Ramachandran commented on TEZ-2406: --- patch generally looks fine. * the checkbox for select all should have a label, and should also be positioned properly on Chrome. * the columnSelectorMessage and the function to extract the names of per-io counters can be shared across the views. * also, would it be possible to highlight (color?) the per-io counters in the selection box so that the user is aware which ones they are? TEZ-UI: Display per-io counter columns in task and attempt pages under vertex - Key: TEZ-2406 URL: https://issues.apache.org/jira/browse/TEZ-2406 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.0 Reporter: Sreenath Somarajapuram Assignee: Sreenath Somarajapuram Attachments: TEZ-2406.1.patch - We will auto-populate all the counter names including io counter names to the tasks (under a vertex) and task attempts (under task, vertex). - To enable navigation the column names will be searchable in the pop-up for column selection. - Per-io counter names will not be stored in the personalization settings given they are dag / vertex specific. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2401) Tez UI: All-dag page has duration keep counting for KILLED dag.
[ https://issues.apache.org/jira/browse/TEZ-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-2401: -- Attachment: TEZ-2401.2.patch thanks [~Sreenath] addressed review comments Tez UI: All-dag page has duration keep counting for KILLED dag. --- Key: TEZ-2401 URL: https://issues.apache.org/jira/browse/TEZ-2401 Project: Apache Tez Issue Type: Bug Components: UI Affects Versions: 0.7.0 Reporter: Tassapol Athiapinya Assignee: Prakash Ramachandran Priority: Critical Attachments: TEZ-2401.1.patch, TEZ-2401.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2076) Tez framework to extract/analyze data stored in ATS for specific dag
[ https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2076: -- Attachment: TEZ-2076.10.patch ATSImportTool - Fixed docs - Fixed logging in case of exception - Fixed x.y.z for version info - Made the packaging a fat jar. (--atsAddress=http://atsServer:port can be provided on the command line as an optional parameter if needed. Otherwise, it will be picked up from the $HADOOP_CONF_DIR location) - Usage: {noformat} usage: java -cp $HADOOP_CONF_DIR/:./target/tez-history-parser-x.y.z-SNAPSHOT-jar-with-dependencies.jar org.apache.tez.history.ATSImportTool --atsAddress atsAddress Optional. ATS address (e.g http://clusterATSNode:8188) --dagId dagId DagId that needs to be downloaded --downloadDir downloadDir download directory where data needs to be downloaded --help print help {noformat} What happens when some of the data is downloaded but some fails to? - This would require parsing of the downloaded data (e.g. ATS goes down in the middle of a download). Currently this is not checked and an exception would be thrown. However, we would get partial data (i.e. as and when a batch is downloaded, it gets written to the zip file). Not sure if we need a feature to validate this; I believe an exception should be good for v1. What happens if the tool is run when a dag is still in progress? Will it give invalid data back? Should that case be handled by throwing an error or just having the user warned as needed? - Currently, if data is available (even partial in the case of running jobs) it would be downloaded. Is the suggestion not to download if the job is in progress (e.g. RUNNING, INITING, SUBMITTED)? Maybe BaseInfo and then use abstract class? - Fixed. Renamed AbstractInfo to BaseInfo. Should all info objects representing the data be moved to a package, say parser.datamodel? - Moved all info objects to parser.datamodel. Also created BaseParser, which can link task, vertex, dag etc. for reverse lookups.
How is versioning being handled in the serialized zip structure? Also, why json as compared to say a protobuf structure? - No explicit version is maintained in the zip structure. Would adding the tez-version be helpful? - Moving back and forth from DAG to TaskAttempt and from TaskAttempt to DAG can be complex in protobuf. Hence the objects are maintained as an in-memory POJO structure after parsing the JSON. What if there are 100,000 attempts? or more? Does this require a large memory footprint? - No, the zip file can contain many small part files. Each of them can contain some amount of task, attempt, vertex, and dag information. As and when a part file is parsed, the JSON object pertaining to that part file is released. So there wouldn't be much pressure during parsing. However, the DAG in-memory representation (POJO) can differ based on the size of the jobs. I will post the memory details soon. Should serialized data be loaded on an on-demand basis? Or does the analyser always take an initial hit to load all data into memory? - It might be more memory efficient, but it would make analysis harder. For analysis, we would like to move back and forth from DAG to TaskAttempt and vice versa. This calls for all objects to be present in memory. It seems like we have 2 data models. The runtime model and the analyser data model. It is going to be hard to keep them in sync. Any suggestions on how we can re-use a common model? - No; ATS data is parsed and represented as in-memory POJOs via the parser. The analyzer would work on the in-memory (read-only) structures. Irrespective of any other changes in ATS, the in-memory representations of DAG, Vertex, Task, and TaskAttempt should not change. getAbsoluteSubmitTime() - is there a non-absolute timestamp elsewhere? Maybe simplify function names? - Yes, getSubmitTime() would return the time relative to the DAG start time. This would be useful when drawing swimlane diagrams, for instance. Renamed to getAbsStartTime() for now (any suggestions?)
Could you clarify why most classes are marked public? - All info objects would be public (evolving) as the analyzer code would rely on these in-memory objects. void setTaskInfo(TaskInfo taskInfo) - As mentioned earlier, the zip file can have an arbitrary number of part files. Each part file is parsed and an in-memory POJO is created. Before returning the final DAG (in-memory structure), we need to link tasks to attempts, vertices to the DAG, etc. These links happen via these methods, which are not publicly exposed. it would be good to try the tool with invalid data, corrupt zip files, etc to ensure that there are useful error messages. - In case of a corrupt file, it throws an exception. E.g. {noformat} Exception in thread main org.apache.tez.dag.api.TezException: java.util.zip.ZipException: error in opening zip
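The part-file streaming described in the answers above can be sketched roughly as follows, under the assumption that each part file in the downloaded zip holds an independent JSON chunk. parsePart is a hypothetical hook standing in for building the info POJOs; sampleZip merely builds a tiny in-memory zip for the usage example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Sketch of parsing a multi-part zip one part file at a time, so only a
// single part's JSON payload is held in memory during parsing.
class AtsZipReader {
    static int readParts(byte[] zipBytes) {
        int parts = 0;
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            while (zin.getNextEntry() != null) {
                byte[] json = zin.readAllBytes(); // one part's payload at a time
                parsePart(json);                  // build/link info POJOs here
                parts++;
            } // each part's buffer becomes collectible before the next is read
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }

    // Hypothetical hook: the real tool would build DagInfo/VertexInfo/... here.
    static void parsePart(byte[] json) { }

    // Helper for the usage example: a tiny in-memory zip with two part files.
    static byte[] sampleZip() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            for (String name : new String[] {"part-1.json", "part-2.json"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write("{}".getBytes());
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

This is why parsing itself stays cheap regardless of the number of attempts; only the linked in-memory DAG representation grows with job size, as noted in the comment.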
Failed: TEZ-2401 PreCommit Build #613
Jira: https://issues.apache.org/jira/browse/TEZ-2401 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/613/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2779 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730188/TEZ-2401.2.patch against master revision c411e4e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/613//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/613//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 67a235d89a770d60cab98d55b71a4022a84c1d8c logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #612 Archived 44 artifacts Archive block size is 32768 Received 6 blocks and 2566260 bytes Compression is 7.1% Took 1.4 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2076) Tez framework to extract/analyze data stored in ATS for specific dag
[ https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526833#comment-14526833 ] TezQA commented on TEZ-2076: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730195/TEZ-2076.10.patch against master revision c411e4e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/614//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/614//console This message is automatically generated. Tez framework to extract/analyze data stored in ATS for specific dag Key: TEZ-2076 URL: https://issues.apache.org/jira/browse/TEZ-2076 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2076.1.patch, TEZ-2076.10.patch, TEZ-2076.2.patch, TEZ-2076.3.patch, TEZ-2076.4.patch, TEZ-2076.5.patch, TEZ-2076.6.patch, TEZ-2076.7.patch, TEZ-2076.8.patch, TEZ-2076.9.patch, TEZ-2076.WIP.2.patch, TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch - Users should be able to download ATS data pertaining to a DAG from Tez-UI (more like a zip file containing DAG/Vertex/Task/TaskAttempt info). 
- This can be plugged into an analyzer which parses the data, adds semantics, and provides an in-memory representation for further analysis. - This will enable writing different analyzer rules, which can be run on top of this in-memory representation to come up with an analysis of the DAG. - Results of these analyzer rules can be rendered in the UI (standalone webapp) at a later point in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2404) Handle DataMovementEvent before its TaskAttemptCompletedEvent
[ https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526916#comment-14526916 ] Bikas Saha commented on TEZ-2404: - TEZ-1897 is not enabled yet, so we don't have to fix this immediately. We can use the time to explore other solutions that don't involve routing the same event twice. E.g. when the task completes, it sends an event to its vertex so that the vertex can increment its completed task count. Can that be used to mark the successful attempt as done in the history logs by the vertex? Logically, from what I see, the vertex is using the task attempt completed event as a marker for the successful attempt's history event completion, right? This approach may mean that an unsuccessful attempt will not have a completion marker. Will that be a problem? Maybe not, since we don't care about those attempts anyway. For work-preserving AM restart we can discard these events if the running task has not reconnected with the AM. In the non-work-preserving AM restart case we can always discard these events. Handle DataMovementEvent before its TaskAttemptCompletedEvent - Key: TEZ-2404 URL: https://issues.apache.org/jira/browse/TEZ-2404 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this causes a recovery issue. Recovery needs the DataMovementEvent to be handled before the TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be lost during recovery and cause its dependent tasks to hang. Two ways to fix this issue: 1. Still route TaskAttemptCompletedEvent in Vertex 2. Route DataMovementEvent before TaskAttemptCompletedEvent in TezTaskAttemptListener -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2379: - Attachment: TEZ-2379.2.patch Attached a patch with handling for a killed attempt in the failed/killed task states. This seems safer as KILLED and FAILED are already terminal states. A killed attempt at SUCCEEDED is already handled properly. [~bikassaha] please review org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Hitesh Shah Priority: Blocker Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle
of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
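The fix described above amounts to absorbing a late T_ATTEMPT_KILLED in terminal task states as a no-op instead of raising InvalidStateTransitonException. A toy sketch of that idea follows; this is not the Hadoop StateMachineFactory API, and the states and events are illustrative strings:

```java
import java.util.Set;

// Toy sketch: terminal states (KILLED/FAILED/SUCCEEDED) ignore a late
// T_ATTEMPT_KILLED; any genuinely unexpected (state, event) pair still fails.
class TaskStateSketch {
    static final Set<String> TERMINAL = Set.of("KILLED", "FAILED", "SUCCEEDED");

    static String handle(String state, String event) {
        if (TERMINAL.contains(state) && event.equals("T_ATTEMPT_KILLED")) {
            return state; // attempt killed after the task already terminated: no-op
        }
        if (state.equals("RUNNING") && event.equals("T_ATTEMPT_KILLED")) {
            return "RUNNING"; // reschedule another attempt, stay running
        }
        throw new IllegalStateException("Invalid event: " + event + " at " + state);
    }
}
```

The safety argument in the comment maps directly onto this sketch: since KILLED and FAILED are terminal, staying in the same state on a late attempt-kill cannot change the task's outcome.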
[jira] [Commented] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter
[ https://issues.apache.org/jira/browse/TEZ-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526921#comment-14526921 ] Hitesh Shah commented on TEZ-2407: -- Any reason why this should not be targeted to 0.7.0 or a 0.7.x release? Drop references to the old DataInputBuffer impl in PipelinedSorter -- Key: TEZ-2407 URL: https://issues.apache.org/jira/browse/TEZ-2407 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2405) PipelinedSorter can throw NPE with custom comparator
[ https://issues.apache.org/jira/browse/TEZ-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526922#comment-14526922 ] Hitesh Shah commented on TEZ-2405: -- Does this affect anyone using pipelinedsorter in 0.5 or 0.6? PipelinedSorter can throw NPE with custom comparator --- Key: TEZ-2405 URL: https://issues.apache.org/jira/browse/TEZ-2405 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Priority: Critical Fix For: 0.7.0 Attachments: TEZ-2405.1.patch If custom comparators are used, PipelinedSorter can throw an NPE depending on the custom comparator implementation. {noformat} ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.NullPointerException at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:837) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:767) at java.util.PriorityQueue.siftUpComparable(PriorityQueue.java:637) at java.util.PriorityQueue.siftUp(PriorityQueue.java:629) at java.util.PriorityQueue.offer(PriorityQueue.java:329) at java.util.PriorityQueue.add(PriorityQueue.java:306) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.add(PipelinedSorter.java:996) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.next(PipelinedSorter.java:1065) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$PartitionFilter.next(PipelinedSorter.java:936) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:366) at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.flush(PipelinedSorter.java:406) at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:183) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:355) at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
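The trace above points into SpanIterator.compareTo during the spill merge, and per the report the failure mode depends on how the custom comparator is implemented. As a hedged illustration only (plain Java with hypothetical names, not Tez's actual RawComparator API), a byte-range comparator can guard against a null buffer instead of dereferencing it:

```java
// Illustrative sketch: a defensive byte-range comparator. A comparator
// that unconditionally dereferences its key buffers could fail in a
// similar way if handed a span whose buffer is null; here a null buffer
// simply sorts before any non-null one.
public class DefensiveComparator {

    // Compare two byte ranges lexicographically, treating a null buffer
    // as smaller than any non-null buffer (two nulls compare equal).
    public static int compareSpans(byte[] a, int aOff, int aLen,
                                   byte[] b, int bOff, int bLen) {
        if (a == null || b == null) {
            return (a == null ? 0 : 1) - (b == null ? 0 : 1);
        }
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int d = (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);
            if (d != 0) return d;
        }
        return aLen - bLen;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3};
        byte[] y = {1, 2, 4};
        System.out.println(compareSpans(x, 0, 3, y, 0, 3));    // negative: x sorts before y
        System.out.println(compareSpans(null, 0, 0, y, 0, 3)); // negative: null sorts first
    }
}
```

Whether this specific guard matches the TEZ-2405 patch is not shown in the thread; the sketch only demonstrates the general defensive pattern.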
[jira] [Updated] (TEZ-2325) Route status update event directly to the attempt
[ https://issues.apache.org/jira/browse/TEZ-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2325: - Priority: Major (was: Critical) Route status update event directly to the attempt -- Key: TEZ-2325 URL: https://issues.apache.org/jira/browse/TEZ-2325 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Prakash Ramachandran Fix For: 0.7.0 Attachments: TEZ-2325.1.patch, TEZ-2325.2.patch, TEZ-2325.3.patch, TEZ-2325.4.patch Today, all events from the attempt heartbeat are routed to the vertex. Then the vertex routes status update events (if any) to the attempt. This is unnecessary and potentially creates out-of-order scenarios. We could route the status update events directly to attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2325) Route status update event directly to the attempt
[ https://issues.apache.org/jira/browse/TEZ-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2325: - Priority: Critical (was: Major) Route status update event directly to the attempt -- Key: TEZ-2325 URL: https://issues.apache.org/jira/browse/TEZ-2325 Project: Apache Tez Issue Type: Bug Reporter: Bikas Saha Assignee: Prakash Ramachandran Priority: Critical Fix For: 0.7.0 Attachments: TEZ-2325.1.patch, TEZ-2325.2.patch, TEZ-2325.3.patch, TEZ-2325.4.patch Today, all events from the attempt heartbeat are routed to the vertex. Then the vertex routes status update events (if any) to the attempt. This is unnecessary and potentially creates out-of-order scenarios. We could route the status update events directly to attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2404) Handle DataMovementEvent before its TaskAttemptCompletedEvent
[ https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526931#comment-14526931 ] Hitesh Shah commented on TEZ-2404: -- Bumping up priority as this means recovery is potentially broken. [~zjffdu] It looks like we need a recovery-related test to ensure that data movement events are always stored before a task completion event. Handle DataMovementEvent before its TaskAttemptCompletedEvent - Key: TEZ-2404 URL: https://issues.apache.org/jira/browse/TEZ-2404 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Priority: Critical Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this can cause a recovery issue. Recovery requires that the DataMovementEvent is handled before the TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be lost during recovery, causing its dependent tasks to hang. Two ways to fix this issue: 1. Still route TaskAttemptCompletedEvent in the Vertex. 2. Route DataMovementEvent before TaskAttemptCompletedEvent in TezTaskAttemptListener. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2404) Handle DataMovementEvent before its TaskAttemptCompletedEvent
[ https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526931#comment-14526931 ] Hitesh Shah edited comment on TEZ-2404 at 5/4/15 5:54 PM: -- Bumping up priority as this means recovery is potentially broken. [~zjffdu] It looks like we need a recovery-related test to ensure that all data movement events are always stored before a task completion event. was (Author: hitesh): Bumping up priority as this means recovery is potentially broken. [~zjffdu] It looks like we need a recovery-related test to ensure that data movement events are always stored before a task completion event. Handle DataMovementEvent before its TaskAttemptCompletedEvent - Key: TEZ-2404 URL: https://issues.apache.org/jira/browse/TEZ-2404 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Priority: Critical Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this can cause a recovery issue. Recovery requires that the DataMovementEvent is handled before the TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be lost during recovery, causing its dependent tasks to hang. Two ways to fix this issue: 1. Still route TaskAttemptCompletedEvent in the Vertex. 2. Route DataMovementEvent before TaskAttemptCompletedEvent in TezTaskAttemptListener. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
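Of the two fixes listed in the TEZ-2404 description, the second amounts to reordering each heartbeat batch so data-movement events are dispatched (and therefore recorded for recovery) before the completion event. A minimal sketch of that reordering, using stand-in event classes rather than Tez's real ones:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "route DataMovementEvent before
// TaskAttemptCompletedEvent": stable-partition a batch so completion
// events are dispatched last. The event types here are stand-ins, not
// Tez's actual classes.
public class RecoveryEventOrdering {

    public interface TezEvent {}
    public static class DataMovementEvent implements TezEvent {}
    public static class TaskAttemptCompletedEvent implements TezEvent {}

    // Returns the batch with every non-completion event first and every
    // completion event last, preserving relative order within each group.
    public static List<TezEvent> orderForRecovery(List<TezEvent> batch) {
        List<TezEvent> ordered = new ArrayList<>();
        for (TezEvent e : batch) {
            if (!(e instanceof TaskAttemptCompletedEvent)) ordered.add(e);
        }
        for (TezEvent e : batch) {
            if (e instanceof TaskAttemptCompletedEvent) ordered.add(e);
        }
        return ordered;
    }
}
```

With this ordering, a recovery log replayed up to the completion event is guaranteed to already contain the data-movement events that downstream tasks depend on.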
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526935#comment-14526935 ] Bikas Saha commented on TEZ-2379: - lgtm pending jenkins. If possible, could you put a comment in the task impl state machine summarizing the other scenario where we could ignore attempt killed in the attempt if the attempt is succeeded. In case, we hit this issue in the future for some other scenario, that may provide some context to simplify the debugging. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Hitesh Shah Priority: Blocker Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at 
java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526952#comment-14526952 ] Hitesh Shah commented on TEZ-2379: -- Will update the final patch with the relevant note related to the kill transition via killUnfinishedAttempt race. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Hitesh Shah Priority: Blocker Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2379: - Attachment: TEZ-2379.3.patch Final patch with doc comment. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Hitesh Shah Priority: Blocker Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527033#comment-14527033 ] Siddharth Seth commented on TEZ-2379: - One thing to consider here is that the individual state machines should be complete in themselves, and should not make assumptions about other state machines. This makes them a lot easier to reason about (we aren't there yet, though). TaskImpl - Already knows how to handle ATTEMPT_KILLED and ATTEMPT_FAILED in the SUCCESS state. It'll, however, error out in the FAILED or KILLED state - but there's nothing to be done there if these events are received. TaskAttemptImpl - If moving from one 'external' state to another, it should inform the Task, and let it deal with the state change. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Hitesh Shah Priority: Blocker Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at 
org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
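Siddharth's point above — that each state machine should be complete in itself rather than assume things about its peers — translates to registering an explicit transition that absorbs T_ATTEMPT_KILLED in the KILLED state instead of letting it raise an invalid-transition error. A toy, self-contained sketch of that idea (a plain transition table, not YARN's actual StateMachineFactory API):

```java
import java.util.EnumMap;
import java.util.Map;

// Toy state machine illustrating the "absorb the late event" pattern.
// States, events, and transitions are simplified stand-ins for TaskImpl's
// real state machine.
public class TaskStateMachineSketch {

    public enum State { RUNNING, SUCCEEDED, KILLED }
    public enum Event { T_KILL, T_ATTEMPT_SUCCEEDED, T_ATTEMPT_KILLED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State state = State.RUNNING;

    public TaskStateMachineSketch() {
        register(State.RUNNING, Event.T_ATTEMPT_SUCCEEDED, State.SUCCEEDED);
        register(State.RUNNING, Event.T_KILL, State.KILLED);
        register(State.RUNNING, Event.T_ATTEMPT_KILLED, State.RUNNING); // schedule another attempt
        // The point of the fix: a late T_ATTEMPT_KILLED (e.g. from a
        // killUnfinishedAttempt race) arriving in a terminal state is a
        // registered no-op, not an invalid transition.
        register(State.KILLED, Event.T_ATTEMPT_KILLED, State.KILLED);
        register(State.SUCCEEDED, Event.T_ATTEMPT_KILLED, State.SUCCEEDED);
    }

    private void register(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    // Applies the event; unregistered (state, event) pairs still fail,
    // mirroring InvalidStateTransitonException.
    public State handle(Event e) {
        State next = table.getOrDefault(state, Map.of()).get(e);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + e + " at " + state);
        }
        state = next;
        return state;
    }
}
```

In this sketch, killing the task and then delivering a straggler T_ATTEMPT_KILLED leaves the machine in KILLED without throwing — the behavior the TEZ-2379 patch aims for.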
Failed: TEZ-2379 PreCommit Build #615
Jira: https://issues.apache.org/jira/browse/TEZ-2379 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/615/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2584 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730216/TEZ-2379.2.patch against master revision c411e4e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/615//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/615//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 005e3e67c11cbc11968bd2e985d4dadadc43f6bd logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #614 Archived 44 artifacts Archive block size is 32768 Received 26 blocks and 1887404 bytes Compression is 31.1% Took 1.5 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 6 tests failed. 
REGRESSION: org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit Error Message: test timed out after 6 milliseconds Stack Trace: java.lang.Exception: test timed out after 6 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:853) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:626) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705) at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521) at org.apache.hadoop.ipc.Client.call(Client.java:1438) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy91.getDAGStatus(Unknown Source) at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatusViaAM(DAGClientRPCImpl.java:175) at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatus(DAGClientRPCImpl.java:94) at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusViaAM(DAGClientImpl.java:350) at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusInternal(DAGClientImpl.java:217) at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:262) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:127) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114) at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:248) REGRESSION: org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices Error Message: test timed out after 6 milliseconds Stack Trace: java.lang.Exception: test timed out after 6 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:853) at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:626)
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527041#comment-14527041 ] TezQA commented on TEZ-2379: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730216/TEZ-2379.2.patch against master revision c411e4e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/615//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/615//console This message is automatically generated. 
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Hitesh Shah Priority: Blocker Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527068#comment-14527068 ] Siddharth Seth commented on TEZ-1897: - Looks like this went in with concurrentDispatchers enabled. Can you please undo that bit. Create a concurrent version of AsyncDispatcher -- Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Fix For: 0.7.0 Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch, TEZ-1897.7.patch, TEZ-1897.8.patch Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1897) Create a concurrent version of AsyncDispatcher
[ https://issues.apache.org/jira/browse/TEZ-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527081#comment-14527081 ] Bikas Saha commented on TEZ-1897: - Thanks for catching it. My bad. Fixed. commit 5218f481dba2a26c3aa5dd8f69285ab9da419dd1 Author: Bikas Saha bi...@apache.org Date: Mon May 4 12:05:39 2015 -0700 TEZ-1897 addendum to turn off by default . Create a concurrent version of AsyncDispatcher (bikas) Create a concurrent version of AsyncDispatcher -- Key: TEZ-1897 URL: https://issues.apache.org/jira/browse/TEZ-1897 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Fix For: 0.7.0 Attachments: TEZ-1897.1.patch, TEZ-1897.2.patch, TEZ-1897.3.patch, TEZ-1897.4.patch, TEZ-1897.5.patch, TEZ-1897.6.patch, TEZ-1897.7.patch, TEZ-1897.8.patch Currently, it processes events on a single thread. For events that can be executed in parallel, e.g. vertex manager events, allowing higher concurrency may be beneficial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2379 PreCommit Build #616
Jira: https://issues.apache.org/jira/browse/TEZ-2379 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/616/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2584 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730222/TEZ-2379.3.patch against master revision c411e4e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/616//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/616//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. cc2efeeb76b37c65ffb7373e0d2780bdd0d8ade5 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #614 Archived 44 artifacts Archive block size is 32768 Received 6 blocks and 2540593 bytes Compression is 7.2% Took 1.8 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## 6 tests failed. 
FAILED: org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit Error Message: test timed out after 6 milliseconds Stack Trace: java.lang.Exception: test timed out after 6 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:853) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:626) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705) at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521) at org.apache.hadoop.ipc.Client.call(Client.java:1438) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy91.getDAGStatus(Unknown Source) at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatusViaAM(DAGClientRPCImpl.java:175) at org.apache.tez.dag.api.client.rpc.DAGClientRPCImpl.getDAGStatus(DAGClientRPCImpl.java:94) at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusViaAM(DAGClientImpl.java:350) at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatusInternal(DAGClientImpl.java:217) at org.apache.tez.dag.api.client.DAGClientImpl.getDAGStatus(DAGClientImpl.java:262) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:127) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114) at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:248) FAILED: org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices Error Message: test timed out after 6 milliseconds Stack Trace: java.lang.Exception: test timed out after 6 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:853) at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:626) at
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527095#comment-14527095 ] TezQA commented on TEZ-2379: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730222/TEZ-2379.3.patch against master revision c411e4e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/616//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/616//console This message is automatically generated. 
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Hitesh Shah Priority: Blocker Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527102#comment-14527102 ] Hitesh Shah commented on TEZ-2379: -- Re-ran TestFaultTolerance locally without any problems. Looks like it probably failed due to the concurrent AsyncDispatcher being turned on by default.

org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Hitesh Shah Priority: Blocker Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch

{noformat}
2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853)
at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106)
at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874)
at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Additional notes:
- Hive: latest build
- Tez: master
- tpch-200 gb scale, q_17 (kill the job in the middle of execution)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
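The failure above is the classic "late event at a terminal state" problem: with a concurrent dispatcher, a T_ATTEMPT_KILLED event can arrive after the task has already reached KILLED. A minimal sketch of the idea behind the usual fix — registering the late event as an explicit no-op self-transition in the terminal state — is shown below. This is illustrative only; the class, enums, and table layout are hypothetical and not the actual org.apache.hadoop.yarn.state.StateMachineFactory API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a terminal state tolerates late-arriving events by
// registering them as explicit no-op self-transitions, instead of raising
// an invalid-transition error when a concurrent dispatcher delivers them late.
public class TinyStateMachine {
    enum State { RUNNING, KILLED }
    enum Event { T_KILL, T_ATTEMPT_KILLED }

    private State state = State.RUNNING;
    private final Map<State, Map<Event, State>> table = new HashMap<>();

    TinyStateMachine() {
        table.computeIfAbsent(State.RUNNING, s -> new HashMap<>())
             .put(Event.T_KILL, State.KILLED);
        // no-op self transition: tolerate attempt-killed events arriving after KILLED
        table.computeIfAbsent(State.KILLED, s -> new HashMap<>())
             .put(Event.T_ATTEMPT_KILLED, State.KILLED);
    }

    State handle(Event e) {
        Map<Event, State> row = table.get(state);
        if (row == null || !row.containsKey(e)) {
            throw new IllegalStateException("Invalid event: " + e + " at " + state);
        }
        state = row.get(e);
        return state;
    }
}
```

Without the second table entry, handle(T_ATTEMPT_KILLED) after the kill would throw, mirroring the stack trace in the report.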
[jira] [Commented] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter
[ https://issues.apache.org/jira/browse/TEZ-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527112#comment-14527112 ] Gopal V commented on TEZ-2407: -- This is code-cleanliness refactoring - this does not add performance or stability fixes. I'm 90% done with my scale stability testing of the new sorter, so late refactoring has the potential to only introduce bugs deep inside the sorter. I don't have enough weeks of testing left on my end and all this might do is make code readable at best and break the sorters at worst. We can retarget this for 0.7.x if you think there's enough QA weeks left to catch any late issues this might introduce. Drop references to the old DataInputBuffer impl in PipelinedSorter -- Key: TEZ-2407 URL: https://issues.apache.org/jira/browse/TEZ-2407 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2405) PipelinedSorter can throw NPE with custom comparator
[ https://issues.apache.org/jira/browse/TEZ-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527124#comment-14527124 ] Gopal V commented on TEZ-2405: -- [~hitesh]: nope, this was introduced during the 0.7 release cycle in my WIP patch for TEZ-1593. The TEZ-1593 issue was identified in 0.6 but was not fixed in the 0.6 release cycle, as we wanted to do core fixes at the beginning of a release cycle rather than at the end.

PipelinedSorter can throw NPE with custom comparator
Key: TEZ-2405 URL: https://issues.apache.org/jira/browse/TEZ-2405 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Priority: Critical Fix For: 0.7.0 Attachments: TEZ-2405.1.patch

If custom comparators are used, PipelinedSorter can throw an NPE, depending on the custom comparator implementation.

{noformat}
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.NullPointerException
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:837)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanIterator.compareTo(PipelinedSorter.java:767)
at java.util.PriorityQueue.siftUpComparable(PriorityQueue.java:637)
at java.util.PriorityQueue.siftUp(PriorityQueue.java:629)
at java.util.PriorityQueue.offer(PriorityQueue.java:329)
at java.util.PriorityQueue.add(PriorityQueue.java:306)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.add(PipelinedSorter.java:996)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SpanMerger.next(PipelinedSorter.java:1065)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$PartitionFilter.next(PipelinedSorter.java:936)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:366)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.flush(PipelinedSorter.java:406)
at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.close(OrderedPartitionedKVOutput.java:183)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:355)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
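The stack trace shows the NPE surfacing inside PriorityQueue's sift-up, which is where a custom comparator first runs when a second element is offered. The following is a standalone sketch of that failure shape — a comparator depending on state that was never initialized — and is illustrative only (KeyComparator and its buffer field are hypothetical, not Tez code):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class ComparatorNpeDemo {
    // Mimics a sorter comparator that depends on lazily-initialized state;
    // if setup is skipped, the first comparison inside PriorityQueue.siftUp
    // throws an NPE, just like the stack trace above.
    static class KeyComparator implements Comparator<Integer> {
        byte[] buffer; // expected to be initialized before comparisons

        public int compare(Integer a, Integer b) {
            return Integer.compare(buffer[a], buffer[b]); // NPE if buffer == null
        }
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> q = new PriorityQueue<>(new KeyComparator());
        q.offer(0); // first element: no comparison performed yet
        boolean npe = false;
        try {
            q.offer(1); // second element forces compare() -> NPE
        } catch (NullPointerException e) {
            npe = true;
        }
        System.out.println("NPE on second offer: " + npe); // prints: NPE on second offer: true
    }
}
```

The point is that the queue itself is fine; the comparator's hidden precondition is what fails, which is why the bug only appears with certain custom comparator implementations.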
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527677#comment-14527677 ] Siddharth Seth commented on TEZ-776: -
- Minor: {code}Target input indices. The number must match the number of events{code} "count" / "size of array" may make this a little clearer; 'number' is a little vague.
- Precondition checks to verify this?
- BroadcastEdgeManager - commonRouteMeta is set up via prepareForRouting. Not sure accessing this structure at a later point is thread safe. This goes away anyway if Broadcast/OneToOne are left unchanged.
- There is a bunch of repeated code between OneToOne, Broadcast, ScatterGather etc. in Edge.java. Looks like it's all the same (exploding the EventRouteMetadata).
- Not sure if the thread-safety concern applies to ScatterGather as well. That seems to be making changes within a lock though. Seems fairly complicated; assuming that's all for caching and efficiency?
- There are several methods on EdgeManagerPluginContextOnDemand which don't need to be implemented/extended (the method on EdgeManagerPluginContext should be sufficient) - e.g. initialize(), getContext, some of the routing methods.
- I'm still concerned about the access to taskEvents (taskEvents.size() and taskEvents.get()). This is an ArrayList getting populated in one thread and accessed in 30 others without a lock. ArrayList isn't supposed to be thread safe afaik. Will let someone else chime in here.

On TEZ-2409: I think it'll be better to get that done here itself. It's probably 10 more lines, and removes the changes on Broadcast/OneToOne. TEZ-2409 becomes a blocker for 0.7 anyway - and we would end up undoing changes made here. The overall functionality is already tested by the various jobs that we run. 
Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
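For scale intuition on why the stored events matter, a back-of-envelope calculation using the 64 bytes/event figure from the issue description: an all-to-all (scatter-gather) edge between M producers and N consumers materializes on the order of M×N DataMovementEvents. The task counts below are hypothetical, chosen only to show the order of magnitude:

```java
public class EventMemoryEstimate {
    public static void main(String[] args) {
        long producers = 10_000;       // hypothetical vertex sizes
        long consumers = 10_000;
        long bytesPerEvent = 64;       // figure quoted in the issue description
        long totalBytes = producers * consumers * bytesPerEvent; // 6.4e9 bytes
        // ~6.4 GB of event objects for a single 10k x 10k edge,
        // before counting per-object references and routing metadata.
        System.out.println(totalBytes / (1024L * 1024 * 1024) + " GiB (approx)");
    }
}
```

This is the pressure that on-demand event routing is meant to relieve: compute routes when a consumer asks, rather than storing every event up front.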
[jira] [Commented] (TEZ-2221) VertexGroup name should be unique
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527830#comment-14527830 ] Hitesh Shah commented on TEZ-2221: -- This implies that oA (or oB) cannot belong to 2 different vertex groups, and therefore the check currently implemented probably needs to be changed to account for this and not be based on the vertex members of the group.

VertexGroup name should be unique
Key: TEZ-2221 URL: https://issues.apache.org/jira/browse/TEZ-2221 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0, 0.5.4, 0.6.1 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch

VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex group name to identify the vertex group commit, so two vertex groups with the same name will conflict. However, the current equals/hashCode of VertexGroup uses both the vertex group name and the member names.
[jira] [Commented] (TEZ-2221) VertexGroup name should be unique
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527804#comment-14527804 ] Bikas Saha commented on TEZ-2221: - The commit behavior is different. Only the participating outputs of a vertex are committed when a vertex group commits. A vertex can be part of 2 vertex groups A and B, with outputs oA and oB for each group respectively. oA is committed when A finishes, and oB is committed when B finishes.
[jira] [Comment Edited] (TEZ-2221) VertexGroup name should be unique
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527830#comment-14527830 ] Hitesh Shah edited comment on TEZ-2221 at 5/5/15 3:06 AM: -- This implies that oA (or oB) cannot belong to 2 different vertex groups, and therefore the check currently implemented probably needs to be changed to account for this and not be based on the vertex members of the group. [~bikassaha] [~zjffdu] if the above is correct, it seems that we should revert this commit? Agree?
[jira] [Commented] (TEZ-2221) VertexGroup name should be unique
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527869#comment-14527869 ] Jeff Zhang commented on TEZ-2221: - I think it is a must-have to disallow
{code}
dag.createVertexGroup(group_1, v1, v2);
dag.createVertexGroup(group_1, v2, v3);
{code}
and a nice-to-have to disallow the following, to avoid any conflict between 2 vertex groups with the same members. (Although currently there are no conflicts, VertexGroup#addDataSink is a potential one if the same output is added to 2 vertex groups with the same members; but that conflict would be detected by Vertex#addAdditionalDataSink.)
{code}
dag.createVertexGroup(group_1, v1, v2);
dag.createVertexGroup(group_2, v1, v2);
{code}
Since case 1 (the must-have) impacts Pig, and Pig doesn't use case 2, why not keep this patch?
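The "must-have" case above — rejecting a second group with the same name regardless of members — can be sketched as a simple name-keyed check. This is illustrative only; VertexGroupRegistry and its method are hypothetical and not the actual Tez DAG API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a name-based uniqueness check: the name alone
// identifies the group (it keys the group-commit recovery events), so a
// duplicate name is rejected even when the member lists differ, while a
// new name with identical members is still allowed.
public class VertexGroupRegistry {
    private final Map<String, List<String>> groups = new HashMap<>();

    public void createVertexGroup(String name, List<String> members) {
        if (groups.containsKey(name)) {
            throw new IllegalStateException("VertexGroup already defined: " + name);
        }
        groups.put(name, members);
    }
}
```

Under this check, Jeff's case 1 fails on the second call, and case 2 (same members, different names) is allowed — which matches the behavior Bikas argues should be preserved.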
[jira] [Created] (TEZ-2410) VertexGroupCommitFinishedEvent is not logged correctly
Jeff Zhang created TEZ-2410: --- Summary: VertexGroupCommitFinishedEvent is not logged correctly Key: TEZ-2410 URL: https://issues.apache.org/jira/browse/TEZ-2410 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2
[ https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527690#comment-14527690 ] TezQA commented on TEZ-2408: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730285/TEZ-2408.1.patch against master revision e762a35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 161 javac compiler warnings (more than the master's current 156 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/617//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/617//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/617//console This message is automatically generated. TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 --- Key: TEZ-2408 URL: https://issues.apache.org/jira/browse/TEZ-2408 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Attachments: TEZ-2408.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2408 PreCommit Build #617
Jira: https://issues.apache.org/jira/browse/TEZ-2408 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/617/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2783 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730285/TEZ-2408.1.patch against master revision e762a35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 161 javac compiler warnings (more than the master's current 156 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/617//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/617//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/617//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. d2600b3b33265b486e8394a6a086b16465d0ed8f logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #614 Archived 45 artifacts Archive block size is 32768 Received 4 blocks and 2626264 bytes Compression is 4.8% Took 1.5 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2221) VertexGroup name should be unique
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527692#comment-14527692 ] Hitesh Shah commented on TEZ-2221: -- I guess the question boils down to what the behavior should be. When a vertex group is committed, each vertex in it is committed. If the vertex also belongs to another group, what happens? Should a vertex be allowed to belong to 2 vertex groups? If yes, how should its commit be handled? The above checks were meant to provide some verification for this case, but probably need to be enhanced to be more stringent.
[jira] [Commented] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2
[ https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527694#comment-14527694 ] Hitesh Shah commented on TEZ-2408: -- Committing shortly. Thanks for the review [~bikassaha]. New warnings are due to the use of deprecated apis to retain compatibility.
Failed: TEZ-776 PreCommit Build #618
Jira: https://issues.apache.org/jira/browse/TEZ-776 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/618/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2812 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730303/TEZ-776.11.patch against master revision 210619a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/618//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/618//artifact/patchprocess/newPatchFindbugsWarningstez-api.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/618//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/618//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 403d14acfeed1e196ad7b5958877262739f5fb0a logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #614 Archived 44 artifacts Archive block size is 32768 Received 22 blocks and 2056741 bytes Compression is 26.0% Took 0.48 sec [description-setter] Could not determine description. 
Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527766#comment-14527766 ] TezQA commented on TEZ-776: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730303/TEZ-776.11.patch against master revision 210619a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/618//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/618//artifact/patchprocess/newPatchFindbugsWarningstez-api.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/618//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/618//console This message is automatically generated. 
[jira] [Updated] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Ramachandran updated TEZ-2366: -- Attachment: TEZ-2366.1.patch

Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
Key: TEZ-2366 URL: https://issues.apache.org/jira/browse/TEZ-2366 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Prakash Ramachandran Priority: Critical Attachments: TEZ-2366.1.patch, TEZ-2366.test.txt, TEZ-2366.wip.1.patch

There are around 20 unit tests (out of around 2000) that fail intermittently after TEZ-2333. Here is a stack:
{code}
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any of the configured local directories
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}
To reproduce this in a Pig test, use the following commands:
{code}
svn co http://svn.apache.org/repos/asf/pig/trunk
ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test
{code}
Note that in the Pig codebase, TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH is already set to true (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup). I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to false in Pig and it does not help.
[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-776: --- Attachment: TEZ-776.12.patch
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527845#comment-14527845 ] Bikas Saha commented on TEZ-776: prepareForRouting is guarded by synchronized in Edge, which creates a read-write barrier.

Agree about the duplication, but each case has minor differences in which indices to use or which events to create, and hence is hard to merge. Once we move away from event creation in the AM, there will be more scope to reduce duplication. Trying to keep the new abstract class for ODR complete in itself, with an eventual goal of not deriving from the legacy class.

The array list size read is thread safe. There is only 1 writer, which prevents concurrent modification. The size in an array/linked list is an int that is atomically modified. There have been no issues in numerous stress simulations and large jobs.

The broadcast edge manager cannot continue to use legacy routing, since every consumer task needs events from every producer task, leading to memory reference overhead proportional to MxN, which is large for large jobs.

I wish I could share your optimism on TEZ-2409 being 10 lines of code, but I am afraid I have tried to do it and found it to be a little more involved than that. Besides, 10 lines of code would need many more lines of new tests. This does not have to be a blocker for 0.7.0 since it's an internal framework change and can be done in 0.7.1.

Uploaded a new patch with fixes. [~hitesh] [~rajesh.balamohan] There have been fixes for your review comments made in subsequent patches. Do you want to look at them? 
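The single-writer / many-reader ArrayList pattern debated above can be made unambiguous with a concurrent collection. The sketch below is illustrative only (EventStore is hypothetical, not the Tez code); it shows the conservative alternative — CopyOnWriteArrayList gives readers consistent snapshots without locking, at the cost of copy-on-append, which is acceptable when reads dominate writes:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class EventStore {
    // CopyOnWriteArrayList: iteration and get() see a consistent snapshot
    // without locking; each add() copies the backing array, so this fits
    // a single appending writer with many concurrent readers.
    private final List<String> taskEvents = new CopyOnWriteArrayList<>();

    public void add(String event) { taskEvents.add(event); }  // writer thread
    public int size() { return taskEvents.size(); }           // reader threads
    public String get(int i) { return taskEvents.get(i); }    // reader threads
}
```

Whether the plain-ArrayList single-writer argument is safe under the Java memory model is exactly what is being debated in the thread; a structure like this sidesteps the question at some allocation cost.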
[jira] [Comment Edited] (TEZ-2221) VertexGroup name should be unique
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527846#comment-14527846 ] Bikas Saha edited comment on TEZ-2221 at 5/5/15 3:33 AM: - By definition oA and oB cannot be part of 2 different groups, because they are added to vertex groups in the API using VertexGroup#addDataSink. So it's impossible for the same output/edge to be part of 2 vertex groups.
[jira] [Commented] (TEZ-2221) VertexGroup name should be unique
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527846#comment-14527846 ] Bikas Saha commented on TEZ-2221: - By definition oA and oB cannot be part of 2 different groups, because they are added to vertex groups in the API. So it's impossible for the same output/edge to be part of 2 vertex groups.
[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527850#comment-14527850 ] Bikas Saha commented on TEZ-2221: - Unless there is a technical reason to not support v1,v2 in multiple vertex groups simultaneously, we should support it. If this jira has committed something to the contrary then we could revert the changes and redo them before a release. VertexGroups might be our cheaper answer to multiple edges between the same vertices. So let's not curtail any functionality that exists today by design or accident :) VertexGroup name should be unqiue - Key: TEZ-2221 URL: https://issues.apache.org/jira/browse/TEZ-2221 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0, 0.5.4, 0.6.1 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch VertexGroupCommitStartedEvent VertexGroupCommitFinishedEvent use vertex group name to identify the vertex group commit, the same name of vertex group will conflict. While in the current equals hashCode of VertexGroup, vertex group name and members name are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2392) Have all readers throw an Exception on incorrect next() usage
[ https://issues.apache.org/jira/browse/TEZ-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2392: -- Attachment: TEZ-2392.3.patch Thanks @sseth, [~hitesh]. - Yes, the condition in valuesIterator is unavoidable. - Added comment in MRInput.getReader() - Missed out minor test case TestUnorderedKVReader.java in earlier patch. Added it in latest patch. Will commit it once pre-commit passes. Have all readers throw an Exception on incorrect next() usage - Key: TEZ-2392 URL: https://issues.apache.org/jira/browse/TEZ-2392 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Rajesh Balamohan Priority: Critical Attachments: TEZ-2392.1.patch, TEZ-2392.2.patch, TEZ-2392.3.patch Follow up from TEZ-2348. Marking as critical since this is a behaviour change, and we should get it in early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
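The behavior change discussed in TEZ-2392 — readers failing loudly on incorrect `next()` usage — can be sketched with a minimal, self-contained reader. The `KVReader` class below is a hypothetical stand-in, not the actual Tez reader API; it only illustrates the pattern of throwing once `next()` has already returned false:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: a key-value reader that throws instead of silently
// misbehaving when next() is called again after it has returned false.
public class StrictReader {
    static final class KVReader {
        private final Iterator<String> source;
        private boolean exhausted = false;
        private String current;

        KVReader(Iterator<String> source) {
            this.source = source;
        }

        boolean next() {
            if (exhausted) {
                // Incorrect usage: caller ignored the earlier false return.
                throw new IllegalStateException(
                    "next() called after it returned false");
            }
            if (!source.hasNext()) {
                exhausted = true;
                return false;
            }
            current = source.next();
            return true;
        }

        String getCurrent() {
            return current;
        }
    }

    public static void main(String[] args) {
        KVReader r = new KVReader(List.of("a").iterator());
        System.out.println(r.next());  // true: one record available
        System.out.println(r.next());  // false: end of input
        try {
            r.next();                  // incorrect usage -> exception
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Failing fast here surfaces caller bugs immediately, which is why the change was flagged as behaviour-altering and worth landing early.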
[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527872#comment-14527872 ] Bikas Saha commented on TEZ-2221: - If VertexGroup(A, v1) and VertexGroup(B, v1) and connecting both to v2 allows for multiple edges between v1 and v2 then we should allow 2. That's the simplest solution to the multiple edges issue. But this needs to be verified. VertexGroup name should be unqiue - Key: TEZ-2221 URL: https://issues.apache.org/jira/browse/TEZ-2221 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0, 0.5.4, 0.6.1 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch VertexGroupCommitStartedEvent VertexGroupCommitFinishedEvent use vertex group name to identify the vertex group commit, the same name of vertex group will conflict. While in the current equals hashCode of VertexGroup, vertex group name and members name are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2410) VertexGroupCommitFinishedEvent is not logged correctly
[ https://issues.apache.org/jira/browse/TEZ-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2410: Priority: Blocker (was: Major) VertexGroupCommitFinishedEvent is not logged correctly -- Key: TEZ-2410 URL: https://issues.apache.org/jira/browse/TEZ-2410 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Priority: Blocker Fix For: 0.7.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527881#comment-14527881 ] Hitesh Shah commented on TEZ-2366: -- [~pramachandran] Can you confirm that this path is not invoked in local mode? Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333 Key: TEZ-2366 URL: https://issues.apache.org/jira/browse/TEZ-2366 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Prakash Ramachandran Priority: Critical Attachments: TEZ-2366.1.patch, TEZ-2366.test.txt, TEZ-2366.wip.1.patch There are around 20 unit tests (out of around 2000) fail intermittently after TEZ-2333. Here is a stack: {code} org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any of the configured local directories at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164) at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611) at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591) at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536) at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517) at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190) at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} To reproduce that in 
Pig tests, use the following commands: svn co http://svn.apache.org/repos/asf/pig/trunk ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test Note that in the Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to true (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup). I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to false in Pig and it does not help. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2366) Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333
[ https://issues.apache.org/jira/browse/TEZ-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527881#comment-14527881 ] Hitesh Shah edited comment on TEZ-2366 at 5/5/15 4:26 AM: -- [~pramachandran] Can you confirm that this path is not invoked in local mode? The shuffle meta data will not be present in local mode. In any case, maybe to be safe, it might be better to write more defensive code for retrieving the shuffle port and if shuffle port is not available, then disable local fetch. was (Author: hitesh): [~pramachandran] Can you confirm that this path is not invoked in local mode? Pig tez MiniTezCluster unit tests fail intermittently after TEZ-2333 Key: TEZ-2366 URL: https://issues.apache.org/jira/browse/TEZ-2366 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Prakash Ramachandran Priority: Critical Attachments: TEZ-2366.1.patch, TEZ-2366.test.txt, TEZ-2366.wip.1.patch There are around 20 unit tests (out of around 2000) fail intermittently after TEZ-2333. 
Here is a stack: {code} org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/attempt_1429899954360_0001_1_01_00_1_10003/file.out.index in any of the configured local directories at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:449) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:164) at org.apache.tez.runtime.library.common.shuffle.Fetcher.getShuffleInputFileName(Fetcher.java:611) at org.apache.tez.runtime.library.common.shuffle.Fetcher.getTezIndexRecord(Fetcher.java:591) at org.apache.tez.runtime.library.common.shuffle.Fetcher.doLocalDiskFetch(Fetcher.java:536) at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupLocalDiskFetch(Fetcher.java:517) at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:190) at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:72) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} To reproduce that in Pig test, using the following commands: svn co http://svn.apache.org/repos/asf/pig/trunk ant -Dhadoopversion=23 -Dtest.exec.type=tez -Dtestcase=TestTezAutoParallelism test Note in Pig codebase, we already set TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to true (http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezLauncher.java?view=markup). I tried changing TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH to false in Pig and does not help. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
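The defensive behavior suggested in the comment above — retrieve the shuffle port from service metadata and disable local-disk fetch when it is unavailable — can be sketched as follows. All names (`LocalFetchGuard`, `parseShufflePort`) are hypothetical, not the actual Tez code; only the guard pattern is the point:

```java
import java.nio.ByteBuffer;

// Sketch of the suggested defensive check: fall back to remote fetch
// instead of assuming local-disk fetch is safe when no shuffle port is
// present (e.g. in local mode, where shuffle metadata is absent).
public class LocalFetchGuard {
    static final int INVALID_PORT = -1;

    // Hypothetical helper: parse the shuffle port from service metadata,
    // returning INVALID_PORT when the metadata is missing or too short.
    static int parseShufflePort(ByteBuffer serviceMeta) {
        if (serviceMeta == null || serviceMeta.remaining() < 4) {
            return INVALID_PORT;
        }
        // duplicate() so the caller's buffer position is untouched
        return serviceMeta.duplicate().getInt();
    }

    static boolean localFetchEnabled(ByteBuffer serviceMeta,
                                     boolean optimizeLocalFetch) {
        // Disable local fetch whenever no shuffle port is available,
        // regardless of the optimize-local-fetch configuration.
        return optimizeLocalFetch
            && parseShufflePort(serviceMeta) != INVALID_PORT;
    }

    public static void main(String[] args) {
        System.out.println(localFetchEnabled(null, true)); // false: no metadata

        ByteBuffer meta = ByteBuffer.allocate(4).putInt(13562);
        meta.flip();
        System.out.println(localFetchEnabled(meta, true)); // true
    }
}
```

Guarding at the port-retrieval step keeps the intermittent `DiskErrorException` path from being reached when the metadata was never published.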
[jira] [Updated] (TEZ-2411) Offload DataMovement event creation from the AM to the tasks
[ https://issues.apache.org/jira/browse/TEZ-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2411: Description: Today the AM creates a new DataMovement event from the original event sent by the producer task and supplements the new event with source/target indices for the consumer task. This new event creation can be offloaded to the task runtime and thus save CPU cycles on the AM for the object creation. Secondly, the original event can be kept in serialized form inside the AM and sent as is to the task over the RPC, thus potentially saving serde CPU for these events in addition to the object creation CPU. This can help when there is a high concurrency of running tasks in a job. Say 1 tasks running in parallel and sending events to the AM. (was: Today the AM creates a new DataMovement event from the original event sent by the producer task and supplements the new event with source/target indices for the consumer task. This new event creation can be offloaded to the task runtime and thus save CPU cycles on the AM for the object creation. Secondly, the original event can be kept in serialized form inside the AM and sent as is to the task over the RPC, thus potentially saving serde CPU for these events in addition to the object creation CPU.) Offload DataMovement event creation from the AM to the tasks Key: TEZ-2411 URL: https://issues.apache.org/jira/browse/TEZ-2411 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Today the AM creates a new DataMovement event from the original event sent by the producer task and supplements the new event with source/target indices for the consumer task. This new event creation can be offloaded to the task runtime and thus save CPU cycles on the AM for the object creation. 
Secondly, the original event can be kept in serialized form inside the AM and sent as is to the task over the RPC, thus potentially saving serde CPU for these events in addition to the object creation CPU. This can help when there is a high concurrency of running tasks in a job. Say 1 tasks running in parallel and sending events to the AM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
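The two optimizations described in TEZ-2411 — skip per-consumer object creation in the AM, and keep the producer's event in serialized form until it is forwarded over RPC — can be illustrated with a small, self-contained model. Every name below is hypothetical; this is not the Tez event-routing code, just the shape of the idea:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: the AM stores only the raw serialized payload plus minimal
// routing metadata, and forwards the bytes unchanged; the task runtime
// (not shown) would add source/target indices on its side.
public class SerializedEventCache {
    static final class StoredEvent {
        final byte[] rawPayload;  // producer's event, never deserialized in the AM
        final int producerIndex;  // routing metadata kept alongside the bytes

        StoredEvent(byte[] rawPayload, int producerIndex) {
            this.rawPayload = rawPayload;
            this.producerIndex = producerIndex;
        }
    }

    static final List<StoredEvent> amEventStore = new ArrayList<>();

    static void onProducerEvent(byte[] payload, int producerIndex) {
        // No serde and no new event object per consumer: just store the bytes.
        amEventStore.add(new StoredEvent(payload, producerIndex));
    }

    // The RPC response to a consumer forwards the stored bytes as-is.
    static byte[] fetchForConsumer(int eventIdx) {
        return amEventStore.get(eventIdx).rawPayload;
    }

    public static void main(String[] args) {
        byte[] payload = "shuffle-meta".getBytes(StandardCharsets.UTF_8);
        onProducerEvent(payload, 0);
        // Same array instance goes out: no re-creation, no re-serialization.
        System.out.println(fetchForConsumer(0) == payload); // true
    }
}
```

The saving compounds with task concurrency: with many tasks emitting events simultaneously, the AM avoids one deserialize/construct/serialize cycle per event per consumer.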
[jira] [Commented] (TEZ-2369) Add a few unit tests for RootInputInitializerManager
[ https://issues.apache.org/jira/browse/TEZ-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527884#comment-14527884 ] Hitesh Shah commented on TEZ-2369: -- The patch does not include the Integer successfulAttempt = vertexSuccessfulAttemptMap.get(taskId.getId()); change. Maybe that should be added back as it seems a safe enough change that can be backported to older branches. +1 for the unit test change. Add a few unit tests for RootInputInitializerManager Key: TEZ-2369 URL: https://issues.apache.org/jira/browse/TEZ-2369 Project: Apache Tez Issue Type: Bug Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: TEZ-2369.1.txt, TEZ-2369.2.txt {code} - Integer successfulAttempt = vertexSuccessfulAttemptMap.get(taskId); + Integer successfulAttempt = vertexSuccessfulAttemptMap.get(taskId.getId()); {code} This could cause events to be sent multiple times. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-2392 PreCommit Build #620
Jira: https://issues.apache.org/jira/browse/TEZ-2392 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/620/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2801 lines...] [INFO] Final Memory: 71M/958M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730353/TEZ-2392.3.patch against master revision 210619a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/620//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/620//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 859aba3b0f982f71e2bd7f5ab9fdaaa7af6f2484 logged out == == Finished build. == == Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #614 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2630228 bytes Compression is 4.7% Took 0.62 sec Description set: TEZ-2392 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2392) Have all readers throw an Exception on incorrect next() usage
[ https://issues.apache.org/jira/browse/TEZ-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527935#comment-14527935 ] TezQA commented on TEZ-2392: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730353/TEZ-2392.3.patch against master revision 210619a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/620//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/620//console This message is automatically generated. Have all readers throw an Exception on incorrect next() usage - Key: TEZ-2392 URL: https://issues.apache.org/jira/browse/TEZ-2392 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Rajesh Balamohan Priority: Critical Attachments: TEZ-2392.1.patch, TEZ-2392.2.patch, TEZ-2392.3.patch Follow up from TEZ-2348. Marking as critical since this is a behaviour change, and we should get it in early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2076) Tez framework to extract/analyze data stored in ATS for specific dag
[ https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527932#comment-14527932 ] Hitesh Shah commented on TEZ-2076: -- bq. the zip doesn't need versioning, because it is an ATS dump of all known Tez keys. [~gopalv] Thanks for the clarification. Missed the bit about the zip entry. And agreed, if the ATS entity json is being written as is, it would effectively be versioned based on the version of the ATS api (and the data within it versioned by the generation code itself). Tez framework to extract/analyze data stored in ATS for specific dag Key: TEZ-2076 URL: https://issues.apache.org/jira/browse/TEZ-2076 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2076.1.patch, TEZ-2076.10.patch, TEZ-2076.2.patch, TEZ-2076.3.patch, TEZ-2076.4.patch, TEZ-2076.5.patch, TEZ-2076.6.patch, TEZ-2076.7.patch, TEZ-2076.8.patch, TEZ-2076.9.patch, TEZ-2076.WIP.2.patch, TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch - Users should be able to download ATS data pertaining to a DAG from Tez-UI (more like a zip file containing DAG/Vertex/Task/TaskAttempt info). - This can be plugged into an analyzer which parses the data, adds semantics and provides an in-memory representation for further analysis. - This will enable writing different analyzer rules, which can be run on top of this in-memory representation to come up with analysis on the DAG. - Results of these analyzer rules can be rendered on the UI (standalone webapp) at a later point in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527937#comment-14527937 ] Jeff Zhang commented on TEZ-2221: - {code} If VertexGroup(A, v1) and VertexGroup(B, v1) and connecting both to v2 allows for multiple edges between v1 and v2 then we should allow 2. {code} This looks more like a hack or workaround for multiple edges. If we need to support multiple edges, we may need to create a more elegant API. VertexGroup name should be unqiue - Key: TEZ-2221 URL: https://issues.apache.org/jira/browse/TEZ-2221 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0, 0.5.4, 0.6.1 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch VertexGroupCommitStartedEvent VertexGroupCommitFinishedEvent use vertex group name to identify the vertex group commit, the same name of vertex group will conflict. While in the current equals hashCode of VertexGroup, vertex group name and members name are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527957#comment-14527957 ] TezQA commented on TEZ-776: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730357/TEZ-776.12.patch against master revision 210619a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/621//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/621//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/621//console This message is automatically generated. 
Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, TEZ-776.12.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-776 PreCommit Build #621
Jira: https://issues.apache.org/jira/browse/TEZ-776 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/621/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 2808 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12730357/TEZ-776.12.patch against master revision 210619a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/621//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/621//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/621//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 982fcba0ffad0431e426d9b5ef984d841278279b logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #620 Archived 44 artifacts Archive block size is 32768 Received 4 blocks and 2636731 bytes Compression is 4.7% Took 1.5 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-2393) Tez pickup PATH env from gateway machine
[ https://issues.apache.org/jira/browse/TEZ-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527296#comment-14527296 ] Jason Lowe commented on TEZ-2393: - I think the main problems will be from anyone who expected the old behavior. For example, -Dsome.mapred.or.tez.property='$bar' today expands to what the client has for bar rather than what the container does. Today if one wants to explicitly have the variable expanded by the container launch process then they can use this syntax instead: {noformat} -Dsome.mapred.or.tez.property='{{bar}}' {noformat} I agree the existing behavior seems like a bug, but I don't know how many users are relying on the current behavior. Note that org.apache.hadoop.yarn.util.Apps.setEnvFromInputString in YARN has the same issues, and that's the one currently used by MapReduce. Tez pickup PATH env from gateway machine Key: TEZ-2393 URL: https://issues.apache.org/jira/browse/TEZ-2393 Project: Apache Tez Issue Type: Bug Reporter: Daniel Dai Assignee: Hitesh Shah Attachments: TEZ-2393.1.patch I found this issue on Windows. When I do: set PATH=C:\dummy;%PATH% Then run a tez job. C:\dummy appears in PATH of the vertex container. This is surprising since we don't expect frontend PATH will propagate to backend. [~hitesh] tried it on Linux and found the same behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
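The distinction Jason Lowe draws above — `$bar` is expanded against the client's environment before submission, while `{{bar}}` is left as a marker for the container launch process to resolve — can be shown with a tiny illustrative sketch. This is not the YARN `Apps.setEnvFromInputString` implementation, just a hypothetical model of the expansion timing:

```java
import java.util.Map;

// Illustrative sketch: $name references are expanded client-side, while
// {{name}} markers are deliberately left untouched so the container
// launcher resolves them with the container's own environment.
public class EnvExpansion {
    static String expandClientSide(String value, Map<String, String> clientEnv) {
        for (Map.Entry<String, String> e : clientEnv.entrySet()) {
            // Naive substitution for illustration only; the real code
            // handles delimiters and platform-specific syntax.
            value = value.replace("$" + e.getKey(), e.getValue());
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> clientEnv = Map.of("bar", "/client/path");

        // Expanded now, with the *client's* value of bar:
        System.out.println(expandClientSide("$bar", clientEnv));    // /client/path

        // Marker survives client-side processing; the container expands it later:
        System.out.println(expandClientSide("{{bar}}", clientEnv)); // {{bar}}
    }
}
```

Anyone relying on the current behavior is effectively depending on the first case, which is why changing it is a compatibility concern.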
[jira] [Commented] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2
[ https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527324#comment-14527324 ] Bikas Saha commented on TEZ-2408: - lgtm. I remember fixing these (perhaps was TestTaskImpl) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 --- Key: TEZ-2408 URL: https://issues.apache.org/jira/browse/TEZ-2408 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Attachments: TEZ-2408.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527425#comment-14527425 ] Rohini Palaniswamy commented on TEZ-2221: - bq. what happens if someone does the following. This should also be disallowed. Correct? {code} dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_2, v1,v2); {code} [~daijy] pointed out this breaks a lot of Pig scripts on Tez with UnionOptimizer as we have multiple outputs from each vertex and we create a vertex group for each of those output now. For eg: union followed by order by. There will be one sample output and one partitioner output from the union vertex going to two different downstream vertices. With the UnionOptimizer, the union is removed and two vertex groups are created. If this is disallowed we will have to reuse the same Vertex group to route multiple outputs. GroupInputEdge.create(VertexGroup inputVertexGroup, Vertex outputVertex, EdgeProperty edgeProperty, InputDescriptor mergedInput) API seem to allow that. Will doing that work and that is how you want us to construct the plan? Consider another case of union followed by replicate join with two tables followed by order by. The plan will consist of 8 vertices - V1 (Load) + V2 (Load) + V3 (union) + V4a (Replicate join T1 load) + V4b (Replicate join T2 load) + V5 (partitioner) + V6 (sampler) + V7 (order by) with V1,V2-V3, V4a-V3, V4b-V3, V4-V5, V4-V6, V6-V5, V5-V7. Optimized plan will become V4a - (V1,V2 vertex group) , V4b - (V1,V2 vertex group) , (V1,V2 vertex group) - V5, (V1,V2 vertex group) - V6, V6-V5, V5-V7. So using one vertex group for routing multiple outputs and multiple inputs is how we are expected to construct the plan? 
VertexGroup name should be unqiue - Key: TEZ-2221 URL: https://issues.apache.org/jira/browse/TEZ-2221 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0, 0.5.4, 0.6.1 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch VertexGroupCommitStartedEvent VertexGroupCommitFinishedEvent use vertex group name to identify the vertex group commit, the same name of vertex group will conflict. While in the current equals hashCode of VertexGroup, vertex group name and members name are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2221) VertexGroup name should be unqiue
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527425#comment-14527425 ] Rohini Palaniswamy edited comment on TEZ-2221 at 5/4/15 10:12 PM: -- bq. what happens if someone does the following. This should also be disallowed. Correct? {code} dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_2, v1,v2); {code} [~daijy] pointed out this breaks a lot of Pig scripts on Tez with UnionOptimizer as we have multiple outputs from each vertex and we create a vertex group for each of those output now. For eg: union followed by order by. There will be one sample output and one partitioner output from the union vertex going to two different downstream vertices. With the UnionOptimizer, the union is removed and two vertex groups are created. If this is disallowed we will have to reuse the same Vertex group to route multiple outputs. GroupInputEdge.create(VertexGroup inputVertexGroup, Vertex outputVertex, EdgeProperty edgeProperty, InputDescriptor mergedInput) API seem to allow that. Will doing that work and that is how you want us to construct the plan? Consider another case of union followed by replicate join with two tables followed by order by. The plan will consist of 8 vertices - V1 (Load) + V2 (Load) + V3 (union) + V4 (Replicate join T1 load) + V5 (Replicate join T2 load) + V6 (partitioner) + V7 (sampler) + V8 (order by) with V1,V2-V3, V4-V3, V5-V3, V3-V6, V3-V7, V7-V6, V6-V8. Optimized plan will become V4-(V1,V2 vertex group) , V5-(V1,V2 vertex group) , (V1,V2 vertex group) - V6, (V1,V2 vertex group) - V7, V7-V6, V6-V8. So using one vertex group for routing multiple outputs and multiple inputs is how we are expected to construct the plan? was (Author: rohini): bq. what happens if someone does the following. This should also be disallowed. Correct? 
{code} dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_2, v1,v2); {code} [~daijy] pointed out this breaks a lot of Pig scripts on Tez with UnionOptimizer as we have multiple outputs from each vertex and we create a vertex group for each of those output now. For eg: union followed by order by. There will be one sample output and one partitioner output from the union vertex going to two different downstream vertices. With the UnionOptimizer, the union is removed and two vertex groups are created. If this is disallowed we will have to reuse the same Vertex group to route multiple outputs. GroupInputEdge.create(VertexGroup inputVertexGroup, Vertex outputVertex, EdgeProperty edgeProperty, InputDescriptor mergedInput) API seem to allow that. Will doing that work and that is how you want us to construct the plan? Consider another case of union followed by replicate join with two tables followed by order by. The plan will consist of 8 vertices - V1 (Load) + V2 (Load) + V3 (union) + V4a (Replicate join T1 load) + V4b (Replicate join T2 load) + V5 (partitioner) + V6 (sampler) + V7 (order by) with V1,V2-V3, V4a-V3, V4b-V3, V4-V5, V4-V6, V6-V5, V5-V7. Optimized plan will become V4a - (V1,V2 vertex group) , V4b - (V1,V2 vertex group) , (V1,V2 vertex group) - V5, (V1,V2 vertex group) - V6, V6-V5, V5-V7. So using one vertex group for routing multiple outputs and multiple inputs is how we are expected to construct the plan? VertexGroup name should be unqiue - Key: TEZ-2221 URL: https://issues.apache.org/jira/browse/TEZ-2221 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0, 0.5.4, 0.6.1 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch VertexGroupCommitStartedEvent VertexGroupCommitFinishedEvent use vertex group name to identify the vertex group commit, the same name of vertex group will conflict. 
While in the current equals hashCode of VertexGroup, vertex group name and members name are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents
[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-776: --- Attachment: TEZ-776.11.patch Uploading new patch that creates a new abstract class for on-demand routing APIs, leaving the legacy plugin API unchanged. Opened TEZ-2409 to make changes for supporting different plugins on the same vertex. Hopefully this addresses any remaining concerns. Reduce AM mem usage caused by storing TezEvents --- Key: TEZ-776 URL: https://issues.apache.org/jira/browse/TEZ-776 Project: Apache Tez Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Bikas Saha Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, TEZ-776.2.patch, TEZ-776.3.patch, TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, TEZ-776.7.patch, TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch, TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, without_patch_jmc_output_of_AM.png This is open ended at the moment. A fair chunk of the AM heap is taken up by TezEvents (specifically DataMovementEvents - 64 bytes per event). Depending on the connection pattern - this puts limits on the number of tasks that can be processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527448#comment-14527448 ] Hitesh Shah commented on TEZ-2221: -- [~rohini] Yes - I believe both are being dis-allowed i.e. {code} dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_2, v1,v2); {code} and {code} dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_1, v2,v3); {code} VertexGroup name should be unqiue - Key: TEZ-2221 URL: https://issues.apache.org/jira/browse/TEZ-2221 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0, 0.5.4, 0.6.1 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch VertexGroupCommitStartedEvent VertexGroupCommitFinishedEvent use vertex group name to identify the vertex group commit, the same name of vertex group will conflict. While in the current equals hashCode of VertexGroup, vertex group name and members name are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527463#comment-14527463 ] Hitesh Shah commented on TEZ-2221: -- \cc [~bikassaha] in case he has any input. VertexGroup name should be unqiue - Key: TEZ-2221 URL: https://issues.apache.org/jira/browse/TEZ-2221 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0, 0.5.4, 0.6.1 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch VertexGroupCommitStartedEvent VertexGroupCommitFinishedEvent use vertex group name to identify the vertex group commit, the same name of vertex group will conflict. While in the current equals hashCode of VertexGroup, vertex group name and members name are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)
[ https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527511#comment-14527511 ] Rajesh Balamohan commented on TEZ-2237: --- lgtm. +1. Might need to fix the log statement "Setting all {} partitions as empty for non-started output:" in TEZ-2237.2.branch6.txt before committing. Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers) --- Key: TEZ-2237 URL: https://issues.apache.org/jira/browse/TEZ-2237 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.0 Environment: Debian Linux jessie OpenJDK Runtime Environment (build 1.8.0_40-internal-b27) OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode) 7 * Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 16/24 GB RAM per node, 1*system disk + 4*1 or 2 TiB HDD for HDFS local (on-prem, dedicated hardware) Scalding 0.13.1 modified with https://github.com/twitter/scalding/pull/1220 to run Cascading 3.0.0-wip-90 with TEZ 0.6.0 Reporter: Cyrille Chépélov Assignee: Siddharth Seth Priority: Critical Attachments: TEZ-2237-hack.branch6.txt, TEZ-2237-hack.master.txt, TEZ-2237.1.master.txt, TEZ-2237.2.branch6.txt, TEZ-2237.2.master.txt, TEZ-2237.test.2_branch0.6.txt, all_stacks.lst, alloc_mem.png, alloc_vcores.png, application_142732418_1444.yarn-logs.red.txt.gz, application_142732418_1908.red.txt.bz2, application_1427964335235_2070.txt.red.txt.bz2, appmastersyslog_dag_1427282048097_0215_1.red.txt.gz, appmastersyslog_dag_1427282048097_0237_1.red.txt.gz, gc_count_MRAppMaster.png, mem_free.png, noopexample_2237.txt, oneOutOfTwoOutputsStarted.txt, ordered-grouped-kv-input-traces.diff, output-starts.txt, start_containers.png, stop_containers.png, syslog_attempt_1427282048097_0215_1_21_14_0.red.txt.gz, syslog_attempt_1427282048097_0237_1_70_28_0.red.txt.gz, yarn_rm_flips.png On a specific DAG with many vertices (actually part of a larger meta-DAG), after about an hour of processing, several 
BufferTooSmallExceptions are raised in UnorderedPartitionedKVWriter (about one every two or three spills). Once these exceptions are raised, the DAG remains indefinitely active, tying up memory and CPU resources as far as YARN is concerned, while little if any actual processing takes place. It seems two separate issues are at hand: 1. BufferTooSmallExceptions are raised even though the actually allocated buffers, small as they are (around a couple of megabytes were allotted whereas 100 MiB were requested), never see keys or values bigger than 24 and 1024 bytes respectively. 2. When BufferTooSmallExceptions are raised, the DAG fails to stop (stop requests appear to be sent 7 hours after the BTSE exceptions are raised, but 9 hours after these stop requests the DAG was still lingering on, with all containers present tying up memory and CPU allocations). The BTSEs prevent the Cascade from completing, which prevents validating the results against traditional MR1-based results; the lack of completion renders the cluster queue unavailable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2392) Have all readers throw an Exception on incorrect next() usage
[ https://issues.apache.org/jira/browse/TEZ-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527538#comment-14527538 ] Siddharth Seth commented on TEZ-2392: - Patch looks good to me - ValuesIterator has the check in the main path, but I'm not sure that can be avoided. +1 In the case of the MRReaders, the check being after recordReader.next leaves this open to an exception from user code. We should just document this (in MRInput.getReader()) - an exception will be thrown if next() is invoked after it has already returned false, either from the framework or from the underlying InputFormat. Have all readers throw an Exception on incorrect next() usage - Key: TEZ-2392 URL: https://issues.apache.org/jira/browse/TEZ-2392 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Rajesh Balamohan Priority: Critical Attachments: TEZ-2392.1.patch, TEZ-2392.2.patch Follow up from TEZ-2348. Marking as critical since this is a behaviour change, and we should get it in early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
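The contract discussed in TEZ-2392 can be sketched in plain Java. This is a hypothetical model, not the actual Tez reader API (the class and method names here are illustrative): once next() has returned false, any further call throws instead of silently returning false again, which surfaces incorrect caller loops early.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch of the "throw on next() after false" contract.
// Not Tez code: StrictReader and getCurrent() are illustrative names.
public class StrictReader<T> {
    private final Iterator<T> source;
    private boolean done = false;
    private T current;

    public StrictReader(Iterator<T> source) {
        this.source = source;
    }

    public boolean next() {
        if (done) {
            // Incorrect usage: next() was already exhausted.
            throw new IllegalStateException("next() called after it already returned false");
        }
        if (source.hasNext()) {
            current = source.next();
            return true;
        }
        done = true;
        return false;
    }

    public T getCurrent() {
        return current;
    }

    public static void main(String[] args) {
        StrictReader<Integer> r = new StrictReader<>(Arrays.asList(1, 2).iterator());
        while (r.next()) {
            System.out.println(r.getCurrent()); // prints 1 then 2
        }
        try {
            r.next(); // one call too many
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The while-loop is the correct usage pattern; only a caller that keeps invoking next() past the first false hits the exception, which is the behaviour change the jira flags as critical.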
[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527549#comment-14527549 ] Bikas Saha commented on TEZ-2221: - Disallowing this should be ok and sounds related to the jira, since the output committer is identified by the vertex group name. {code} dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_1, v2,v3); {code} I would like to understand why this is being disallowed. From what I see, this would work for the async commit logic, since each async commit is per output per vertex in the group. So separating by group name should be ok. {code} dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_2, v1,v2); {code} Is there any use case that can be supported here but not by combining them? VertexGroup name should be unqiue - Key: TEZ-2221 URL: https://issues.apache.org/jira/browse/TEZ-2221 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0, 0.5.4, 0.6.1 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch VertexGroupCommitStartedEvent VertexGroupCommitFinishedEvent use vertex group name to identify the vertex group commit, the same name of vertex group will conflict. While in the current equals hashCode of VertexGroup, vertex group name and members name are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2237) Valid events should be sent out when an Output is not started
[ https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2237: Summary: Valid events should be sent out when an Output is not started (was: Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)) Valid events should be sent out when an Output is not started - Key: TEZ-2237 URL: https://issues.apache.org/jira/browse/TEZ-2237 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.0 Environment: Debian Linux jessie OpenJDK Runtime Environment (build 1.8.0_40-internal-b27) OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode) 7 * Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 16/24 GB RAM per node, 1*system disk + 4*1 or 2 TiB HDD for HDFS local (on-prem, dedicated hardware) Scalding 0.13.1 modified with https://github.com/twitter/scalding/pull/1220 to run Cascading 3.0.0-wip-90 with TEZ 0.6.0 Reporter: Cyrille Chépélov Assignee: Siddharth Seth Priority: Critical Attachments: TEZ-2237-hack.branch6.txt, TEZ-2237-hack.master.txt, TEZ-2237.1.master.txt, TEZ-2237.2.branch6.txt, TEZ-2237.2.master.txt, TEZ-2237.3.branch6.txt, TEZ-2237.test.2_branch0.6.txt, all_stacks.lst, alloc_mem.png, alloc_vcores.png, application_142732418_1444.yarn-logs.red.txt.gz, application_142732418_1908.red.txt.bz2, application_1427964335235_2070.txt.red.txt.bz2, appmastersyslog_dag_1427282048097_0215_1.red.txt.gz, appmastersyslog_dag_1427282048097_0237_1.red.txt.gz, gc_count_MRAppMaster.png, mem_free.png, noopexample_2237.txt, oneOutOfTwoOutputsStarted.txt, ordered-grouped-kv-input-traces.diff, output-starts.txt, start_containers.png, stop_containers.png, syslog_attempt_1427282048097_0215_1_21_14_0.red.txt.gz, syslog_attempt_1427282048097_0237_1_70_28_0.red.txt.gz, yarn_rm_flips.png On a specific DAG with many vertices (actually part of a larger meta-DAG), after about an hour of processing, several BufferTooSmallExceptions are raised in 
UnorderedPartitionedKVWriter (about one every two or three spills). Once these exceptions are raised, the DAG remains indefinitely active, tying up memory and CPU resources as far as YARN is concerned, while little if any actual processing takes place. It seems two separate issues are at hand: 1. BufferTooSmallExceptions are raised even though the actually allocated buffers, small as they are (around a couple of megabytes were allotted whereas 100 MiB were requested), never see keys or values bigger than 24 and 1024 bytes respectively. 2. When BufferTooSmallExceptions are raised, the DAG fails to stop (stop requests appear to be sent 7 hours after the BTSE exceptions are raised, but 9 hours after these stop requests the DAG was still lingering on, with all containers present tying up memory and CPU allocations). The BTSEs prevent the Cascade from completing, which prevents validating the results against traditional MR1-based results; the lack of completion renders the cluster queue unavailable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2
[ https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2408: - Affects Version/s: (was: 0.7.0) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 --- Key: TEZ-2408 URL: https://issues.apache.org/jira/browse/TEZ-2408 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Attachments: TEZ-2408.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2
[ https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2408: - Affects Version/s: 0.7.0 TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 --- Key: TEZ-2408 URL: https://issues.apache.org/jira/browse/TEZ-2408 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Attachments: TEZ-2408.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2
[ https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2408: - Target Version/s: 0.7.0 TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 --- Key: TEZ-2408 URL: https://issues.apache.org/jira/browse/TEZ-2408 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Attachments: TEZ-2408.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2221) VertexGroup name should be unqiue
[ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527479#comment-14527479 ] Rohini Palaniswamy commented on TEZ-2221: - bq. dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_2, v1,v2); It should be a simple change for us to reuse the vertex group. But since we have never used it that way, we want to ensure that Tez will be fine if we constructed plans like that. bq. dag.createVertexGroup(group_1, v1,v2); dag.createVertexGroup(group_1, v2,v3); We are not reusing group names anywhere. So that is not an issue for us. VertexGroup name should be unqiue - Key: TEZ-2221 URL: https://issues.apache.org/jira/browse/TEZ-2221 Project: Apache Tez Issue Type: Bug Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0, 0.5.4, 0.6.1 Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch VertexGroupCommitStartedEvent VertexGroupCommitFinishedEvent use vertex group name to identify the vertex group commit, the same name of vertex group will conflict. While in the current equals hashCode of VertexGroup, vertex group name and members name are used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
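The rule in the TEZ-2221 title - vertex group names must be unique within a DAG - can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Tez code: the class and the String-based members are illustrative, and whether identical member sets under different names should also be rejected is exactly what the thread above is still debating, so this model only enforces name uniqueness.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the name-uniqueness check discussed in TEZ-2221.
// Not the Tez DAG API: names and types here are illustrative only.
public class VertexGroupRegistry {
    // name -> member vertices; LinkedHashMap keeps insertion order deterministic.
    private final Map<String, Set<String>> groups = new LinkedHashMap<>();

    public void createVertexGroup(String name, String... members) {
        if (groups.containsKey(name)) {
            // Same name reused (even with different members) conflicts with
            // VertexGroupCommitStarted/FinishedEvent identification by name.
            throw new IllegalStateException("VertexGroup name must be unique: " + name);
        }
        groups.put(name, new LinkedHashSet<>(Arrays.asList(members)));
    }

    public static void main(String[] args) {
        VertexGroupRegistry dag = new VertexGroupRegistry();
        dag.createVertexGroup("group_1", "v1", "v2");
        try {
            dag.createVertexGroup("group_1", "v2", "v3"); // duplicate name: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // Distinct names over the same members pass this model; the thread
        // above discusses whether Tez should also disallow this case.
        dag.createVertexGroup("group_2", "v1", "v2");
        System.out.println("groups: " + dag.groups.keySet());
    }
}
```

Under this model, Pig's UnionOptimizer scenario would reuse one registered group to route multiple outputs rather than registering a second group over the same members.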
[jira] [Updated] (TEZ-2237) Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers)
[ https://issues.apache.org/jira/browse/TEZ-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2237: Attachment: TEZ-2237.3.branch6.txt Patch with the log line fixed for branch-0.6. Thanks for the review [~rajesh.balamohan], and for reporting and helping try out the fix [~cchepelov]. Committing this. Complex DAG freezes and fails (was BufferTooSmallException raised in UnorderedPartitionedKVWriter then DAG lingers) --- Key: TEZ-2237 URL: https://issues.apache.org/jira/browse/TEZ-2237 Project: Apache Tez Issue Type: Bug Affects Versions: 0.6.0 Environment: Debian Linux jessie OpenJDK Runtime Environment (build 1.8.0_40-internal-b27) OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode) 7 * Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 16/24 GB RAM per node, 1*system disk + 4*1 or 2 TiB HDD for HDFS local (on-prem, dedicated hardware) Scalding 0.13.1 modified with https://github.com/twitter/scalding/pull/1220 to run Cascading 3.0.0-wip-90 with TEZ 0.6.0 Reporter: Cyrille Chépélov Assignee: Siddharth Seth Priority: Critical Attachments: TEZ-2237-hack.branch6.txt, TEZ-2237-hack.master.txt, TEZ-2237.1.master.txt, TEZ-2237.2.branch6.txt, TEZ-2237.2.master.txt, TEZ-2237.3.branch6.txt, TEZ-2237.test.2_branch0.6.txt, all_stacks.lst, alloc_mem.png, alloc_vcores.png, application_142732418_1444.yarn-logs.red.txt.gz, application_142732418_1908.red.txt.bz2, application_1427964335235_2070.txt.red.txt.bz2, appmastersyslog_dag_1427282048097_0215_1.red.txt.gz, appmastersyslog_dag_1427282048097_0237_1.red.txt.gz, gc_count_MRAppMaster.png, mem_free.png, noopexample_2237.txt, oneOutOfTwoOutputsStarted.txt, ordered-grouped-kv-input-traces.diff, output-starts.txt, start_containers.png, stop_containers.png, syslog_attempt_1427282048097_0215_1_21_14_0.red.txt.gz, syslog_attempt_1427282048097_0237_1_70_28_0.red.txt.gz, yarn_rm_flips.png On a specific DAG with many vertices (actually part of a larger meta-DAG), after about an hour of processing, several 
BufferTooSmallExceptions are raised in UnorderedPartitionedKVWriter (about one every two or three spills). Once these exceptions are raised, the DAG remains indefinitely active, tying up memory and CPU resources as far as YARN is concerned, while little if any actual processing takes place. It seems two separate issues are at hand: 1. BufferTooSmallExceptions are raised even though the actually allocated buffers, small as they are (around a couple of megabytes were allotted whereas 100 MiB were requested), never see keys or values bigger than 24 and 1024 bytes respectively. 2. When BufferTooSmallExceptions are raised, the DAG fails to stop (stop requests appear to be sent 7 hours after the BTSE exceptions are raised, but 9 hours after these stop requests the DAG was still lingering on, with all containers present tying up memory and CPU allocations). The BTSEs prevent the Cascade from completing, which prevents validating the results against traditional MR1-based results; the lack of completion renders the cluster queue unavailable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2198) Fix sorter spill counts
[ https://issues.apache.org/jira/browse/TEZ-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527423#comment-14527423 ] Hitesh Shah commented on TEZ-2198: -- \cc [~gopalv] [~sseth] for review Fix sorter spill counts --- Key: TEZ-2198 URL: https://issues.apache.org/jira/browse/TEZ-2198 Project: Apache Tez Issue Type: Improvement Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Attachments: TEZ-2198.1.patch, TEZ-2198.2.patch, TEZ-2198.3.patch, TEZ-2198.4.patch, no_additional_spills_eg_pipelined_shuffle.png, with_additional_spills.png Prior to pipelined shuffle, tez merged all spilled data into a single file. This ended up creating one index file and one output file. In this context, TaskCounter.ADDITIONAL_SPILL_COUNT was referred to as the number of additional spills, and there was no counter needed to track the number of merges. With pipelined shuffle, there is no final merge, and ADDITIONAL_SPILL_COUNT would be misleading, as these spills are direct output files which are consumed by the consumers. It would be good to have the following - ADDITIONAL_SPILL_COUNT: represents the spills that are needed by the task to generate the final merged output - TOTAL_SPILLS: represents the total number of shuffle directories (index + output files) that got created at the end of processing. For example, assume the sorter generated 5 spills in an attempt. Without pipelining: == ADDITIONAL_SPILL_COUNT = 5 -- Additional spills involved in sorting TOTAL_SPILLS = 1 -- Final merged output With pipelining: ADDITIONAL_SPILL_COUNT = 0 -- Additional spills involved in sorting TOTAL_SPILLS = 5 --- all spills are final output -- This message was sent by Atlassian JIRA (v6.3.4#6332)
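The counter arithmetic proposed in TEZ-2198 can be restated as a tiny Java sketch. This is illustrative only (SpillCounters is a hypothetical name, not Tez code); it just encodes the worked example from the description: with a final merge the spills are overhead, while with pipelined shuffle every spill is itself final output.

```java
// Hypothetical restatement of the proposed counter semantics from TEZ-2198.
// Not Tez code; it only mirrors the example arithmetic in the description.
public class SpillCounters {
    /** Returns { ADDITIONAL_SPILL_COUNT, TOTAL_SPILLS } for one attempt. */
    static int[] count(int sorterSpills, boolean pipelined) {
        // Pipelined shuffle: no final merge, so no spill is "additional" -
        // every spill is a final output consumed downstream.
        int additionalSpillCount = pipelined ? 0 : sorterSpills;
        // Without pipelining everything merges into one final output file.
        int totalSpills = pipelined ? sorterSpills : 1;
        return new int[] { additionalSpillCount, totalSpills };
    }

    public static void main(String[] args) {
        int[] plain = count(5, false);
        int[] piped = count(5, true);
        System.out.println("without pipelining: ADDITIONAL_SPILL_COUNT=" + plain[0]
                + ", TOTAL_SPILLS=" + plain[1]); // 5 and 1
        System.out.println("with pipelining: ADDITIONAL_SPILL_COUNT=" + piped[0]
                + ", TOTAL_SPILLS=" + piped[1]); // 0 and 5
    }
}
```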
[jira] [Updated] (TEZ-2379) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED
[ https://issues.apache.org/jira/browse/TEZ-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2379: - Attachment: TEZ-2379.branch-0.5.patch org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED -- Key: TEZ-2379 URL: https://issues.apache.org/jira/browse/TEZ-2379 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan Assignee: Hitesh Shah Priority: Blocker Attachments: TEZ-2379.1.patch, TEZ-2379.2.patch, TEZ-2379.3.patch, TEZ-2379.branch-0.5.patch {noformat} 2015-04-28 04:49:32,455 ERROR [Dispatcher thread: Central] impl.TaskImpl: Can't handle this event at current state for task_1429683757595_0479_1_03_13 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_KILLED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:853) at org.apache.tez.dag.app.dag.impl.TaskImpl.handle(TaskImpl.java:106) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1874) at org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher.handle(DAGAppMaster.java:1860) at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:745) {noformat} Additional notes: Hive - latest build Tez - master tpch-200 gb scale q_17 (kill the job in the middle of execution) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2
[ https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2408: - Attachment: TEZ-2408.1.patch [~bikassaha] [~sseth] review please. TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 --- Key: TEZ-2408 URL: https://issues.apache.org/jira/browse/TEZ-2408 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor Attachments: TEZ-2408.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter
[ https://issues.apache.org/jira/browse/TEZ-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527144#comment-14527144 ] Gopal V edited comment on TEZ-2407 at 5/4/15 7:49 PM: -- No, the issue is that {{DataInputBuffer::getLength()}} has bad semantics - it returns capacity instead of length of data. We are always forced to do {{DataInputBuffer::getLength() - DataInputBuffer::getPosition()}} to get the accurate value, which is an easy thing to forget. Since {{DataInputBuffer}} comes from hadoop, we can't change the original - however, we can make our code more readable, as it is a simple class to replace to make getLength() meaningful. was (Author: gopalv): No, the issue is that {{DataInputBuffer::getLength()}} has bad semantics - it returns capacity instead of length of data. Since that comes from hadoop, we can't change the original - however, we can make our code more readable, as it is a simple class to replace to make getLength() meaningful. Drop references to the old DataInputBuffer impl in PipelinedSorter -- Key: TEZ-2407 URL: https://issues.apache.org/jira/browse/TEZ-2407 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
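The replacement Gopal describes can be sketched in plain Java. This is a hypothetical minimal buffer (SpanBuffer is an invented name, not the actual Tez class) whose getLength() directly means "valid bytes remaining", so callers no longer need to remember the getLength() - getPosition() correction required by the Hadoop-style semantics above.

```java
// Hypothetical sketch of a DataInputBuffer replacement with meaningful
// getLength() semantics. Not the actual Tez/Hadoop class.
public class SpanBuffer {
    private byte[] data;
    private int position;
    private int end; // one past the last valid byte

    /** Points the buffer at a span of valid data inside a backing array. */
    public void reset(byte[] data, int start, int length) {
        this.data = data;
        this.position = start;
        this.end = start + length;
    }

    /** Unambiguous: valid bytes left to read, never backing-array capacity. */
    public int getLength() {
        return end - position;
    }

    /** Reads one byte, or returns -1 when the valid span is exhausted. */
    public int read() {
        if (position >= end) {
            return -1;
        }
        return data[position++] & 0xFF;
    }

    public static void main(String[] args) {
        SpanBuffer buf = new SpanBuffer();
        buf.reset(new byte[32], 0, 5); // 32-byte backing array, 5 valid bytes
        buf.read();
        buf.read();
        // With capacity-style semantics this would misleadingly involve 32;
        // here getLength() reports the 3 valid bytes that remain.
        System.out.println(buf.getLength()); // prints 3
    }
}
```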
[jira] [Created] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2
Hitesh Shah created TEZ-2408: Summary: TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 Key: TEZ-2408 URL: https://issues.apache.org/jira/browse/TEZ-2408 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2408) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2
[ https://issues.apache.org/jira/browse/TEZ-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2408: - Priority: Minor (was: Major) TestTaskAttempt fails to compile against hadoop-2.4 and hadoop-2.2 --- Key: TEZ-2408 URL: https://issues.apache.org/jira/browse/TEZ-2408 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-1529) ATS and TezClient integration in secure kerberos enabled cluster
[ https://issues.apache.org/jira/browse/TEZ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527361#comment-14527361 ] Zhijie Shen commented on TEZ-1529: -- The patch looks good to me overall. Two nits: 1. In getJsonRootEntity, you may need to handle UndeclaredThrowableException too. 2. I think we can reuse the http client. It's not necessary to create one client per request. {code} httpClient = new Client(new URLConnectionClientHandler(new TimelineUrlConnectionFactory()), config); {code} ATS and TezClient integration in secure kerberos enabled cluster - Key: TEZ-1529 URL: https://issues.apache.org/jira/browse/TEZ-1529 Project: Apache Tez Issue Type: Bug Reporter: Prakash Ramachandran Assignee: Prakash Ramachandran Priority: Blocker Attachments: TEZ-1529.1.patch This is a follow up for TEZ-1495, which addresses ATS - TezClient integration; however, it does not enable it in a secure kerberos enabled cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2409) Allow different edges to have different routing plugins
Bikas Saha created TEZ-2409: --- Summary: Allow different edges to have different routing plugins Key: TEZ-2409 URL: https://issues.apache.org/jira/browse/TEZ-2409 Project: Apache Tez Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha It may be useful to allow different edge manager plugin types based on different requirements. In order to support this, we would need to support different plugins per edge for routing the events on that edge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter
[ https://issues.apache.org/jira/browse/TEZ-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527135#comment-14527135 ] Hitesh Shah commented on TEZ-2407: -- If this is a refactor, then 0.8 makes sense. I was not sure if this was related to any memory-related cleanup, based on the jira title. Drop references to the old DataInputBuffer impl in PipelinedSorter -- Key: TEZ-2407 URL: https://issues.apache.org/jira/browse/TEZ-2407 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2407) Drop references to the old DataInputBuffer impl in PipelinedSorter
[ https://issues.apache.org/jira/browse/TEZ-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527144#comment-14527144 ] Gopal V commented on TEZ-2407: -- No, the issue is that {{DataInputBuffer::getLength()}} has bad semantics - it returns capacity instead of length of data. Since that comes from hadoop, we can't change the original - however, we can make our code more readable, as it is a simple class to replace to make getLength() meaningful. Drop references to the old DataInputBuffer impl in PipelinedSorter -- Key: TEZ-2407 URL: https://issues.apache.org/jira/browse/TEZ-2407 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan -- This message was sent by Atlassian JIRA (v6.3.4#6332)