[jira] [Commented] (TEZ-3718) Better handling of 'bad' nodes
    [ https://issues.apache.org/jira/browse/TEZ-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548343#comment-16548343 ]

Zhiyuan Yang commented on TEZ-3718:
-----------------------------------

This one has been pending review for a long time. A review would be greatly appreciated, but feel free to drop this from the release.

> Better handling of 'bad' nodes
> ------------------------------
>
>                 Key: TEZ-3718
>                 URL: https://issues.apache.org/jira/browse/TEZ-3718
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Zhiyuan Yang
>            Priority: Major
>         Attachments: TEZ-3718.1.patch, TEZ-3718.2.patch, TEZ-3718.3.patch, TEZ-3718.4.patch
>
>
> At the moment, the default behaviour when a node is marked bad is to do nothing other than stop scheduling new tasks on that node.
> The alternative, via config, is to retroactively kill every task which ran on the node, which causes far too many unnecessary re-runs.
> Proposing the following changes:
> 1. KILL fragments which are currently in the RUNNING state (instead of relying on a timeout, which leads to the attempt being marked as FAILED after the timeout interval).
> 2. Keep track of these failed nodes, and use this as input to the failure heuristics. Normally source tasks require multiple consumers to report failure for them to be marked as bad. If a single consumer reports failure against a source which ran on a bad node, consider it bad and re-schedule immediately. (Otherwise failures can take a while to propagate, and jobs get a lot slower.)
> [~jlowe] - I think you've looked at this in the past. Any thoughts/suggestions?
> What I'm seeing is retroactive failures taking a long time to apply and to restart sources which ran on a bad node. Also, running tasks are being counted as FAILURES instead of KILLS.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
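The second proposal in TEZ-3718 (a single consumer report is enough when the source ran on a known bad node) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, method names, and the reportsRequired threshold are hypothetical and are not taken from the Tez code base or from the attached patches.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed heuristic; not the actual Tez implementation.
final class BadNodeHeuristicSketch {
    // Nodes that have been reported as bad so far.
    private final Set<String> badNodes = new HashSet<>();
    // Assumed threshold: consumer failure reports normally needed before a re-run.
    private final int reportsRequired;

    BadNodeHeuristicSketch(int reportsRequired) {
        this.reportsRequired = reportsRequired;
    }

    void markNodeBad(String nodeId) {
        badNodes.add(nodeId);
    }

    // Decides whether a source task output should be considered bad and re-run.
    boolean shouldRerunSource(String sourceNodeId, int consumerFailureReports) {
        if (consumerFailureReports <= 0) {
            return false; // nobody has complained about this source
        }
        if (badNodes.contains(sourceNodeId)) {
            return true; // one report suffices when the source ran on a bad node
        }
        return consumerFailureReports >= reportsRequired; // usual multi-consumer rule
    }
}
```

With a threshold of 3, a source on a healthy node is re-run only after three reports, while a source on a node already marked bad is re-run after the first one.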
[jira] [Updated] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster
[ https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3694: -- Attachment: TEZ-3694.2.patch > Adopt YARN-5007 in MiniTezCluster > - > > Key: TEZ-3694 > URL: https://issues.apache.org/jira/browse/TEZ-3694 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Major > Attachments: TEZ-3694.1.patch, TEZ-3694.2.patch > > > Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS > param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt > the change and use config to enable timeline service. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster
    [ https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548335#comment-16548335 ]

Zhiyuan Yang commented on TEZ-3694:
-----------------------------------

I see the Hadoop version has been raised to 3.0.3 in TEZ-3955, so the patch here probably works already. Let me kick off another Jenkins run for it. But feel free to drop this from the release.

> Adopt YARN-5007 in MiniTezCluster
> ---------------------------------
>
>                 Key: TEZ-3694
>                 URL: https://issues.apache.org/jira/browse/TEZ-3694
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>            Priority: Major
>         Attachments: TEZ-3694.1.patch, TEZ-3694.2.patch
>
>
> Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt the change and use config to enable timeline service.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Closed] (TEZ-3803) Tasks can get killed due to insufficient progress while waiting for shuffle inputs to complete
    [ https://issues.apache.org/jira/browse/TEZ-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3803.
-----------------------------

> Tasks can get killed due to insufficient progress while waiting for shuffle inputs to complete
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3803
>                 URL: https://issues.apache.org/jira/browse/TEZ-3803
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Critical
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3803.001.patch, TEZ-3803.002.patch, TEZ-3803.003.patch, TEZ-3803.004.patch, TEZ-3803.005.patch
>
>
> In a scenario where a downstream task has no slow start and gets started before all its shuffle inputs are done, the task can time out because the wait does not notify progress (set the "progress is being made" bit) as it does in MapReduce.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (TEZ-3832) TEZ DAG status shows SUCCEEDED for SUCCEEDED_WITH_FAILURES final status
[ https://issues.apache.org/jira/browse/TEZ-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3832. - > TEZ DAG status shows SUCCEEDED for SUCCEEDED_WITH_FAILURES final status > --- > > Key: TEZ-3832 > URL: https://issues.apache.org/jira/browse/TEZ-3832 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: TEZ-3832.001.patch > > > This is a regression from Tez 0.7 UI. Relevant changes are made to the > dag/index, home/index, and app/dags routes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3844) Tez UI Dag Counters show no records for a RUNNING DAG.
[ https://issues.apache.org/jira/browse/TEZ-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3844. - > Tez UI Dag Counters show no records for a RUNNING DAG. > -- > > Key: TEZ-3844 > URL: https://issues.apache.org/jira/browse/TEZ-3844 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Kuhu Shukla >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: TEZ-3844.001.patch > > > A Running DAG shows no counters under "DAG Counters" tab even though the Dag > Overview page shows REST response with counters coming through. CC: > [~Sreenath]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3862) Tez UI: Upgrade em-tgraph to version 0.0.14
[ https://issues.apache.org/jira/browse/TEZ-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3862. - > Tez UI: Upgrade em-tgraph to version 0.0.14 > --- > > Key: TEZ-3862 > URL: https://issues.apache.org/jira/browse/TEZ-3862 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles >Priority: Trivial > Fix For: 0.9.1 > > Attachments: TEZ-3862.001.patch > > > There have been notable improvements that can be pulled in, that will make > viewing graphs easier. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3861) PipelineSorter setting negative progess
    [ https://issues.apache.org/jira/browse/TEZ-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3861.
-----------------------------

> PipelineSorter setting negative progess
> ---------------------------------------
>
>                 Key: TEZ-3861
>                 URL: https://issues.apache.org/jira/browse/TEZ-3861
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>            Reporter: Prasanth Jayachandran
>            Assignee: Rajesh Balamohan
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3861.2.patch, TezTR-545720_1_1_2_0_0.log
>
>
> PipelineSorter generates excessively large logs, mostly from setting progress to a negative value in some cases.
> {code}
> 2017-10-30T01:22:16,466 DEBUG [TezTR-702853_1_1_2_0_0] util.Progress: Illegal progress value found, progress is less than 0. Progress will be changed to 0
> 2017-10-30T01:22:16,469 DEBUG [TezTR-702853_1_1_2_0_0] util.Progress: Illegal progress value found, progress is less than 0. Progress will be changed to 0
> 2017-10-30T01:22:16,469 DEBUG [TezTR-702853_1_1_2_0_0] util.Progress: Illegal progress value found, progress is less than 0. Progress will be changed to 0
> 2017-10-30T01:22:16,470 DEBUG [TezTR-702853_1_1_2_0_0] util.Progress: Illegal progress value found, progress is less than 0. Progress will be changed to 0
> 2017-10-30T01:22:16,470 DEBUG [TezTR-702853_1_1_2_0_0] util.Progress: Illegal progress value found, progress is less than 0. Progress will be changed to 0
> {code}
> This is emitted from https://github.com/apache/tez/blob/87d7c145ffc71707d1d393fddf94efa2a77d8822/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java#L1126

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3855. - > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Blocker > Fix For: 0.9.1 > > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, > TEZ-3855.addendum.patch, TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from vertex manager to processor. The > task framework support processor event but there is no interface for VM to > send event out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3869) Analyzer: Fix VertexInfo::getLastTaskToFinish comparison
    [ https://issues.apache.org/jira/browse/TEZ-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3869.
-----------------------------

> Analyzer: Fix VertexInfo::getLastTaskToFinish comparison
> --------------------------------------------------------
>
>                 Key: TEZ-3869
>                 URL: https://issues.apache.org/jira/browse/TEZ-3869
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3869.1.patch, TEZ-3869.2.patch
>
>
> {{VertexInfo::getLastTaskToFinish}} incorrectly compares with getStartTimeInterval. This needs to be fixed. Observed timsort exceptions when analyzing some dag zips.
> {code}
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
> 	at java.util.TimSort.mergeHi(TimSort.java:895)
> 	at java.util.TimSort.mergeAt(TimSort.java:512)
> 	at java.util.TimSort.mergeForceCollapse(TimSort.java:453)
> 	at java.util.TimSort.sort(TimSort.java:250)
> 	at java.util.Arrays.sort(Arrays.java:1435)
> 	at java.util.Collections.sort(Collections.java:230)
> 	at org.apache.tez.history.parser.datamodel.VertexInfo.getLastTaskToFinish(VertexInfo.java:542)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
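TimSort can raise "Comparison method violates its general contract!" when a comparator is not consistent, for instance when the two sides of a comparison are measured on different keys. The sketch below shows one way to keep a "last task to finish" lookup consistent by ordering on a single key. The Task class, its fields, and the method names are hypothetical stand-ins and are not the real VertexInfo or TaskInfo code, nor the content of the attached patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; Task is a stand-in, not the Tez TaskInfo class.
final class LastTaskSketch {
    static final class Task {
        final String name;
        final long startTime;
        final long finishTime;

        Task(String name, long startTime, long finishTime) {
            this.name = name;
            this.startTime = startTime;
            this.finishTime = finishTime;
        }
    }

    // Both arguments are compared on the same key (finish time), so the
    // ordering is antisymmetric and transitive, as Comparator requires.
    static final Comparator<Task> BY_FINISH_TIME =
            Comparator.comparingLong((Task t) -> t.finishTime);

    // Returns the task with the greatest finish time, or null for an empty list.
    static Task lastTaskToFinish(List<Task> tasks) {
        if (tasks.isEmpty()) {
            return null;
        }
        List<Task> sorted = new ArrayList<>(tasks); // leave the caller's list untouched
        sorted.sort(BY_FINISH_TIME);
        return sorted.get(sorted.size() - 1);
    }
}
```

Using Comparator.comparingLong also avoids subtracting long timestamps and casting to int, which is another common way to break the comparator contract.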
[jira] [Closed] (TEZ-3804) FetcherOrderedGrouped#setupLocalDiskFetch should ignore empty partition records
[ https://issues.apache.org/jira/browse/TEZ-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3804. - > FetcherOrderedGrouped#setupLocalDiskFetch should ignore empty partition > records > --- > > Key: TEZ-3804 > URL: https://issues.apache.org/jira/browse/TEZ-3804 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 0.9.1 > > Attachments: TEZ-3804.001.patch > > > Similar to the copyMapOutput() logic, local fetches can also ignore > indexRecords that are empty (hasData == false) to avoid duplicate fetch > warnings. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3876) Bug in local mode distributed cache files
[ https://issues.apache.org/jira/browse/TEZ-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3876. - > Bug in local mode distributed cache files > - > > Key: TEZ-3876 > URL: https://issues.apache.org/jira/browse/TEZ-3876 > Project: Apache Tez > Issue Type: Task >Reporter: Jacob Tolar >Assignee: Jacob Tolar >Priority: Minor > Fix For: 0.9.1 > > Attachments: TEZ-3876.2.patch, TEZ-3876.3.patch > > > If multiple symlinks to the same resource are requested, only one is created. > See TEZ-3848 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3825) Tez UI DAGs page can't query RUNNING or SUBMITTED apps
[ https://issues.apache.org/jira/browse/TEZ-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3825. - > Tez UI DAGs page can't query RUNNING or SUBMITTED apps > -- > > Key: TEZ-3825 > URL: https://issues.apache.org/jira/browse/TEZ-3825 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: TEZ-3825.001.patch > > > status is only a primary filter when a final dag status is set. RUNNING and > SUBMITTED status can't be added as a final status so it must be set to > secondaryFilter -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3845) Tez UI Cleanup Stats Table
    [ https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3845.
-----------------------------

> Tez UI Cleanup Stats Table
> --------------------------
>
>                 Key: TEZ-3845
>                 URL: https://issues.apache.org/jira/browse/TEZ-3845
>             Project: Apache Tez
>          Issue Type: Bug
>          Components: UI
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3845.001.patch, after_stats.png, before_stats.png
>
>
> Removed redundant status (for example: Succeeded Tasks: 10 Succeeded)
> Made total tasks links
> Added killed/failed task attempts available on the dag/index/ page
> Reordered Stats to be consistent across all pages.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (TEZ-3857) Tez TaskImpl can throw Invalid state transition for leaf tasks that do Retro Active Transition
    [ https://issues.apache.org/jira/browse/TEZ-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3857.
-----------------------------

> Tez TaskImpl can throw Invalid state transition for leaf tasks that do Retro Active Transition
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3857
>                 URL: https://issues.apache.org/jira/browse/TEZ-3857
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3857.001.patch, TEZ-3857.002.patch, TEZ-3857.003.patch
>
>
> {code}
> Invalid event T_ATTEMPT_FAILED on Task task_1234_5678_1_01_01
> {code}
> The task had more than one running attempt (because of speculative execution). One of them succeeded and the task was marked succeeded; the second then failed and caused the Task state machine to enter an error state, since the task was in a leaf vertex and the transition does the following:
> {code}
>       if (task.leafVertex) {
>         LOG.error("Unexpected event for task of leaf vertex " + event.getType() + ", taskId: "
>             + task.getTaskId());
>         task.internalError(event.getType());
>       }
> {code}
> This JIRA tracks fixing this invalid state.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (TEZ-3752) Reduce Object size of InMemoryMapOutput for large jobs
[ https://issues.apache.org/jira/browse/TEZ-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3752. - > Reduce Object size of InMemoryMapOutput for large jobs > -- > > Key: TEZ-3752 > URL: https://issues.apache.org/jira/browse/TEZ-3752 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Muhammad Samir Khan > Fix For: 0.9.1 > > Attachments: TEZ-3752.001.patch > > > Follow-on jira from TEZ-3732. The InMemoryMapOutput has a > BoundedByteArrayOutputStream that is only used in the Merged MapOutput case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3431) Add unit tests for container release
    [ https://issues.apache.org/jira/browse/TEZ-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3431.
-----------------------------

> Add unit tests for container release
> ------------------------------------
>
>                 Key: TEZ-3431
>                 URL: https://issues.apache.org/jira/browse/TEZ-3431
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Sushmitha Sreenivasan
>            Assignee: Taklon Stephen Wu
>              Labels: newbie
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3431.1.patch, TEZ-3431.2.patch, TEZ-3431.patch
>
>
> * Add unit tests to verify that the scheduler releases a container after its expiry time (HeldContainer.containerExpiryTime).
> ** This adds a local cluster mock test for releasing a container when HeldContainer.containerExpiryTime is older than the current date time in milliseconds and the container is not new.
> ** Also, this commit refactors the common variables appHost, appPort, appUrl and appMsg to default constant values.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (TEZ-3847) AM web controller task counters are empty sometimes
    [ https://issues.apache.org/jira/browse/TEZ-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3847.
-----------------------------

> AM web controller task counters are empty sometimes
> ---------------------------------------------------
>
>                 Key: TEZ-3847
>                 URL: https://issues.apache.org/jira/browse/TEZ-3847
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3847.001.patch, TEZ-3847.002.patch, TEZ-3847.003.patch
>
>
> Statistics and counters are sent at longer intervals, and TaskAttemptImpl blindly overwrites its stats and counters with null.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
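One way to avoid the overwrite described in TEZ-3847 is to keep the last known value whenever an update arrives without counters. The following is a minimal sketch, assuming that such updates carry null; the class, the method names, and the use of a plain Map for counters are hypothetical and do not reflect TaskAttemptImpl or the attached patches.

```java
import java.util.Map;

// Hypothetical sketch; a plain Map stands in for the real counters type.
final class StatusUpdateSketch {
    // Last counters that were actually reported.
    private Map<String, Long> counters;

    // Called for every status update; reportedCounters is assumed to be null
    // when the update did not include counters.
    void onStatusUpdate(Map<String, Long> reportedCounters) {
        if (reportedCounters != null) {
            counters = reportedCounters; // only replace when a value was sent
        }
    }

    Map<String, Long> getCounters() {
        return counters;
    }
}
```

With this guard, a reader that polls between two full reports still sees the most recent counters instead of an empty result.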
[jira] [Closed] (TEZ-3834) TaskSchedulerManager NullPointerException during shutdown when failed to start
    [ https://issues.apache.org/jira/browse/TEZ-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3834.
-----------------------------

> TaskSchedulerManager NullPointerException during shutdown when failed to start
> ------------------------------------------------------------------------------
>
>                 Key: TEZ-3834
>                 URL: https://issues.apache.org/jira/browse/TEZ-3834
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3834.001.patch, TEZ-3834.002.patch, TEZ-3834.003.patch
>
>
> {noformat:title=NPE 1}
> 2017-09-14 12:16:48,259 [ERROR] [main] |rm.TaskSchedulerManager|: Failed to do a clean initiateStop for Scheduler: [0:TezYarn]
> java.lang.NullPointerException
>         at org.apache.tez.dag.app.rm.TaskSchedulerManager.initiateStop(TaskSchedulerManager.java:696)
>         at org.apache.tez.dag.app.DAGAppMaster.initiateStop(DAGAppMaster.java:2223)
>         at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:2239)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>         at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2707)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
>         at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2703)
>         at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2508)
> {noformat}
> {noformat:title=NPE 2}
> 2017-09-14 12:16:48,610 [ERROR] [main] |rm.TaskSchedulerManager|: Error in TaskScheduler when checking if a scheduler has unregistered, scheduler=[0:TezYarn]
> java.lang.NullPointerException
>         at org.apache.tez.dag.app.rm.TaskSchedulerManager.hasUnregistered(TaskSchedulerManager.java:998)
>         at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:2252)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>         at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2707)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
>         at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2703)
>         at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2508)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (TEZ-3850) Enable header as sort button on Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3850. - > Enable header as sort button on Tez UI > -- > > Key: TEZ-3850 > URL: https://issues.apache.org/jira/browse/TEZ-3850 > Project: Apache Tez > Issue Type: Sub-task > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: TEZ-3850.001.patch, TEZ-3850.002.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3805) Analyzer: Add an analyzer to find out scheduling misses in 1:1 edges
    [ https://issues.apache.org/jira/browse/TEZ-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3805.
-----------------------------

> Analyzer: Add an analyzer to find out scheduling misses in 1:1 edges
> --------------------------------------------------------------------
>
>                 Key: TEZ-3805
>                 URL: https://issues.apache.org/jira/browse/TEZ-3805
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3805.1.patch
>
>
> When a 1:1 edge is used, it would be helpful to find out whether downstream tasks ran on the same location provided in the hints by the runtime.
> One of the recent features in an upstream project (Hive) used a 1:1 edge. Instead of checking the logs, it would be useful to have an analyzer to churn out the details.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (TEZ-3797) Add tez debug tool for comparing counters of 2 DAGs
[ https://issues.apache.org/jira/browse/TEZ-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3797. - > Add tez debug tool for comparing counters of 2 DAGs > --- > > Key: TEZ-3797 > URL: https://issues.apache.org/jira/browse/TEZ-3797 > Project: Apache Tez > Issue Type: Bug >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 0.9.1 > > Attachments: TEZ-3797.1.patch, counter-diff.png > > > Will be useful for debugging to have a simple script that just compares the > counters from 2 different dag runs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3666) Integer overflow in ShuffleVertexManagerBase
[ https://issues.apache.org/jira/browse/TEZ-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3666. - > Integer overflow in ShuffleVertexManagerBase > > > Key: TEZ-3666 > URL: https://issues.apache.org/jira/browse/TEZ-3666 > Project: Apache Tez > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 0.9.1 > > Attachments: TEZ-3666-2.patch, TEZ-3666.patch > > > In function getExpectedStatsInAtIndex, {{statsInMB[index] * numTasks / > numVMEventsReceived}} could cause Integer overflow, for example when > statsInMB[index] == 3 and numTasks == 20. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
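The overflow reported in TEZ-3666 arises because the product is evaluated in 32-bit int arithmetic before the division is applied. A minimal sketch of the problem and of one common remedy (widening to long before multiplying), assuming int-typed operands; the class name, method names, and any sample values used with it are hypothetical and are not taken from ShuffleVertexManagerBase or the attached patches.

```java
// Hypothetical sketch; not the actual getExpectedStatsInAtIndex implementation.
final class ExpectedStatsSketch {
    // Overflow-prone form: statInMB * numTasks is computed as an int and can
    // wrap around before the division happens.
    static int expectedStatsOverflowing(int statInMB, int numTasks, int numVMEventsReceived) {
        return statInMB * numTasks / numVMEventsReceived;
    }

    // Safer form: the cast widens the first operand to long, so the whole
    // expression is evaluated in 64-bit arithmetic and the product of two
    // ints cannot wrap.
    static long expectedStats(int statInMB, int numTasks, int numVMEventsReceived) {
        return (long) statInMB * numTasks / numVMEventsReceived;
    }
}
```

For small inputs both methods agree; they diverge once the intermediate product exceeds Integer.MAX_VALUE (2,147,483,647).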
[jira] [Closed] (TEZ-3843) Tez UI Vertex/Tasks log links for running tasks are missing
    [ https://issues.apache.org/jira/browse/TEZ-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3843.
-----------------------------

> Tez UI Vertex/Tasks log links for running tasks are missing
> -----------------------------------------------------------
>
>                 Key: TEZ-3843
>                 URL: https://issues.apache.org/jira/browse/TEZ-3843
>             Project: Apache Tez
>          Issue Type: Bug
>          Components: UI
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3843.001.patch, relatedentities.png
>
>
> Task serialization mistakenly gets the list of attempts under otherinfo.relatedentities. relatedentities is a top-level property, and serialization should reflect this.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (TEZ-3813) Reduce Object size of MemoryFetchedInput for large jobs
[ https://issues.apache.org/jira/browse/TEZ-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3813. - > Reduce Object size of MemoryFetchedInput for large jobs > --- > > Key: TEZ-3813 > URL: https://issues.apache.org/jira/browse/TEZ-3813 > Project: Apache Tez > Issue Type: Bug >Reporter: Muhammad Samir Khan >Assignee: Muhammad Samir Khan > Fix For: 0.9.1 > > Attachments: TEZ-3813.001.patch, TEZ-3813.002.patch, > TEZ-3813.003.patch, TEZ-3813.004.patch, TEZ-3813.005.patch, TEZ-3813.006.patch > > > Same as TEZ-3752 for the unordered case. MemoryFetchedInput has a > BoundedByteArrayOutputStream that is not used (only the underlying byte[] is > used). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures
[ https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3833. - > Tasks should report codec errors during shuffle as fetch failures > - > > Key: TEZ-3833 > URL: https://issues.apache.org/jira/browse/TEZ-3833 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 0.9.1 > > Attachments: TEZ-3833.001.patch, TEZ-3833.002.patch, > TEZ-3833.003.patch, TEZ-3833.004.patch, TEZ-3833.005.patch > > > Do the equivalent of https://issues.apache.org/jira/browse/MAPREDUCE-6633 so > that compression errors do not prove fatal for the DAG/tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3858) Misleading dag level diagnostics in case of invalid vertex event
    [ https://issues.apache.org/jira/browse/TEZ-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3858.
-----------------------------

> Misleading dag level diagnostics in case of invalid vertex event
> ----------------------------------------------------------------
>
>                 Key: TEZ-3858
>                 URL: https://issues.apache.org/jira/browse/TEZ-3858
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3858.1.patch, TEZ-3858.2.patch
>
>
> When a vertex gets an invalid event, the state machine transitions via InternalErrorTransition. This transition prints the following message and adds it to the DAG diagnostics:
> {code}
> ("Invalid event " + event.getType() + " on Vertex " + vertex.getLogIdentifier()
> {code}
> But the variable event here is the V_INTERNAL_ERROR event rather than the event that caused V_INTERNAL_ERROR. V_INTERNAL_ERROR is not the invalid event; the original event is.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (TEZ-3828) Allow relaxing locality when retried task's priority is kept same
[ https://issues.apache.org/jira/browse/TEZ-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3828. - > Allow relaxing locality when retried task's priority is kept same > -- > > Key: TEZ-3828 > URL: https://issues.apache.org/jira/browse/TEZ-3828 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 0.9.1 > > Attachments: TEZ-3828.1.patch, TEZ-3828.2.patch, TEZ-3828.3.patch > > > TEZ-3716 introduced the conf to keep priority for retried task, but there is > no way to relax locality requirement in that case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3854) Make use of new improved em-table sort-icon
[ https://issues.apache.org/jira/browse/TEZ-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3854. - > Make use of new improved em-table sort-icon > --- > > Key: TEZ-3854 > URL: https://issues.apache.org/jira/browse/TEZ-3854 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: TEZ-3854.001.patch > > > em-table 0.11.3 uses improved table column sort-icon. This jira updates > em-table version and makes changes to fully support the new feature. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3816) Ability to automatically speculate single-task vertices
[ https://issues.apache.org/jira/browse/TEZ-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3816. - > Ability to automatically speculate single-task vertices > --- > > Key: TEZ-3816 > URL: https://issues.apache.org/jira/browse/TEZ-3816 > Project: Apache Tez > Issue Type: Improvement >Reporter: Muhammad Samir Khan >Assignee: Muhammad Samir Khan > Fix For: 0.9.1 > > Attachments: TEZ-3816.001.patch, TEZ-3816.002.patch, > TEZ-3816.003.patch > > > When a single-task vertex is unlucky, it lands on a very slow node. > Speculation doesn't currently apply when there are no other tasks to compare > with. It would be good to have a configurable timeout after which the tasks > automatically speculate. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3807) InMemoryWriter is not tested with RLE enabled
    [ https://issues.apache.org/jira/browse/TEZ-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhiyuan Yang closed TEZ-3807.
-----------------------------

> InMemoryWriter is not tested with RLE enabled
> ---------------------------------------------
>
>                 Key: TEZ-3807
>                 URL: https://issues.apache.org/jira/browse/TEZ-3807
>             Project: Apache Tez
>          Issue Type: Test
>            Reporter: Muhammad Samir Khan
>            Assignee: Muhammad Samir Khan
>             Fix For: 0.9.1
>
>         Attachments: TEZ-3807.001.patch, TEZ-3807.002.patch
>
>
> In TestIFile, a couple of test cases are supposed to test InMemoryWriter with RLE enabled, but the RLE flag is turned off.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (TEZ-3836) Tez UI task page sort does not work on RHEL7/Fedora
[ https://issues.apache.org/jira/browse/TEZ-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3836. - > Tez UI task page sort does not work on RHEL7/Fedora > --- > > Key: TEZ-3836 > URL: https://issues.apache.org/jira/browse/TEZ-3836 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Sreenath Somarajapuram > Fix For: 0.9.1 > > Attachments: TEZ-3836.1.patch > > > Irrespective of the browser, linux machines have trouble rendering the sort > arrows near the edge of the columns. Resizing the column does not solve the > problem either. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3801) Update version in master to 0.9.1
[ https://issues.apache.org/jira/browse/TEZ-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3801. - > Update version in master to 0.9.1 > - > > Key: TEZ-3801 > URL: https://issues.apache.org/jira/browse/TEZ-3801 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 0.9.1 > > Attachments: TEZ-3801.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3831) Reduce Unordered memory needed for storing empty completed events
[ https://issues.apache.org/jira/browse/TEZ-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3831. - > Reduce Unordered memory needed for storing empty completed events > - > > Key: TEZ-3831 > URL: https://issues.apache.org/jira/browse/TEZ-3831 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: Screen Shot 2017-09-13 at 4.55.11 PM.png, > TEZ-3831.001-addendum.patch, TEZ-3831.001.patch > > > the completedInputs blocking queue is used to store inputs for the > UnorderedKVReader to consume. With Auto-reduce parallelism enabled and nearly > all empty inputs, the reader can't prune the empty events from the blocking > queue fast enough to keep up. In my scenario, an OOM occurred. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3839) Tez Shuffle Handler prints disk error stack traces for every read failure.
[ https://issues.apache.org/jira/browse/TEZ-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3839. - > Tez Shuffle Handler prints disk error stack traces for every read failure. > -- > > Key: TEZ-3839 > URL: https://issues.apache.org/jira/browse/TEZ-3839 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 0.9.1 > > Attachments: TEZ-3839.001.patch > > > Do the equivalent MAPREDUCE-6960 for the Tez Shuffle Handler. This will avoid > filling up the logs with disk error exceptions for every read. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3827) TEZ Vertex status on DAG index page shows SUCCEEDED for SUCCEEDED_WITH_FAILURES final status
[ https://issues.apache.org/jira/browse/TEZ-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3827. - > TEZ Vertex status on DAG index page shows SUCCEEDED for > SUCCEEDED_WITH_FAILURES final status > > > Key: TEZ-3827 > URL: https://issues.apache.org/jira/browse/TEZ-3827 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: TEZ-3827.001.patch > > > The vertex details page has a more advanced final status with SUCCEEDED with > FAILURES. This adds that logic to the DAG details vertex table as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3840) Tez should write TEZ_DAG_ID before TEZ_EXTRA_INFO
[ https://issues.apache.org/jira/browse/TEZ-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3840. - > Tez should write TEZ_DAG_ID before TEZ_EXTRA_INFO > - > > Key: TEZ-3840 > URL: https://issues.apache.org/jira/browse/TEZ-3840 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: TEZ-3840.001.addendum.patch, TEZ-3840.001.patch > > > The relation from EXTRA_INFO to DAG_ID is added before DAG_ID is > written, so it adds the relationship and auto-vivifies the DAG_ID entity. > Writing them in the other order is more natural. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3724) Tez UI on HTTP "corrects" HTTPS REST calls to HTTP
[ https://issues.apache.org/jira/browse/TEZ-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3724. - > Tez UI on HTTP "corrects" HTTPS REST calls to HTTP > -- > > Key: TEZ-3724 > URL: https://issues.apache.org/jira/browse/TEZ-3724 > Project: Apache Tez > Issue Type: Bug > Components: UI >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: TEZ-3724.1.patch, TEZ-3724.2.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3848) Tez Local mode doesn't localize distributed cache files
[ https://issues.apache.org/jira/browse/TEZ-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3848. - > Tez Local mode doesn't localize distributed cache files > --- > > Key: TEZ-3848 > URL: https://issues.apache.org/jira/browse/TEZ-3848 > Project: Apache Tez > Issue Type: Bug >Reporter: Jacob Tolar >Assignee: Jacob Tolar > Fix For: 0.9.1 > > Attachments: TEZ-3848.1.patch > > > Tez doesn't symlink LocalResources into place in LocalContainerLauncher. > In YARN mode, YARN takes care of this when it launches the container. But in > local mode, if you're depending on a file existing in the distributed cache, > it's never symlinked into place (so you're out of luck). > We test our Pig scripts in local mode and have some tools to set up the > distributed cache the same way it would work in production. This works fine > in MapReduce mode, but we are unable to use Pig + Tez local mode for testing due > to this problem. > I have a fix working and will submit a PR once I rebase it. > [~jeagles] [~wla...@yahoo-inc.com] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
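To make the TEZ-3848 description concrete, here is a minimal sketch of what "symlinking LocalResources into place" could look like in plain Java. It is not the attached patch: `LocalResourceLinker`, `linkAll`, and the file names are hypothetical, and the sketch assumes a POSIX-like filesystem where the process may create symbolic links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper for illustration; not the TEZ-3848 patch. */
final class LocalResourceLinker {

  /** Creates workDir/name -> target symlinks for every entry in resources. */
  static void linkAll(Path workDir, Map<String, Path> resources) throws IOException {
    Files.createDirectories(workDir);
    for (Map.Entry<String, Path> e : resources.entrySet()) {
      Path link = workDir.resolve(e.getKey());
      Files.deleteIfExists(link); // drop a stale link left by an earlier run
      Files.createSymbolicLink(link, e.getValue().toAbsolutePath());
    }
  }

  /** Links one temp file into a temp working directory and reads it back through the link. */
  static String demo() {
    try {
      Path cache = Files.createTempDirectory("cache");
      Path work = Files.createTempDirectory("work");
      Path file = cache.resolve("lookup.txt");
      Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
      linkAll(work, Collections.singletonMap("lookup.txt", file));
      return new String(Files.readAllBytes(work.resolve("lookup.txt")), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The task then opens the resource by its short name relative to the working directory, which is how a script written for YARN mode would expect to find it.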
[jira] [Closed] (TEZ-3252) [Umbrella] Enable support for Hadoop-3.x
[ https://issues.apache.org/jira/browse/TEZ-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3252. - > [Umbrella] Enable support for Hadoop-3.x > - > > Key: TEZ-3252 > URL: https://issues.apache.org/jira/browse/TEZ-3252 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah > Fix For: 0.9.1 > > Attachments: TEZ-3252.patch > > > Placeholder umbrella to track the various issues/tasks discovered to get full > stable functionality against hadoop-3.x once it is released in a stable form. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3856) API to access counters in InputInitializerContext
[ https://issues.apache.org/jira/browse/TEZ-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3856. - > API to access counters in InputInitializerContext > - > > Key: TEZ-3856 > URL: https://issues.apache.org/jira/browse/TEZ-3856 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.1 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 0.9.1 > > Attachments: TEZ-3856.1.patch, TEZ-3856.2.patch, TEZ-3856.2.patch, > TEZ-3856.3.patch > > > Hive would like to publish some counters related to input splits during split > generation. Tez doesn't expose TezCounters via InputInitializerContext. This > ticket is to expose TezCounters via InputInitializerContext so that counters > can be accessed during split generation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3830) HistoryEventTimelineConversion should not hard code the Task state.
[ https://issues.apache.org/jira/browse/TEZ-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3830. - > HistoryEventTimelineConversion should not hard code the Task state. > --- > > Key: TEZ-3830 > URL: https://issues.apache.org/jira/browse/TEZ-3830 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 0.9.1 > > Attachments: TEZ-3830.001.patch > > > TaskStartedEvent can have the state of the task so that the HistoryConversion > does not require task state to be hardcoded to SCHEDULED. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3853) Binary incompatibility caused by DEFAULT_LOG_LEVEL
[ https://issues.apache.org/jira/browse/TEZ-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3853. - > Binary incompatibility caused by DEFAULT_LOG_LEVEL > -- > > Key: TEZ-3853 > URL: https://issues.apache.org/jira/browse/TEZ-3853 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.9.0 >Reporter: Aihua Xu >Assignee: Zhiyuan Yang > Fix For: 0.9.1 > > Attachments: TEZ-3853.1.patch > > > Hive is moving to support Hadoop 3.0 in HIVE-15016. We found that > Hadoop introduced some incompatible changes in 3.0, which requires Tez to > support Hadoop 3.0 as well in order for Hive to integrate with Tez. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3212) IFile throws NegativeArraySizeException for value sizes between 1GB and 2GB
[ https://issues.apache.org/jira/browse/TEZ-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3212. - > IFile throws NegativeArraySizeException for value sizes between 1GB and 2GB > --- > > Key: TEZ-3212 > URL: https://issues.apache.org/jira/browse/TEZ-3212 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Muhammad Samir Khan > Fix For: 0.9.1 > > Attachments: TEZ-3212.1.patch, tez-3212.002.patch, > tez-3212.003.patch, tez-3212.004.patch, tez-3212.005.patch > > > This is not a regression with respect to MR, just an issue that was > encountered with a job whose IFile record values (which can be of max size > 2GB) can be successfully written but not successfully read. > Failure while running task: java.lang.NegativeArraySizeException > at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.nextRawValue(IFile.java:765) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
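The TEZ-3212 description does not state the root cause, so the following is only an assumed illustration of how a `NegativeArraySizeException` can arise for sizes in the 1GB to 2GB range: doubling an `int` length above 2^30 wraps to a negative number. `BufferGrowth`, `naiveDouble`, and `safeDouble` are hypothetical names and are not taken from `IFile`.

```java
/** Hypothetical sketch of int overflow when growing a buffer; not code from IFile. */
final class BufferGrowth {

  // Conventional upper bound for array sizes, leaving headroom for JVM array headers.
  static final int MAX_ARRAY = Integer.MAX_VALUE - 8;

  /** Naive doubling: wraps to a negative int once length exceeds 2^30. */
  static int naiveDouble(int length) {
    return length << 1;
  }

  /** Doubles in long arithmetic, then caps the result at MAX_ARRAY. */
  static int safeDouble(int length) {
    return (int) Math.min((long) length * 2L, (long) MAX_ARRAY);
  }

  public static void main(String[] args) {
    int valueLength = 1_200_000_000; // a value length between 1GB and 2GB
    System.out.println("naive: " + naiveDouble(valueLength));
    System.out.println("safe:  " + safeDouble(valueLength));
    // new byte[naiveDouble(valueLength)] would throw NegativeArraySizeException.
  }
}
```

Whether the actual fix caps the size, avoids doubling, or takes another route is not stated in this thread.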
[jira] [Closed] (TEZ-3852) Optimize ContainerContext.isSuperSet to speed container reuse decisions
[ https://issues.apache.org/jira/browse/TEZ-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3852. - > Optimize ContainerContext.isSuperSet to speed container reuse decisions > --- > > Key: TEZ-3852 > URL: https://issues.apache.org/jira/browse/TEZ-3852 > Project: Apache Tez > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 0.9.1 > > Attachments: TEZ-3852.001.patch, TEZ-3852.002.patch, > TEZ-3852.003.patch > > > Found an AM that was consuming high CPU. The stack trace below shows that > container reuse compatibility check with a high number of local resources was > the culprit.
> {noformat:title=task scheduler compatibility check}
> "DelayedContainerManager" #112 prio=5 os_prio=0 tid=0x03b59800 nid=0x1edba runnable [0x7fe13c232000]
>    java.lang.Thread.State: RUNNABLE
>     at java.util.HashMap.putVal(HashMap.java:628)
>     at java.util.HashMap.putMapEntries(HashMap.java:514)
>     at java.util.HashMap.<init>(HashMap.java:489)
>     at org.apache.tez.dag.app.ContainerContext.localResourcesCompatible(ContainerContext.java:132)
>     at org.apache.tez.dag.app.ContainerContext.isSuperSet(ContainerContext.java:116)
>     at org.apache.tez.dag.app.rm.container.ContainerContextMatcher.isSuperSet(ContainerContextMatcher.java:50)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.canAssignTaskToContainer(YarnTaskSchedulerService.java:1543)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getMatchingRequestWithoutPriority(YarnTaskSchedulerService.java:1492)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$500(YarnTaskSchedulerService.java:85)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$NodeLocalContainerAssigner.assignReUsedContainer(YarnTaskSchedulerService.java:1870)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignReUsedContainerWithLocation(YarnTaskSchedulerService.java:1754)
>     - locked <0x0006e0d12600> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignReUsedContainersWithLocation(YarnTaskSchedulerService.java:1712)
>     - locked <0x0006e0d12600> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.tryAssignReUsedContainers(YarnTaskSchedulerService.java:578)
>     - locked <0x0006e0d12600> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$800(YarnTaskSchedulerService.java:85)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.doAssignAll(YarnTaskSchedulerService.java:2103)
>     - locked <0x0006e0d12600> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.mainLoop(YarnTaskSchedulerService.java:1984)
>     at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run(YarnTaskSchedulerService.java:1974)
> {noformat}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
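The TEZ-3852 trace shows time going into a `HashMap` copy constructor inside the compatibility check. As a rough sketch of the general idea (not the committed patch, and ignoring any extra rules the real `localResourcesCompatible` applies), a superset test can look entries up directly instead of building an intermediate map. `ResourceSuperSet` and the string-valued maps are assumptions for illustration, and map values are assumed to be non-null.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an allocation-free superset check; not the TEZ-3852 patch. */
final class ResourceSuperSet {

  /**
   * Returns true when every (name, resource) pair in required is present with an
   * equal value in available. No copy of either map is made.
   */
  static <K, V> boolean isSuperSet(Map<K, V> available, Map<K, V> required) {
    if (required.size() > available.size()) {
      return false; // cheap early exit: a smaller map cannot contain a larger one
    }
    for (Map.Entry<K, V> e : required.entrySet()) {
      V have = available.get(e.getKey());
      if (have == null || !have.equals(e.getValue())) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> container = new HashMap<>();
    container.put("a.jar", "v1");
    container.put("b.jar", "v1");
    Map<String, String> task = new HashMap<>();
    task.put("a.jar", "v1");
    System.out.println(isSuperSet(container, task));
    task.put("c.jar", "v1");
    System.out.println(isSuperSet(container, task));
  }
}
```

The cost per check is proportional to the size of the task's resource map, with no allocation, which matters when the scheduler repeats the check for many held containers.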
[jira] [Closed] (TEZ-3868) Update website to factor in the TEZ trademark registration
[ https://issues.apache.org/jira/browse/TEZ-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3868. - > Update website to factor in the TEZ trademark registration > -- > > Key: TEZ-3868 > URL: https://issues.apache.org/jira/browse/TEZ-3868 > Project: Apache Tez > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 0.9.1 > > Attachments: TEZ-3868.01.patch, TEZ-3868.02.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3867) testSendCustomProcessorEvent try to get array out of read only ByteBuffer
[ https://issues.apache.org/jira/browse/TEZ-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3867. - > testSendCustomProcessorEvent try to get array out of read only ByteBuffer > - > > Key: TEZ-3867 > URL: https://issues.apache.org/jira/browse/TEZ-3867 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 0.9.1 > > Attachments: TEZ-3867.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (TEZ-3849) Combiner+PipelinedSorter silently drops records
[ https://issues.apache.org/jira/browse/TEZ-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang closed TEZ-3849. - > Combiner+PipelinedSorter silently drops records > --- > > Key: TEZ-3849 > URL: https://issues.apache.org/jira/browse/TEZ-3849 > Project: Apache Tez > Issue Type: Bug >Reporter: Jacob Tolar >Assignee: Jacob Tolar > Fix For: 0.9.1 > > Attachments: TEZ-3849.1.patch, TEZ-3849.2.patch, TEZ-3849.3.patch, > TEZ-3849.4.patch, TEZ-3849.5.patch, TEZ-3849.6.patch > > > This bug was introduced in > https://github.com/apache/tez/commit/a47e8fcbea5eeab5a7cf812271d329524cc02dba?diff=split > > when combiner != null, the change in this commit passes kvIter with next() > having already been called. This ends up (silently) dropping the first record > in the partition. > Will submit PR and attach patch. [~jeagles], not sure if this is the way you > want to fix or not but it does fix my tests. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
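The failure mode described in TEZ-3849 can be reproduced in a few lines with a cursor-style iterator, where `next()` advances and a separate accessor reads the current record. The sketch below is a stand-in written for this purpose: `Cursor`, `consume`, and the record values are hypothetical and are not Tez classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo of handing off an already-advanced cursor; not Tez code. */
final class AdvancedCursorDemo {

  /** Cursor-style iterator: next() advances, current() reads the record it landed on. */
  static final class Cursor {
    private final List<String> records;
    private int pos = -1;

    Cursor(List<String> records) {
      this.records = records;
    }

    boolean next() {
      if (pos + 1 >= records.size()) {
        return false;
      }
      pos++;
      return true;
    }

    String current() {
      return records.get(pos);
    }
  }

  /** Stand-in for a combiner: it calls next() before reading each record. */
  static List<String> consume(Cursor cursor) {
    List<String> seen = new ArrayList<>();
    while (cursor.next()) {
      seen.add(cursor.current());
    }
    return seen;
  }

  /** The caller advances once before the hand-off, so the consumer never sees record 0. */
  static List<String> handOffAfterNext(List<String> partition) {
    Cursor cursor = new Cursor(partition);
    cursor.next();
    return consume(cursor);
  }

  /** The consumer receives a cursor that has not been advanced. */
  static List<String> handOffFresh(List<String> partition) {
    return consume(new Cursor(partition));
  }

  public static void main(String[] args) {
    List<String> partition = Arrays.asList("k1", "k2", "k3");
    System.out.println(handOffAfterNext(partition)); // first record is missing
    System.out.println(handOffFresh(partition));
  }
}
```

Nothing throws in the buggy path, which is why the loss is silent; only a record count comparison exposes it.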
[jira] [Commented] (TEZ-3883) Update version in master to 0.9.2
[ https://issues.apache.org/jira/browse/TEZ-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312180#comment-16312180 ] Zhiyuan Yang commented on TEZ-3883: --- Patch committed to master > Update version in master to 0.9.2 > - > > Key: TEZ-3883 > URL: https://issues.apache.org/jira/browse/TEZ-3883 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3883.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (TEZ-3883) Update version in master to 0.9.2
[ https://issues.apache.org/jira/browse/TEZ-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang resolved TEZ-3883. --- Resolution: Fixed > Update version in master to 0.9.2 > - > > Key: TEZ-3883 > URL: https://issues.apache.org/jira/browse/TEZ-3883 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3883.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3883) Update version in master to 0.9.2
[ https://issues.apache.org/jira/browse/TEZ-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3883: -- Attachment: TEZ-3883.1.patch > Update version in master to 0.9.2 > - > > Key: TEZ-3883 > URL: https://issues.apache.org/jira/browse/TEZ-3883 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3883.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (TEZ-3883) Update version in master to 0.9.2
Zhiyuan Yang created TEZ-3883: - Summary: Update version in master to 0.9.2 Key: TEZ-3883 URL: https://issues.apache.org/jira/browse/TEZ-3883 Project: Apache Tez Issue Type: Bug Reporter: Zhiyuan Yang Assignee: Zhiyuan Yang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (TEZ-3882) Changes for 0.9.1 release
[ https://issues.apache.org/jira/browse/TEZ-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang resolved TEZ-3882. --- Resolution: Fixed > Changes for 0.9.1 release > - > > Key: TEZ-3882 > URL: https://issues.apache.org/jira/browse/TEZ-3882 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3882.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3882) Changes for 0.9.1 release
[ https://issues.apache.org/jira/browse/TEZ-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312145#comment-16312145 ] Zhiyuan Yang commented on TEZ-3882: --- Patch committed to master branch. > Changes for 0.9.1 release > - > > Key: TEZ-3882 > URL: https://issues.apache.org/jira/browse/TEZ-3882 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3882.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3882) Changes for 0.9.1 release
[ https://issues.apache.org/jira/browse/TEZ-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3882: -- Attachment: TEZ-3882.1.patch > Changes for 0.9.1 release > - > > Key: TEZ-3882 > URL: https://issues.apache.org/jira/browse/TEZ-3882 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3882.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (TEZ-3882) Changes for 0.9.1 release
Zhiyuan Yang created TEZ-3882: - Summary: Changes for 0.9.1 release Key: TEZ-3882 URL: https://issues.apache.org/jira/browse/TEZ-3882 Project: Apache Tez Issue Type: Bug Reporter: Zhiyuan Yang Assignee: Zhiyuan Yang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290112#comment-16290112 ] Zhiyuan Yang commented on TEZ-3810: ---
I think this may not be necessary
{code}
} else if (idleStartTime != 0) {
  shuffleIdleTime.increment(Time.monotonicNow() - idleStartTime);
  idleStartTime = 0;
}
{code}
since the number of fetchers won't increase within this loop anyway.
{code}
while ((runningFetchers.size() >= numFetchers || pendingHosts.isEmpty())
    && numCompletedInputs.get() < numInputs) {
{code}
Also, the test makes this counter look like a timestamp, although the code works.
{code}
long startTime = inputContext.getCounters().findCounter(TaskCounter.SHUFFLE_IDLE_TIME).getValue();
long endTime = inputContext.getCounters().findCounter(TaskCounter.SHUFFLE_IDLE_TIME).getValue();
assertTrue("ShuffleIdleTime counter was: "+ (endTime - startTime) + "ms", endTime - startTime >= 5000);
{code}
> TezCounter for idle time in shuffle phase > - > > Key: TEZ-3810 > URL: https://issues.apache.org/jira/browse/TEZ-3810 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ashwin Ramesh > Attachments: TEZ-3810-001.patch, TEZ-3810.002.patch, > TEZ-3810.003.patch, TEZ-3810.004.patch > > > A task attempt counter that tracks how much time was spent waiting for > inputs in the shuffle phase. We can use this to quickly identify jobs that > are wasting a lot of time on the grid with idle reducer tasks instead of > shuffling/merging. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
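To illustrate the review point about treating the counter as a duration rather than a timestamp, here is a small self-contained sketch of idle-time accumulation with an injectable monotonic clock. `IdleTimeTracker` and its methods are hypothetical names, not the code in the TEZ-3810 patches; the sentinel is -1 rather than the 0 used in the quoted snippet so that a fake clock may start at zero.

```java
import java.util.function.LongSupplier;

/** Hypothetical idle-time accumulator for illustration; not the TEZ-3810 patch. */
final class IdleTimeTracker {
  private final LongSupplier clockMillis; // monotonic clock, injectable for tests
  private long idleStart = -1L;           // -1 means "not currently idle"
  private long totalIdleMillis = 0L;

  IdleTimeTracker(LongSupplier clockMillis) {
    this.clockMillis = clockMillis;
  }

  /** Records the start of an idle period unless one is already open. */
  void markIdle() {
    if (idleStart < 0) {
      idleStart = clockMillis.getAsLong();
    }
  }

  /** Closes the open idle period, if any, and adds its length to the total. */
  void markBusy() {
    if (idleStart >= 0) {
      totalIdleMillis += clockMillis.getAsLong() - idleStart;
      idleStart = -1L;
    }
  }

  long totalIdleMillis() {
    return totalIdleMillis;
  }

  /** Simulates five seconds of idleness with a fake clock and returns the total. */
  static long demo() {
    long[] now = {100L};
    IdleTimeTracker tracker = new IdleTimeTracker(() -> now[0]);
    tracker.markIdle();
    now[0] += 5000L;
    tracker.markBusy();
    tracker.markBusy(); // no open idle period, so this adds nothing
    return tracker.totalIdleMillis();
  }

  public static void main(String[] args) {
    System.out.println(demo() + "ms");
  }
}
```

A test written this way asserts on the accumulated value directly, for example that the total is at least 5000, instead of subtracting two readings that are named like timestamps.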
[jira] [Commented] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster
[ https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289985#comment-16289985 ] Zhiyuan Yang commented on TEZ-3694: --- This cannot be done unless we raise the Hadoop version to 2.7.2. Before 2.7.2, MiniYarnCluster only reads params and doesn't recognize conf at all. > Adopt YARN-5007 in MiniTezCluster > - > > Key: TEZ-3694 > URL: https://issues.apache.org/jira/browse/TEZ-3694 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3694.1.patch > > > Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS > param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt > the change and use config to enable timeline service. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3869) Analyzer: Fix VertexInfo::getLastTaskToFinish comparison
[ https://issues.apache.org/jira/browse/TEZ-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3869: -- Fix Version/s: (was: 0.9.next) 0.9.1 > Analyzer: Fix VertexInfo::getLastTaskToFinish comparison > > > Key: TEZ-3869 > URL: https://issues.apache.org/jira/browse/TEZ-3869 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 0.9.1 > > Attachments: TEZ-3869.1.patch, TEZ-3869.2.patch > > > {{VertexInfo::getLastTaskToFinish}} incorrectly compares with > getStartTimeInterval. This needs to be fixed. Observed TimSort exceptions > when analyzing some DAG zips.
> {code}
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:895)
>     at java.util.TimSort.mergeAt(TimSort.java:512)
>     at java.util.TimSort.mergeForceCollapse(TimSort.java:453)
>     at java.util.TimSort.sort(TimSort.java:250)
>     at java.util.Arrays.sort(Arrays.java:1435)
>     at java.util.Collections.sort(Collections.java:230)
>     at org.apache.tez.history.parser.datamodel.VertexInfo.getLastTaskToFinish(VertexInfo.java:542)
> {code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
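The TEZ-3869 report does not quote the faulty comparison, so the sketch below only assumes one way a comparator can mix fields and break the contract that TimSort checks. `Task`, `MIXED_FIELDS`, and `BY_FINISH` are hypothetical names; the real `VertexInfo` code may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a consistent "last task to finish" comparison; not the TEZ-3869 patch. */
final class LastTaskToFinish {

  static final class Task {
    final String id;
    final long startTime;
    final long finishTime;

    Task(String id, long startTime, long finishTime) {
      this.id = id;
      this.startTime = startTime;
      this.finishTime = finishTime;
    }
  }

  // Assumed shape of the bug: the two sides are keyed on different fields, so
  // compare(t, t) is non-zero whenever a task's finish time differs from its start time.
  static final Comparator<Task> MIXED_FIELDS =
      (a, b) -> Long.compare(a.finishTime, b.startTime);

  // Consistent ordering: both sides are keyed on the finish time.
  static final Comparator<Task> BY_FINISH =
      Comparator.comparingLong((Task t) -> t.finishTime);

  /** Returns the task with the greatest finish time; throws NoSuchElementException if empty. */
  static Task lastToFinish(List<Task> tasks) {
    return Collections.max(tasks, BY_FINISH);
  }

  public static void main(String[] args) {
    List<Task> tasks = Arrays.asList(
        new Task("t1", 0L, 50L),
        new Task("t2", 10L, 90L),
        new Task("t3", 20L, 70L));
    System.out.println(lastToFinish(tasks).id);
  }
}
```

Using `Collections.max` also avoids sorting the whole list when only the last finisher is needed, though the original method may have had reasons to sort.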
[jira] [Updated] (TEZ-3868) Update website to factor in the TEZ trademark registration
[ https://issues.apache.org/jira/browse/TEZ-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3868: -- Fix Version/s: 0.9.1 > Update website to factor in the TEZ trademark registration > -- > > Key: TEZ-3868 > URL: https://issues.apache.org/jira/browse/TEZ-3868 > Project: Apache Tez > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 0.9.1 > > Attachments: TEZ-3868.01.patch, TEZ-3868.02.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (TEZ-3252) [Umbrella] Enable support for Hadoop-3.x
[ https://issues.apache.org/jira/browse/TEZ-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang resolved TEZ-3252. --- Resolution: Fixed > [Umbrella] Enable support for Hadoop-3.x > - > > Key: TEZ-3252 > URL: https://issues.apache.org/jira/browse/TEZ-3252 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah > Fix For: 0.9.1 > > Attachments: TEZ-3252.patch > > > Placeholder umbrella to track the various issues/tasks discovered to get full > stable functionality against hadoop-3.x once it is released in a stable form. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster
[ https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3694: -- Issue Type: Bug (was: Sub-task) Parent: (was: TEZ-3252) > Adopt YARN-5007 in MiniTezCluster > - > > Key: TEZ-3694 > URL: https://issues.apache.org/jira/browse/TEZ-3694 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3694.1.patch > > > Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS > param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt > the change and use config to enable timeline service. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster
[ https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288483#comment-16288483 ] Zhiyuan Yang commented on TEZ-3694: --- Test failed...Moving this out of TEZ-3252 umbrella jira since this works fine with Hadoop 3. This can be fixed later. > Adopt YARN-5007 in MiniTezCluster > - > > Key: TEZ-3694 > URL: https://issues.apache.org/jira/browse/TEZ-3694 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3694.1.patch > > > Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS > param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt > the change and use config to enable timeline service. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3874) NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in Configuration
[ https://issues.apache.org/jira/browse/TEZ-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3874: -- Fix Version/s: (was: 0.9.1) > NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in > Configuration > > > Key: TEZ-3874 > URL: https://issues.apache.org/jira/browse/TEZ-3874 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.1 >Reporter: Eric Wohlstadter >Priority: Blocker > Attachments: TEZ-3874.1.patch > > Original Estimate: 48h > Remaining Estimate: 48h > >
> "yarn.resourcemanager.zk-address" is deprecated in favor of "hadoop.zk.address" for Hadoop 2.9+.
> The Configuration base class doesn't auto-translate the deprecation. Only YarnConfiguration applies the translation.
> In TezClientUtils.createFinalConfProtoForApp, an NPE is thrown if "yarn.resourcemanager.zk-address" is present in the Configuration.
> {code}
> for (Entry<String, String> entry : amConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(amConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
> Even though Tez is not specifically looking for the deprecated property, {{amConf.get(entry.getKey())}} will find it during the iteration, if it is in any of the merged xml property resources.
> {{amConf.get(entry.getKey())}} will return null, and {{kvp.setValue(null)}} will trigger NPE.
> Suggested solution is to change to:
> {code}
> YarnConfiguration wrappedConf = new YarnConfiguration(amConf);
> for (Entry<String, String> entry : wrappedConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(wrappedConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3874) NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in Configuration
[ https://issues.apache.org/jira/browse/TEZ-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3874: -- Target Version/s: 0.9.next (was: 0.9.1) > NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in > Configuration > > > Key: TEZ-3874 > URL: https://issues.apache.org/jira/browse/TEZ-3874 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.1 >Reporter: Eric Wohlstadter >Priority: Blocker > Attachments: TEZ-3874.1.patch > > Original Estimate: 48h > Remaining Estimate: 48h > >
> "yarn.resourcemanager.zk-address" is deprecated in favor of "hadoop.zk.address" for Hadoop 2.9+.
> The Configuration base class doesn't auto-translate the deprecation. Only YarnConfiguration applies the translation.
> In TezClientUtils.createFinalConfProtoForApp, an NPE is thrown if "yarn.resourcemanager.zk-address" is present in the Configuration.
> {code}
> for (Entry<String, String> entry : amConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(amConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
> Even though Tez is not specifically looking for the deprecated property, {{amConf.get(entry.getKey())}} will find it during the iteration, if it is in any of the merged xml property resources.
> {{amConf.get(entry.getKey())}} will return null, and {{kvp.setValue(null)}} will trigger NPE.
> Suggested solution is to change to:
> {code}
> YarnConfiguration wrappedConf = new YarnConfiguration(amConf);
> for (Entry<String, String> entry : wrappedConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(wrappedConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3874) NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in Configuration
[ https://issues.apache.org/jira/browse/TEZ-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288349#comment-16288349 ] Zhiyuan Yang commented on TEZ-3874: --- Moving this to 0.9.2 per offline discussion with [~ewohlstadter]. > NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in > Configuration > > > Key: TEZ-3874 > URL: https://issues.apache.org/jira/browse/TEZ-3874 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.1 >Reporter: Eric Wohlstadter >Priority: Blocker > Attachments: TEZ-3874.1.patch > > Original Estimate: 48h > Remaining Estimate: 48h > >
> "yarn.resourcemanager.zk-address" is deprecated in favor of "hadoop.zk.address" for Hadoop 2.9+.
> The Configuration base class doesn't auto-translate the deprecation. Only YarnConfiguration applies the translation.
> In TezClientUtils.createFinalConfProtoForApp, an NPE is thrown if "yarn.resourcemanager.zk-address" is present in the Configuration.
> {code}
> for (Entry<String, String> entry : amConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(amConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
> Even though Tez is not specifically looking for the deprecated property, {{amConf.get(entry.getKey())}} will find it during the iteration, if it is in any of the merged xml property resources.
> {{amConf.get(entry.getKey())}} will return null, and {{kvp.setValue(null)}} will trigger NPE.
> Suggested solution is to change to:
> {code}
> YarnConfiguration wrappedConf = new YarnConfiguration(amConf);
> for (Entry<String, String> entry : wrappedConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(wrappedConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
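The TEZ-3874 description proposes wrapping the configuration in a `YarnConfiguration`. A different, purely defensive option is to skip keys whose lookup returns null while copying. The sketch below shows that alternative in plain Java so it runs without Hadoop on the classpath; `ConfCopy`, `copySkippingNulls`, and `example.key` are hypothetical, and the lookup function merely imitates the null return described in the issue.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical null-tolerant copy loop; an alternative to, not a copy of, the suggested fix. */
final class ConfCopy {

  /**
   * Copies key/value pairs into a new map, skipping keys whose lookup yields null.
   * Per the issue text, passing such a null to the builder's setter is what triggers the NPE.
   */
  static Map<String, String> copySkippingNulls(Iterable<String> keys,
                                               Function<String, String> lookup) {
    Map<String, String> out = new LinkedHashMap<>();
    for (String key : keys) {
      String value = lookup.apply(key);
      if (value != null) {
        out.put(key, value);
      }
    }
    return out;
  }

  /** Imitates a configuration where the deprecated key is listed but resolves to null. */
  static Map<String, String> demo() {
    Map<String, String> raw = new LinkedHashMap<>();
    raw.put("yarn.resourcemanager.zk-address", "zk:2181");
    raw.put("example.key", "value");
    Function<String, String> lookup =
        key -> "yarn.resourcemanager.zk-address".equals(key) ? null : raw.get(key);
    return copySkippingNulls(Arrays.asList("yarn.resourcemanager.zk-address", "example.key"),
        lookup);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The trade-off is that skipping silently drops the deprecated setting, whereas the wrapping approach suggested in the issue is described as translating it to the new key.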
[jira] [Commented] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster
[ https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288313#comment-16288313 ] Zhiyuan Yang commented on TEZ-3694: --- I'm going to get another jenkins run and commit this if things go smoothly. The constructor is deprecated anyway and we should adopt it. > Adopt YARN-5007 in MiniTezCluster > - > > Key: TEZ-3694 > URL: https://issues.apache.org/jira/browse/TEZ-3694 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3694.1.patch > > > Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS > param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt > the change and use config to enable timeline service. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang resolved TEZ-3855. --- Resolution: Fixed Release Note: Issue was addressed in TEZ-3867. > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Blocker > Fix For: 0.9.1 > > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, > TEZ-3855.addendum.patch, TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from vertex manager to processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3867) testSendCustomProcessorEvent try to get array out of read only ByteBuffer
[ https://issues.apache.org/jira/browse/TEZ-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288282#comment-16288282 ] Zhiyuan Yang commented on TEZ-3867: --- Thanks [~kshukla] for review! I'll add this to 0.9.1 release. > testSendCustomProcessorEvent try to get array out of read only ByteBuffer > - > > Key: TEZ-3867 > URL: https://issues.apache.org/jira/browse/TEZ-3867 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 0.9.1 > > Attachments: TEZ-3867.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257410#comment-16257410 ] Zhiyuan Yang commented on TEZ-3855: --- [~jeagles] Sorry, my bad. I should have gotten a Jenkins run for the addendum patch. I've made a patch and am using TEZ-3867 to get a Jenkins run. > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Blocker > Fix For: 0.9.1 > > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, > TEZ-3855.addendum.patch, TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3867) testSendCustomProcessorEvent try to get array out of read only ByteBuffer
[ https://issues.apache.org/jira/browse/TEZ-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3867: -- Attachment: TEZ-3867.1.patch > testSendCustomProcessorEvent try to get array out of read only ByteBuffer > - > > Key: TEZ-3867 > URL: https://issues.apache.org/jira/browse/TEZ-3867 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3867.1.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (TEZ-3867) testSendCustomProcessorEvent try to get array out of read only ByteBuffer
Zhiyuan Yang created TEZ-3867: - Summary: testSendCustomProcessorEvent try to get array out of read only ByteBuffer Key: TEZ-3867 URL: https://issues.apache.org/jira/browse/TEZ-3867 Project: Apache Tez Issue Type: Bug Reporter: Zhiyuan Yang Assignee: Zhiyuan Yang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
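The issue carries no description, so the following is only a sketch of the general pitfall its title names: calling array() on a read-only ByteBuffer throws ReadOnlyBufferException. One common way around it is to copy the bytes out through a duplicate view. ReadOnlyBufferCopy and toBytes are hypothetical names and are not taken from the Tez test.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

// Hypothetical helper; not part of Tez.
class ReadOnlyBufferCopy {

    // Copies the remaining bytes without moving the caller's position.
    // Works for read-only and direct buffers, where array() is unavailable.
    static byte[] toBytes(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // shares content, has its own position and limit
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer readOnly = ByteBuffer.wrap(new byte[] {1, 2, 3}).asReadOnlyBuffer();
        try {
            readOnly.array(); // not permitted on a read-only buffer
        } catch (ReadOnlyBufferException e) {
            System.out.println("array() rejected, copying instead");
        }
        System.out.println(toBytes(readOnly).length);
    }
}
```

Checking hasArray() before calling array() is the other usual guard; the copy above avoids the branch at the cost of one allocation.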
[jira] [Comment Edited] (TEZ-3846) Tez AM may not clean up properly on an internal error
[ https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256030#comment-16256030 ] Zhiyuan Yang edited comment on TEZ-3846 at 11/16/17 10:03 PM: -- [~ewohlstadter] It's done in TEZ-3858. Do you want to investigate this one? If so, feel free to take it over. was (Author: aplusplus): [~ewohlstadter] It's done in TEZ-3858. > Tez AM may not clean up properly on an internal error > - > > Key: TEZ-3846 > URL: https://issues.apache.org/jira/browse/TEZ-3846 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Zhiyuan Yang > > Normally, in Hive we blindly reopen the session on any submit error; however > I accidentally broke that, and while investigating noticed a new error before > reopen that claims that session where a DAG has failed is still running a > DAG. Looks like it should either clean up, or if we assume OOM is not > clean-up-able, die completely. > {noformat} > 2017-09-28T01:07:12,352 INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > client.TezClient: Submitted dag to TezSession, > sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, > applicationId=application_1506585924598_0001, > dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM ( > ... 
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Status: Failed > 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Vertex failed, vertexName=Map 61, > vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex > vertex_1506585924598_0001_53_01 [Map 61] killed/failed due > to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, > vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: > GC overhead limit exceeded > 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Invalid event V_INTERNAL_ERROR on Vertex > vertex_1506585924598_0001_53_00 [Map 60] > 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > log.PerfLogger: end=1506586045787 duration=13435 > from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor> > ... [reuse] > 2017-09-28T01:07:28,459 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] > client.TezClient: Submitting dag to TezSession, > sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, > applicationId=application_1506585924598_0001, dagName=insert overwrite table > orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, > callerType=HIVE_QUERY_ID, > callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 } > 2017-09-28T01:07:35,259 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] > exec.Task: Dag submit failed due to App master already running a DAG > {noformat} > Session continues living and failing like that multiple times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (TEZ-3846) Tez AM may not clean up properly on an internal error
[ https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256030#comment-16256030 ] Zhiyuan Yang edited comment on TEZ-3846 at 11/16/17 10:01 PM: -- [~ewohlstadter] It's done in TEZ-3858. was (Author: aplusplus): [~EricWohlstadter] It's done in TEZ-3858. > Tez AM may not clean up properly on an internal error > - > > Key: TEZ-3846 > URL: https://issues.apache.org/jira/browse/TEZ-3846 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Zhiyuan Yang > > Normally, in Hive we blindly reopen the session on any submit error; however > I accidentally broke that, and while investigating noticed a new error before > reopen that claims that session where a DAG has failed is still running a > DAG. Looks like it should either clean up, or if we assume OOM is not > clean-up-able, die completely. > {noformat} > 2017-09-28T01:07:12,352 INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > client.TezClient: Submitted dag to TezSession, > sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, > applicationId=application_1506585924598_0001, > dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM ( > ... 
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Status: Failed > 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Vertex failed, vertexName=Map 61, > vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex > vertex_1506585924598_0001_53_01 [Map 61] killed/failed due > to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, > vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: > GC overhead limit exceeded > 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Invalid event V_INTERNAL_ERROR on Vertex > vertex_1506585924598_0001_53_00 [Map 60] > 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > log.PerfLogger: end=1506586045787 duration=13435 > from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor> > ... [reuse] > 2017-09-28T01:07:28,459 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] > client.TezClient: Submitting dag to TezSession, > sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, > applicationId=application_1506585924598_0001, dagName=insert overwrite table > orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, > callerType=HIVE_QUERY_ID, > callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 } > 2017-09-28T01:07:35,259 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] > exec.Task: Dag submit failed due to App master already running a DAG > {noformat} > Session continues living and failing like that multiple times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3846) Tez AM may not clean up properly on an internal error
[ https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256030#comment-16256030 ] Zhiyuan Yang commented on TEZ-3846: --- [~EricWohlstadter] It's done in TEZ-3858. > Tez AM may not clean up properly on an internal error > - > > Key: TEZ-3846 > URL: https://issues.apache.org/jira/browse/TEZ-3846 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Zhiyuan Yang > > Normally, in Hive we blindly reopen the session on any submit error; however > I accidentally broke that, and while investigating noticed a new error before > reopen that claims that session where a DAG has failed is still running a > DAG. Looks like it should either clean up, or if we assume OOM is not > clean-up-able, die completely. > {noformat} > 2017-09-28T01:07:12,352 INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > client.TezClient: Submitted dag to TezSession, > sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, > applicationId=application_1506585924598_0001, > dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM ( > ... 
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Status: Failed > 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Vertex failed, vertexName=Map 61, > vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex > vertex_1506585924598_0001_53_01 [Map 61] killed/failed due > to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, > vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: > GC overhead limit exceeded > 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > SessionState: Invalid event V_INTERNAL_ERROR on Vertex > vertex_1506585924598_0001_53_00 [Map 60] > 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] > log.PerfLogger: end=1506586045787 duration=13435 > from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor> > ... [reuse] > 2017-09-28T01:07:28,459 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] > client.TezClient: Submitting dag to TezSession, > sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, > applicationId=application_1506585924598_0001, dagName=insert overwrite table > orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, > callerType=HIVE_QUERY_ID, > callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 } > 2017-09-28T01:07:35,259 INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] > exec.Task: Dag submit failed due to App master already running a DAG > {noformat} > Session continues living and failing like that multiple times. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254264#comment-16254264 ] Zhiyuan Yang commented on TEZ-3855: --- [~gopalv] It is sufficient. With the addendum patch, the buffer object is sealed intact within the event. It can be retrieved multiple times, either for the heartbeat or for the processor. Thanks for the review! I'll commit this soon. > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Blocker > Fix For: 0.9.1 > > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, > TEZ-3855.addendum.patch, TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling
[ https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3837: -- Priority: Major (was: Blocker) > Parallel sorting with inline sampling > - > > Key: TEZ-3837 > URL: https://issues.apache.org/jira/browse/TEZ-3837 > Project: Apache Tez > Issue Type: New Feature >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, > TEZ-3837.2.patch, TEZ-3837.3.patch, TEZ-3837.4.patch, TEZ-3837.5.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3855: -- Priority: Blocker (was: Major) > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Blocker > Fix For: 0.9.1 > > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, > TEZ-3855.addendum.patch, TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling
[ https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3837: -- Priority: Blocker (was: Major) > Parallel sorting with inline sampling > - > > Key: TEZ-3837 > URL: https://issues.apache.org/jira/browse/TEZ-3837 > Project: Apache Tez > Issue Type: New Feature >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Blocker > Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, > TEZ-3837.2.patch, TEZ-3837.3.patch, TEZ-3837.4.patch, TEZ-3837.5.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3855: -- Attachment: TEZ-3855.addendum.patch Just worked with [~djaiswal] and found out that this new event itself broke fault tolerance. Previously the code returned the original ByteBuffer for consumption, leaving an empty buffer for the next read. Attached an addendum patch to fix the issue. [~gopalv] Can you help review? > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 0.9.1 > > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, > TEZ-3855.addendum.patch, TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
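The failure mode described in this comment can be reproduced with plain java.nio. The sketch below assumes a hypothetical PayloadEvent class (not the Tez event type) and contrasts handing out the shared buffer with handing out a fresh read-only view on every call. It illustrates the general technique and does not claim to match the addendum patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical event holder for illustration; not the Tez class.
class PayloadEvent {
    private final ByteBuffer payload;

    PayloadEvent(ByteBuffer payload) {
        this.payload = payload;
    }

    // Returns the shared buffer: a reader that drains it leaves it empty for the next caller.
    ByteBuffer getPayloadShared() {
        return payload;
    }

    // Returns a new read-only view: every caller gets its own position and limit.
    ByteBuffer getPayloadView() {
        return payload.asReadOnlyBuffer();
    }

    // Reads all remaining bytes, advancing the position of the buffer passed in.
    static String drain(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PayloadEvent event = new PayloadEvent(ByteBuffer.wrap("abc".getBytes(StandardCharsets.UTF_8)));
        System.out.println(drain(event.getPayloadView()));   // first reader
        System.out.println(drain(event.getPayloadView()));   // second reader sees the same bytes
        System.out.println(drain(event.getPayloadShared())); // drains the shared buffer itself
        System.out.println(drain(event.getPayloadShared()).isEmpty());
    }
}
```

A read-only view also prevents a consumer from modifying the payload, which matters when the same event may be delivered again after a task retry.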
[jira] [Reopened] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang reopened TEZ-3855: --- > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 0.9.1 > > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, > TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling
[ https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3837: -- Attachment: TEZ-3837.5.patch > Parallel sorting with inline sampling > - > > Key: TEZ-3837 > URL: https://issues.apache.org/jira/browse/TEZ-3837 > Project: Apache Tez > Issue Type: New Feature >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, > TEZ-3837.2.patch, TEZ-3837.3.patch, TEZ-3837.4.patch, TEZ-3837.5.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3864) Tez failed to intergrate with hadoop(2.8.2)
[ https://issues.apache.org/jira/browse/TEZ-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250012#comment-16250012 ] Zhiyuan Yang commented on TEZ-3864: --- Interruption probably wasn't the culprit. You may want to find out who sent the interruption and why. > Tez failed to intergrate with hadoop(2.8.2) > --- > > Key: TEZ-3864 > URL: https://issues.apache.org/jira/browse/TEZ-3864 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Shen Yinjie > > When I integrated Tez (0.9.0) with Hadoop (2.8.2), the Tez service check > orderedwordcount always failed to run: > "hadoop --config /etc/hadoop/conf jar /usr/lib/tez/tez-examples*.jar > orderedwordcount > /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/" > None of the containers ran successfully; the container logs just > print exceptions as follows: > "TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) > : java.lang.RuntimeException: java.io.IOException: Failed on local exception: > java.nio.channels.ClosedByInterruptException; Host Details : local host is: > "wjf1-hc/xx.xx.xx.xx"; destination host is: "wjf1-hc":8020; > at > org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:209) > at > org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initialize(TezGroupedSplitsInputFormat.java:156) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:157) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.setSplit(MRReaderMapReduce.java:88) > at > org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703) > at org.apache.tez.mapreduce.input.MRInput.processSplitEvent(MRInput.java:631) > at org.apache.tez.mapreduce.input.MRInput.handleEvents(MRInput.java:590) > at > 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:719) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:106) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:796) > at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Failed on local exception: > java.nio.channels.ClosedByInterruptException; Host Details : local host is: > "wjf1-hc/xx.xx.xx.xx"; destination host is: "wjf1-hc":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:785) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) > at org.apache.hadoop.ipc.Client.call(Client.java:1441) > at org.apache.hadoop.ipc.Client.call(Client.java:1351) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:235) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:259) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346) > at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:830) > at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:819) > at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:808) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:319) > at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:281) > at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:270) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1119) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:343) > at > org.ap
[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3855: -- Fix Version/s: 0.9.1 > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 0.9.1 > > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, > TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248236#comment-16248236 ] Zhiyuan Yang commented on TEZ-3855: --- Thanks [~gopalv] for the review! The patch was committed to the master branch. {code} commit b96f79fa75dc6cf47e4d648b028ccb12f02308a6 Author: Zhiyuan Yang Date: Fri Nov 10 16:44:29 2017 -0800 TEZ-3855. Allow vertex manager to send event to processor (zhiyuany) {code} > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, > TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3855: -- Attachment: TEZ-3855.3.patch Added a way to trace an event back to the sender AM, per [~gopalv]'s offline comments. Events now carry the app attempt ID in the version field. > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, > TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244595#comment-16244595 ] Zhiyuan Yang commented on TEZ-3855: --- Thanks [~gopalv]! AFAIK the processor event framework has been there for a long time but was never used. > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, > TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244566#comment-16244566 ] Zhiyuan Yang commented on TEZ-3855: --- Ping [~rajesh.balamohan], [~gopalv] for review. Hive needs to have this in the 0.9.1 release. > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, > TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the VM to > send events out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3805) Analyzer: Add an analyzer to find out scheduling misses in 1:1 edges
[ https://issues.apache.org/jira/browse/TEZ-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3805: -- Fix Version/s: (was: 0.9.next) 0.9.1 > Analyzer: Add an analyzer to find out scheduling misses in 1:1 edges > > > Key: TEZ-3805 > URL: https://issues.apache.org/jira/browse/TEZ-3805 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Fix For: 0.9.1 > > Attachments: TEZ-3805.1.patch > > > When a 1:1 edge is used, it would be helpful to find out whether downstream > tasks ran on the same location provided in the hints by the runtime. > One of the recent features in an upstream project (Hive) used a 1:1 edge. Instead > of checking the logs, it would be useful to have an analyzer to churn out the > details. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3858) Misleading dag level diagnostics in case of invalid vertex event
[ https://issues.apache.org/jira/browse/TEZ-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3858: -- Fix Version/s: 0.9.1 > Misleading dag level diagnostics in case of invalid vertex event > > > Key: TEZ-3858 > URL: https://issues.apache.org/jira/browse/TEZ-3858 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 0.9.1 > > Attachments: TEZ-3858.1.patch, TEZ-3858.2.patch > > > When a vertex gets an invalid event, the state machine is transitioned by > InternalErrorTransition. This transition prints the following message and adds it to the DAG > diagnostics: > {code} > ("Invalid event " + event.getType() + " on Vertex " + > vertex.getLogIdentifier() > {code} > But the variable event here is the V_INTERNAL_ERROR event rather than the event that > caused V_INTERNAL_ERROR. V_INTERNAL_ERROR is not the invalid event; the > original event is. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
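To make the TEZ-3858 diagnostics problem concrete, here is a minimal sketch in which the internal-error event remembers the event that was rejected, so the message can name the real culprit. All types below (VertexDiagnostics, Event, EventType) are hypothetical stand-ins, not Tez's classes, and the attached patches may solve this differently.

```java
// Hypothetical types for illustration; these are not Tez's actual classes.
class VertexDiagnostics {

    enum EventType { V_START, V_INTERNAL_ERROR }

    static final class Event {
        final EventType type;
        final EventType cause; // the event that was rejected, or null if none

        private Event(EventType type, EventType cause) {
            this.type = type;
            this.cause = cause;
        }

        // Builds an internal-error event that records which event triggered it.
        static Event internalError(EventType rejected) {
            return new Event(EventType.V_INTERNAL_ERROR, rejected);
        }
    }

    // Reports the type of the error event itself, which hides the real culprit.
    static String misleading(Event event, String vertexId) {
        return "Invalid event " + event.type + " on Vertex " + vertexId;
    }

    // Reports the rejected event when one was recorded.
    static String accurate(Event event, String vertexId) {
        EventType reported = (event.cause != null) ? event.cause : event.type;
        return "Invalid event " + reported + " on Vertex " + vertexId;
    }

    public static void main(String[] args) {
        Event error = Event.internalError(EventType.V_START);
        System.out.println(misleading(error, "vertex_0 [Map 60]"));
        System.out.println(accurate(error, "vertex_0 [Map 60]"));
    }
}
```

Carrying the cause on the error event keeps the transition code unchanged apart from the message it builds.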
[jira] [Commented] (TEZ-3858) Misleading dag level diagnostics in case of invalid vertex event
[ https://issues.apache.org/jira/browse/TEZ-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244554#comment-16244554 ] Zhiyuan Yang commented on TEZ-3858: --- Thanks [~kshukla] for reviewing and committing this! > Misleading dag level diagnostics in case of invalid vertex event > > > Key: TEZ-3858 > URL: https://issues.apache.org/jira/browse/TEZ-3858 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Fix For: 0.9.1 > > Attachments: TEZ-3858.1.patch, TEZ-3858.2.patch > > > When a vertex gets an invalid event, the state machine is transitioned by > InternalErrorTransition. This transition prints the following message and adds it to the DAG > diagnostics: > {code} > ("Invalid event " + event.getType() + " on Vertex " + > vertex.getLogIdentifier() > {code} > But the variable event here is the V_INTERNAL_ERROR event rather than the event that > caused V_INTERNAL_ERROR. V_INTERNAL_ERROR is not the invalid event; the > original event is. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling
[ https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3837: -- Attachment: TEZ-3837.4.patch Fixed some event-related SerDe issues. > Parallel sorting with inline sampling > - > > Key: TEZ-3837 > URL: https://issues.apache.org/jira/browse/TEZ-3837 > Project: Apache Tez > Issue Type: New Feature >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, > TEZ-3837.2.patch, TEZ-3837.3.patch, TEZ-3837.4.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling
[ https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3837: -- Attachment: TEZ-3837.3.patch Fix javadoc warning > Parallel sorting with inline sampling > - > > Key: TEZ-3837 > URL: https://issues.apache.org/jira/browse/TEZ-3837 > Project: Apache Tez > Issue Type: New Feature >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, > TEZ-3837.2.patch, TEZ-3837.3.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling
[ https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3837: -- Attachment: TEZ-3837.2.patch > Parallel sorting with inline sampling > - > > Key: TEZ-3837 > URL: https://issues.apache.org/jira/browse/TEZ-3837 > Project: Apache Tez > Issue Type: New Feature >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Major > Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, > TEZ-3837.2.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3855: -- Attachment: (was: TEZ-3855.2.patch) > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Major > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, > TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the vertex manager (VM) to > send events out.
[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3855: -- Attachment: TEZ-3855.2.patch > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Major > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, > TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the vertex manager (VM) to > send events out.
[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor
[ https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3855: -- Attachment: TEZ-3855.2.patch [~rajesh.balamohan] Thanks for taking a look! That task attempt id won't be used anyway since this is a task-level event. I've changed it to -1 and added comments in the new patch. > Allow vertex manager to send event to processor > --- > > Key: TEZ-3855 > URL: https://issues.apache.org/jira/browse/TEZ-3855 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Major > Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, > TEZ-3855.prototype.patch > > > Hive is trying to propagate some info from the vertex manager to the processor. The > task framework supports processor events, but there is no interface for the vertex manager (VM) to > send events out.
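The idea discussed in TEZ-3855, a vertex manager addressing an event to a task rather than to a specific attempt, might be sketched roughly as follows. All names here (ProcessorEvent, RoutedEvent, VertexManagerContext, sendEventToProcessor, UNUSED_ATTEMPT_ID) are hypothetical and chosen for illustration only; they are not taken from the patch or from the Tez API. The only detail drawn from the comment above is that the attempt id is set to -1 because it is never read for a task-level event.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are the real Tez interfaces.
public class VertexManagerEventSketch {

    // Hypothetical payload a vertex manager wants a processor to see.
    static class ProcessorEvent {
        final String payload;

        ProcessorEvent(String payload) {
            this.payload = payload;
        }
    }

    // Hypothetical routing envelope. The event targets a task, not one of
    // its attempts, so the attempt id is a placeholder that is never read.
    static class RoutedEvent {
        static final int UNUSED_ATTEMPT_ID = -1;

        final int taskId;
        final int attemptId;
        final ProcessorEvent event;

        RoutedEvent(int taskId, ProcessorEvent event) {
            this.taskId = taskId;
            this.attemptId = UNUSED_ATTEMPT_ID;
            this.event = event;
        }
    }

    // Hypothetical interface exposed to the vertex manager (assumed shape).
    interface VertexManagerContext {
        void sendEventToProcessor(int taskId, ProcessorEvent event);
    }

    // Minimal in-memory context that records what would be delivered.
    static class RecordingContext implements VertexManagerContext {
        final List<RoutedEvent> delivered = new ArrayList<>();

        @Override
        public void sendEventToProcessor(int taskId, ProcessorEvent event) {
            delivered.add(new RoutedEvent(taskId, event));
        }
    }

    public static void main(String[] args) {
        RecordingContext context = new RecordingContext();
        context.sendEventToProcessor(3, new ProcessorEvent("hint-from-vertex-manager"));
        RoutedEvent routed = context.delivered.get(0);
        System.out.println(routed.taskId + " " + routed.attemptId + " " + routed.event.payload);
    }
}
```

Using a named constant for the placeholder makes it explicit to readers that the value carries no meaning, which is the same purpose the comments added in the new patch are said to serve.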
[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling
[ https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated TEZ-3837: -- Attachment: (was: TEZ-3837.1.patch.example) > Parallel sorting with inline sampling > - > > Key: TEZ-3837 > URL: https://issues.apache.org/jira/browse/TEZ-3837 > Project: Apache Tez > Issue Type: New Feature >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Major > Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch > >
[jira] [Resolved] (TEZ-3838) API for enabling sampler and specify configuration
[ https://issues.apache.org/jira/browse/TEZ-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang resolved TEZ-3838. --- Resolution: Won't Fix Closing since it's included in the TEZ-3837 patch. > API for enabling sampler and specify configuration > -- > > Key: TEZ-3838 > URL: https://issues.apache.org/jira/browse/TEZ-3838 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Major >
[jira] [Assigned] (TEZ-3837) Parallel sorting with inline sampling
[ https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang reassigned TEZ-3837: - Assignee: Zhiyuan Yang > Parallel sorting with inline sampling > - > > Key: TEZ-3837 > URL: https://issues.apache.org/jira/browse/TEZ-3837 > Project: Apache Tez > Issue Type: New Feature >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang >Priority: Major > Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, > TEZ-3837.1.patch.example > >