from:"Zhiyuan Yang \(JIRA\)"

[jira] [Commented] (TEZ-3718) Better handling of 'bad' nodes

2018-07-18 Thread Zhiyuan Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/TEZ-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548343#comment-16548343
 ] 

Zhiyuan Yang commented on TEZ-3718:
---

This one has been pending review for a long time. A review would be greatly 
appreciated, but feel free to drop this from the release. 

> Better handling of 'bad' nodes
> --
>
> Key: TEZ-3718
> URL: https://issues.apache.org/jira/browse/TEZ-3718
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Zhiyuan Yang
>Priority: Major
> Attachments: TEZ-3718.1.patch, TEZ-3718.2.patch, TEZ-3718.3.patch, 
> TEZ-3718.4.patch
>
>
> At the moment, the default behaviour in case of a node being marked bad is to 
> do nothing other than not schedule new tasks on this node.
> The alternate, via config, is to retroactively kill every task which ran on 
> the node, which causes far too many unnecessary re-runs.
> Proposing the following changes.
> 1. KILL fragments which are currently in the RUNNING state (instead of 
> relying on a timeout, which leads to the attempt being marked as FAILED after 
> the timeout interval).
> 2. Keep track of these failed nodes, and use this as input to the failure 
> heuristics. Normally source tasks require multiple consumers to report 
> failure for them to be marked as bad. If a single consumer reports failure 
> against a source which ran on a bad node, consider it bad and re-schedule 
> immediately. (Otherwise failures can take a while to propagate, and jobs get 
> a lot slower).
> [~jlowe] - think you've looked at this in the past. Any thoughts/suggestions?
> What I'm seeing is retroactive failures taking a long time to apply and to 
> restart sources which ran on a bad node. Also, running tasks are being counted 
> as FAILURES instead of KILLS.
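
The heuristic proposed in item 2 can be sketched roughly as follows. This is only an illustration of the idea, not code from any of the attached patches; the class, method, and parameter names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Tez code base.
final class SourceFailureHeuristic {
    private final Set<String> badNodes = new HashSet<>();
    // Number of distinct consumers that must report a failure when the
    // source ran on a node that is still considered healthy.
    private final int reportsRequired;

    SourceFailureHeuristic(int reportsRequired) {
        this.reportsRequired = reportsRequired;
    }

    void markNodeBad(String nodeId) {
        badNodes.add(nodeId);
    }

    /** Returns true when the source attempt should be re-scheduled. */
    boolean shouldRerunSource(String sourceNodeId, int distinctConsumerReports) {
        if (distinctConsumerReports <= 0) {
            return false; // nobody has complained yet
        }
        if (badNodes.contains(sourceNodeId)) {
            return true; // a single report is enough for a known-bad node
        }
        return distinctConsumerReports >= reportsRequired;
    }
}
```

With a threshold of three, a single report against a source on a bad node triggers a re-run, while a source on a healthy node still needs three reports.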



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster

2018-07-18 Thread Zhiyuan Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3694:
--
Attachment: TEZ-3694.2.patch

> Adopt YARN-5007 in MiniTezCluster
> -
>
> Key: TEZ-3694
> URL: https://issues.apache.org/jira/browse/TEZ-3694
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Major
> Attachments: TEZ-3694.1.patch, TEZ-3694.2.patch
>
>
> Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS 
> param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt 
> the change and use config to enable timeline service.




[jira] [Commented] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster

2018-07-18 Thread Zhiyuan Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548335#comment-16548335
 ] 

Zhiyuan Yang commented on TEZ-3694:
---

I see the Hadoop version has been raised to 3.0.3 in TEZ-3955. The patch here 
probably already works. Let me kick off another Jenkins run for it. But feel free to 
drop this from the release.

> Adopt YARN-5007 in MiniTezCluster
> -
>
> Key: TEZ-3694
> URL: https://issues.apache.org/jira/browse/TEZ-3694
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Major
> Attachments: TEZ-3694.1.patch, TEZ-3694.2.patch
>
>
> Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS 
> param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt 
> the change and use config to enable timeline service.




[jira] [Closed] (TEZ-3803) Tasks can get killed due to insufficient progress while waiting for shuffle inputs to complete

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3803.
-

> Tasks can get killed due to insufficient progress while waiting for shuffle 
> inputs to complete
> --
>
> Key: TEZ-3803
> URL: https://issues.apache.org/jira/browse/TEZ-3803
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Critical
> Fix For: 0.9.1
>
> Attachments: TEZ-3803.001.patch, TEZ-3803.002.patch, 
> TEZ-3803.003.patch, TEZ-3803.004.patch, TEZ-3803.005.patch
>
>
> In a scenario where a downstream task has no slow start and gets started 
> before all its shuffle inputs are done, the task can time out because the wait 
> does not report progress (set the "progress is being made" bit) as it does in 
> MapReduce.
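
A minimal sketch of the general technique, waiting in short slices and reporting progress after each slice, is shown below. It is an assumption about the shape of a fix, not the actual Tez change; `ProgressAwareWait` and `notifyProgress` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not taken from the TEZ-3803 patches.
final class ProgressAwareWait {
    /**
     * Waits until the latch reaches zero, running notifyProgress after every
     * slice that times out, so a progress monitor does not mistake a
     * legitimate wait for a hung task. Returns the number of notifications.
     */
    static int awaitWithProgress(CountDownLatch inputsDone, long sliceMillis,
                                 Runnable notifyProgress) {
        int notifications = 0;
        try {
            while (!inputsDone.await(sliceMillis, TimeUnit.MILLISECONDS)) {
                notifyProgress.run();
                notifications++;
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt so the caller can react to it.
            Thread.currentThread().interrupt();
        }
        return notifications;
    }
}
```

The slice length would need to be shorter than whatever progress timeout the framework enforces.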



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Closed] (TEZ-3832) TEZ DAG status shows SUCCEEDED for SUCCEEDED_WITH_FAILURES final status

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3832.
-

> TEZ DAG status shows SUCCEEDED for SUCCEEDED_WITH_FAILURES final status
> ---
>
> Key: TEZ-3832
> URL: https://issues.apache.org/jira/browse/TEZ-3832
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3832.001.patch
>
>
> This is a regression from Tez 0.7 UI. Relevant changes are made to the 
> dag/index, home/index, and app/dags routes.




[jira] [Closed] (TEZ-3844) Tez UI Dag Counters show no records for a RUNNING DAG.

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3844.
-

> Tez UI Dag Counters show no records for a RUNNING DAG.
> --
>
> Key: TEZ-3844
> URL: https://issues.apache.org/jira/browse/TEZ-3844
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Kuhu Shukla
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3844.001.patch
>
>
> A running DAG shows no counters under the "DAG Counters" tab even though the 
> DAG Overview page shows the REST response with counters coming through. CC: 
> [~Sreenath].




[jira] [Closed] (TEZ-3862) Tez UI: Upgrade em-tgraph to version 0.0.14

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3862.
-

> Tez UI: Upgrade em-tgraph to version 0.0.14
> ---
>
> Key: TEZ-3862
> URL: https://issues.apache.org/jira/browse/TEZ-3862
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Trivial
> Fix For: 0.9.1
>
> Attachments: TEZ-3862.001.patch
>
>
> There have been notable improvements that can be pulled in, that will make 
> viewing graphs easier.




[jira] [Closed] (TEZ-3861) PipelineSorter setting negative progess

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3861.
-

> PipelineSorter setting negative progess
> ---
>
> Key: TEZ-3861
> URL: https://issues.apache.org/jira/browse/TEZ-3861
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Prasanth Jayachandran
>Assignee: Rajesh Balamohan
> Fix For: 0.9.1
>
> Attachments: TEZ-3861.2.patch, TezTR-545720_1_1_2_0_0.log
>
>
> PipelineSorter generates excessively large logs, mostly from setting progress 
> to a negative value in some cases. 
> {code}
> 2017-10-30T01:22:16,466 DEBUG [TezTR-702853_1_1_2_0_0] util.Progress: Illegal 
> progress value found, progress is less than 0. Progress will be changed to 0
> 2017-10-30T01:22:16,469 DEBUG [TezTR-702853_1_1_2_0_0] util.Progress: Illegal 
> progress value found, progress is less than 0. Progress will be changed to 0
> 2017-10-30T01:22:16,469 DEBUG [TezTR-702853_1_1_2_0_0] util.Progress: Illegal 
> progress value found, progress is less than 0. Progress will be changed to 0
> 2017-10-30T01:22:16,470 DEBUG [TezTR-702853_1_1_2_0_0] util.Progress: Illegal 
> progress value found, progress is less than 0. Progress will be changed to 0
> 2017-10-30T01:22:16,470 DEBUG [TezTR-702853_1_1_2_0_0] util.Progress: Illegal 
> progress value found, progress is less than 0. Progress will be changed to 0
> {code}
> This is emitted from
> https://github.com/apache/tez/blob/87d7c145ffc71707d1d393fddf94efa2a77d8822/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java#L1126




[jira] [Closed] (TEZ-3855) Allow vertex manager to send event to processor

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3855.
-

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Blocker
> Fix For: 0.9.1
>
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, 
> TEZ-3855.addendum.patch, TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM 
> to send events out.




[jira] [Closed] (TEZ-3869) Analyzer: Fix VertexInfo::getLastTaskToFinish comparison

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3869.
-

> Analyzer: Fix VertexInfo::getLastTaskToFinish comparison
> 
>
> Key: TEZ-3869
> URL: https://issues.apache.org/jira/browse/TEZ-3869
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 0.9.1
>
> Attachments: TEZ-3869.1.patch, TEZ-3869.2.patch
>
>
> {{VertexInfo::getLastTaskToFinish}} incorrectly compares with 
> getStartTimeInterval. This needs to be fixed. TimSort exceptions were observed 
> when analyzing some DAG zips. 
> {code}
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>   at java.util.TimSort.mergeHi(TimSort.java:895)
>   at java.util.TimSort.mergeAt(TimSort.java:512)
>   at java.util.TimSort.mergeForceCollapse(TimSort.java:453)
>   at java.util.TimSort.sort(TimSort.java:250)
>   at java.util.Arrays.sort(Arrays.java:1435)
>   at java.util.Collections.sort(Collections.java:230)
>   at org.apache.tez.history.parser.datamodel.VertexInfo.getLastTaskToFinish(VertexInfo.java:542)
> {code}
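
The exception above is what TimSort throws when a comparator is not a consistent total order. One way to stay within the contract, sketched here under the assumption that "last to finish" means the largest finish time, is to compare a single field with `Long.compare` semantics. `TaskTimes` and `LastTaskFinder` are hypothetical stand-ins, not the Tez analyzer classes.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the analyzer's task data model.
final class TaskTimes {
    final String id;
    final long startTime;
    final long finishTime;

    TaskTimes(String id, long startTime, long finishTime) {
        this.id = id;
        this.startTime = startTime;
        this.finishTime = finishTime;
    }
}

final class LastTaskFinder {
    // comparingLong uses Long.compare, so the ordering is transitive,
    // antisymmetric, and safe against subtraction overflow.
    static final Comparator<TaskTimes> BY_FINISH_TIME =
            Comparator.comparingLong((TaskTimes t) -> t.finishTime);

    /** Returns the task with the latest finish time, or null for an empty list. */
    static TaskTimes lastToFinish(List<TaskTimes> tasks) {
        if (tasks.isEmpty()) {
            return null;
        }
        return Collections.max(tasks, BY_FINISH_TIME);
    }
}
```

Using `Collections.max` also avoids sorting the whole list when only the last task is needed.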




[jira] [Closed] (TEZ-3804) FetcherOrderedGrouped#setupLocalDiskFetch should ignore empty partition records

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3804.
-

> FetcherOrderedGrouped#setupLocalDiskFetch should ignore empty partition 
> records
> ---
>
> Key: TEZ-3804
> URL: https://issues.apache.org/jira/browse/TEZ-3804
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 0.9.1
>
> Attachments: TEZ-3804.001.patch
>
>
> Similar to the copyMapOutput() logic, local fetches can also ignore 
> indexRecords that are empty (hasData == false) to avoid duplicate fetch 
> warnings.
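
As a rough illustration of the filtering described above, the sketch below drops index records whose `hasData` flag is false before any local fetch is planned. The `IndexRecord` type here is a simplified, hypothetical stand-in and not the Tez class of the same purpose.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; types and names are illustrative only.
final class LocalFetchPlanner {
    static final class IndexRecord {
        final int partition;
        final boolean hasData;

        IndexRecord(int partition, boolean hasData) {
            this.partition = partition;
            this.hasData = hasData;
        }
    }

    /** Keeps only the records that actually contain data to fetch. */
    static List<IndexRecord> recordsToFetch(List<IndexRecord> all) {
        List<IndexRecord> result = new ArrayList<>();
        for (IndexRecord record : all) {
            if (record.hasData) {
                result.add(record);
            }
        }
        return result;
    }
}
```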




[jira] [Closed] (TEZ-3876) Bug in local mode distributed cache files

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3876.
-

> Bug in local mode distributed cache files
> -
>
> Key: TEZ-3876
> URL: https://issues.apache.org/jira/browse/TEZ-3876
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Jacob Tolar
>Assignee: Jacob Tolar
>Priority: Minor
> Fix For: 0.9.1
>
> Attachments: TEZ-3876.2.patch, TEZ-3876.3.patch
>
>
> If multiple symlinks to the same resource are requested, only one is created. 
> See TEZ-3848




[jira] [Closed] (TEZ-3825) Tez UI DAGs page can't query RUNNING or SUBMITTED apps

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3825.
-

> Tez UI DAGs page can't query RUNNING or SUBMITTED apps
> --
>
> Key: TEZ-3825
> URL: https://issues.apache.org/jira/browse/TEZ-3825
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3825.001.patch
>
>
> Status is only a primary filter when a final DAG status is set. RUNNING and 
> SUBMITTED cannot be added as a final status, so for those the status must be 
> set as a secondaryFilter.




[jira] [Closed] (TEZ-3845) Tez UI Cleanup Stats Table

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3845.
-

> Tez UI Cleanup Stats Table
> --
>
> Key: TEZ-3845
> URL: https://issues.apache.org/jira/browse/TEZ-3845
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3845.001.patch, after_stats.png, before_stats.png
>
>
> Removed redundant status (for example: Succeeded Tasks: 10 Succeeded)
> Made total tasks links
> Added killed/failed task attempts available on the dag/index/ page
> Reordered Stats to be consistent across all pages.




[jira] [Closed] (TEZ-3857) Tez TaskImpl can throw Invalid state transition for leaf tasks that do Retro Active Transition

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3857.
-

> Tez TaskImpl can throw Invalid state transition for leaf tasks that do Retro 
> Active Transition
> --
>
> Key: TEZ-3857
> URL: https://issues.apache.org/jira/browse/TEZ-3857
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 0.9.1
>
> Attachments: TEZ-3857.001.patch, TEZ-3857.002.patch, 
> TEZ-3857.003.patch
>
>
> {code}
> Invalid event T_ATTEMPT_FAILED on Task task_1234_5678_1_01_01
> {code}
> The task had more than one running attempt (because of speculative 
> execution). One of them succeeded and the task was marked succeeded; the 
> second then failed and caused the Task state machine to enter the error state, 
> since the task was in a leaf vertex, where the following is done:
> {code}
> if (task.leafVertex) {
>   LOG.error("Unexpected event for task of leaf vertex " + event.getType()
>       + ", taskId: " + task.getTaskId());
>   task.internalError(event.getType());
> }
> {code}
> This JIRA tracks fixing this invalid state.
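
One possible way to handle the late failure, shown purely as a sketch and not necessarily what the attached patches do, is to treat a failed attempt as a no-op once the task has already succeeded. All names below are hypothetical.

```java
// Hypothetical sketch of a task that tolerates a late failure from a
// speculative attempt; not the actual TaskImpl state machine.
final class SpeculativeTask {
    enum State { RUNNING, SUCCEEDED, FAILED }

    private State state = State.RUNNING;
    private int liveAttempts;

    SpeculativeTask(int liveAttempts) {
        this.liveAttempts = liveAttempts;
    }

    void onAttemptSucceeded() {
        liveAttempts--;
        if (state == State.RUNNING) {
            state = State.SUCCEEDED;
        }
    }

    void onAttemptFailed() {
        liveAttempts--;
        if (state == State.SUCCEEDED) {
            return; // late failure of a speculative attempt: nothing to do
        }
        if (liveAttempts == 0) {
            state = State.FAILED;
        }
    }

    State getState() {
        return state;
    }
}
```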




[jira] [Closed] (TEZ-3752) Reduce Object size of InMemoryMapOutput for large jobs

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3752.
-

> Reduce Object size of InMemoryMapOutput for large jobs
> --
>
> Key: TEZ-3752
> URL: https://issues.apache.org/jira/browse/TEZ-3752
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Muhammad Samir Khan
> Fix For: 0.9.1
>
> Attachments: TEZ-3752.001.patch
>
>
> Follow-on jira from TEZ-3732. The InMemoryMapOutput has a 
> BoundedByteArrayOutputStream that is only used in the Merged MapOutput case. 




[jira] [Closed] (TEZ-3431) Add unit tests for container release

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3431.
-

> Add unit tests for container release
> 
>
> Key: TEZ-3431
> URL: https://issues.apache.org/jira/browse/TEZ-3431
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sushmitha Sreenivasan
>Assignee: Taklon Stephen Wu
>  Labels: newbie
> Fix For: 0.9.1
>
> Attachments: TEZ-3431.1.patch, TEZ-3431.2.patch, TEZ-3431.patch
>
>
> * Add unit tests to verify that the scheduler releases a container after its 
> expiry time (HeldContainer.containerExpiryTime).
> ** This adds a local cluster mock test for releasing a container when 
> HeldContainer.containerExpiryTime is older than the current time in 
> milliseconds and the container is not new.
> ** Also, this commit refactors the common variables appHost, appPort, appUrl 
> and appMsg into default constant values. 




[jira] [Closed] (TEZ-3847) AM web controller task counters are empty sometimes

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3847.
-

> AM web controller task counters are empty sometimes
> ---
>
> Key: TEZ-3847
> URL: https://issues.apache.org/jira/browse/TEZ-3847
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3847.001.patch, TEZ-3847.002.patch, 
> TEZ-3847.003.patch
>
>
> Statistics and counters are sent at longer intervals, and TaskAttemptImpl 
> blindly overwrites its stats and counters with null.
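
The general remedy for this kind of problem is to keep the last known value when an update carries none. The sketch below illustrates that idea with a hypothetical `AttemptStatus` class; it is not the TaskAttemptImpl code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; counters are modelled as a simple name-to-value map.
final class AttemptStatus {
    private Map<String, Long> counters = Collections.emptyMap();

    /** Applies a status update; a null payload means "no new counters". */
    void onStatusUpdate(Map<String, Long> reportedCounters) {
        if (reportedCounters != null) {
            counters = new HashMap<>(reportedCounters);
        }
        // Otherwise keep the previously reported counters.
    }

    Map<String, Long> getCounters() {
        return counters;
    }
}
```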




[jira] [Closed] (TEZ-3834) TaskSchedulerManager NullPointerException during shutdown when failed to start

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3834.
-

> TaskSchedulerManager NullPointerException during shutdown when failed to start
> --
>
> Key: TEZ-3834
> URL: https://issues.apache.org/jira/browse/TEZ-3834
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3834.001.patch, TEZ-3834.002.patch, 
> TEZ-3834.003.patch
>
>
> {noformat:title=NPE 1}
> 2017-09-14 12:16:48,259 [ERROR] [main] |rm.TaskSchedulerManager|: Failed to do a clean initiateStop for Scheduler: [0:TezYarn]
> java.lang.NullPointerException
>   at org.apache.tez.dag.app.rm.TaskSchedulerManager.initiateStop(TaskSchedulerManager.java:696)
>   at org.apache.tez.dag.app.DAGAppMaster.initiateStop(DAGAppMaster.java:2223)
>   at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:2239)
>   at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>   at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2707)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
>   at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2703)
>   at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2508)
> {noformat}
> {noformat:title=NPE 2}
> 2017-09-14 12:16:48,610 [ERROR] [main] |rm.TaskSchedulerManager|: Error in TaskScheduler when checking if a scheduler has unregistered, scheduler=[0:TezYarn]
> java.lang.NullPointerException
>   at org.apache.tez.dag.app.rm.TaskSchedulerManager.hasUnregistered(TaskSchedulerManager.java:998)
>   at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:2252)
>   at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>   at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2707)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1936)
>   at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2703)
>   at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2508)
> {noformat}
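
Both traces show a stop path running after a failed start. Assuming the cause is a scheduler slot that was never populated (an assumption; the traces alone do not confirm it), a defensive stop loop might look like the sketch below. `SchedulerHolder` and its nested `Scheduler` interface are hypothetical.

```java
// Hypothetical sketch of a null-tolerant shutdown loop.
final class SchedulerHolder {
    interface Scheduler {
        void initiateStop();
    }

    private final Scheduler[] schedulers;

    SchedulerHolder(int slots) {
        this.schedulers = new Scheduler[slots];
    }

    void set(int index, Scheduler scheduler) {
        schedulers[index] = scheduler;
    }

    /** Stops every scheduler that was actually created; returns how many. */
    int initiateStop() {
        int stopped = 0;
        for (Scheduler scheduler : schedulers) {
            if (scheduler == null) {
                continue; // start failed before this slot was initialised
            }
            scheduler.initiateStop();
            stopped++;
        }
        return stopped;
    }
}
```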




[jira] [Closed] (TEZ-3850) Enable header as sort button on Tez UI

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3850.
-

> Enable header as sort button on Tez UI
> --
>
> Key: TEZ-3850
> URL: https://issues.apache.org/jira/browse/TEZ-3850
> Project: Apache Tez
>  Issue Type: Sub-task
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3850.001.patch, TEZ-3850.002.patch
>
>





[jira] [Closed] (TEZ-3805) Analyzer: Add an analyzer to find out scheduling misses in 1:1 edges

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3805.
-

> Analyzer: Add an analyzer to find out scheduling misses in 1:1 edges
> 
>
> Key: TEZ-3805
> URL: https://issues.apache.org/jira/browse/TEZ-3805
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 0.9.1
>
> Attachments: TEZ-3805.1.patch
>
>
> When a 1:1 edge is used, it would be helpful to find out whether downstream 
> tasks ran on the same location provided in the hints by the runtime. 
> One of the recent features in an upstream project (Hive) used a 1:1 edge. 
> Instead of checking the logs, it would be useful to have an analyzer to churn 
> out the details.




[jira] [Closed] (TEZ-3797) Add tez debug tool for comparing counters of 2 DAGs

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3797.
-

> Add tez debug tool for comparing counters of 2 DAGs
> ---
>
> Key: TEZ-3797
> URL: https://issues.apache.org/jira/browse/TEZ-3797
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 0.9.1
>
> Attachments: TEZ-3797.1.patch, counter-diff.png
>
>
> It would be useful for debugging to have a simple script that just compares 
> the counters from 2 different DAG runs. 




[jira] [Closed] (TEZ-3666) Integer overflow in ShuffleVertexManagerBase

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3666.
-

> Integer overflow in ShuffleVertexManagerBase
> 
>
> Key: TEZ-3666
> URL: https://issues.apache.org/jira/browse/TEZ-3666
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.9.1
>
> Attachments: TEZ-3666-2.patch, TEZ-3666.patch
>
>
> In function getExpectedStatsInAtIndex, {{statsInMB[index] * numTasks / 
> numVMEventsReceived}} could cause Integer overflow, for example when 
> statsInMB[index]  == 3 and numTasks == 20.
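
The overflow arises because an `int * int` product is computed in 32 bits before the division. A minimal sketch of the usual remedy, widening to `long` before multiplying, follows. The method names and the sample values in the note after the code are illustrative and not taken from the issue or its patches.

```java
// Hypothetical sketch; only the arithmetic pattern mirrors the expression
// quoted in the issue.
final class ExpectedStats {
    /** int arithmetic: the product can wrap around before the division. */
    static int naive(int statInMB, int numTasks, int numVMEventsReceived) {
        return statInMB * numTasks / numVMEventsReceived;
    }

    /** Widening the first operand makes the whole expression 64-bit. */
    static long widened(int statInMB, int numTasks, int numVMEventsReceived) {
        return (long) statInMB * numTasks / numVMEventsReceived;
    }
}
```

For example, with 30000 MB, 200000 tasks and 1000 events, the intermediate product is 6,000,000,000, which exceeds `Integer.MAX_VALUE`; `widened` returns 6000000 while `naive` returns a wrapped value.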




[jira] [Closed] (TEZ-3843) Tez UI Vertex/Tasks log links for running tasks are missing

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3843.
-

> Tez UI Vertex/Tasks log links for running tasks are missing
> ---
>
> Key: TEZ-3843
> URL: https://issues.apache.org/jira/browse/TEZ-3843
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3843.001.patch, relatedentities.png
>
>
> Task serialization mistakenly gets the list of attempts from 
> otherinfo.relatedentities. relatedentities is a top-level property, and 
> serialization should reflect this.




[jira] [Closed] (TEZ-3813) Reduce Object size of MemoryFetchedInput for large jobs

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3813.
-

> Reduce Object size of MemoryFetchedInput for large jobs
> ---
>
> Key: TEZ-3813
> URL: https://issues.apache.org/jira/browse/TEZ-3813
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
> Fix For: 0.9.1
>
> Attachments: TEZ-3813.001.patch, TEZ-3813.002.patch, 
> TEZ-3813.003.patch, TEZ-3813.004.patch, TEZ-3813.005.patch, TEZ-3813.006.patch
>
>
> Same as TEZ-3752 for the unordered case. MemoryFetchedInput has a 
> BoundedByteArrayOutputStream that is not used (only the underlying byte[] is 
> used).




[jira] [Closed] (TEZ-3833) Tasks should report codec errors during shuffle as fetch failures

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3833.
-

> Tasks should report codec errors during shuffle as fetch failures
> -
>
> Key: TEZ-3833
> URL: https://issues.apache.org/jira/browse/TEZ-3833
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 0.9.1
>
> Attachments: TEZ-3833.001.patch, TEZ-3833.002.patch, 
> TEZ-3833.003.patch, TEZ-3833.004.patch, TEZ-3833.005.patch
>
>
> Do the equivalent of https://issues.apache.org/jira/browse/MAPREDUCE-6633 so 
> that compression errors do not prove fatal for the DAG/tasks.




[jira] [Closed] (TEZ-3858) Misleading dag level diagnostics in case of invalid vertex event

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3858.
-

> Misleading dag level diagnostics in case of invalid vertex event
> 
>
> Key: TEZ-3858
> URL: https://issues.apache.org/jira/browse/TEZ-3858
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3858.1.patch, TEZ-3858.2.patch
>
>
> When a vertex gets an invalid event, the state machine transitions via 
> InternalErrorTransition. This transition prints the following and adds it to 
> the DAG diagnostics: 
> {code}
> ("Invalid event " + event.getType() + " on Vertex " + 
> vertex.getLogIdentifier()
> {code}
> But the variable event here is the V_INTERNAL_ERROR event rather than the 
> event that caused V_INTERNAL_ERROR. V_INTERNAL_ERROR is not the invalid event; 
> the original event is.
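
One way to make the diagnostic accurate, sketched here as an assumption rather than as the content of the attached patches, is for the internal-error event to carry the type of the event that triggered it. The enum values and class names below are illustrative.

```java
// Hypothetical sketch; event names other than V_INTERNAL_ERROR are
// illustrative placeholders.
final class VertexDiagnostics {
    enum EventType { V_START, V_TASK_COMPLETED, V_INTERNAL_ERROR }

    static final class InternalErrorEvent {
        final EventType type = EventType.V_INTERNAL_ERROR;
        final EventType cause; // the event that was actually invalid

        InternalErrorEvent(EventType cause) {
            this.cause = cause;
        }
    }

    /** Reports the wrapper event, which is what the issue describes. */
    static String misleading(InternalErrorEvent event, String vertex) {
        return "Invalid event " + event.type + " on Vertex " + vertex;
    }

    /** Reports the original event that triggered the internal error. */
    static String accurate(InternalErrorEvent event, String vertex) {
        return "Invalid event " + event.cause + " on Vertex " + vertex;
    }
}
```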




[jira] [Closed] (TEZ-3828) Allow relaxing locality when retried task's priority is kept same

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3828.
-

> Allow relaxing locality when retried task's priority is kept same 
> --
>
> Key: TEZ-3828
> URL: https://issues.apache.org/jira/browse/TEZ-3828
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3828.1.patch, TEZ-3828.2.patch, TEZ-3828.3.patch
>
>
> TEZ-3716 introduced the config to keep the priority of a retried task, but 
> there is no way to relax the locality requirement in that case.




[jira] [Closed] (TEZ-3854) Make use of new improved em-table sort-icon

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3854.
-

> Make use of new improved em-table sort-icon
> ---
>
> Key: TEZ-3854
> URL: https://issues.apache.org/jira/browse/TEZ-3854
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3854.001.patch
>
>
> em-table 0.11.3 uses improved table column sort-icon. This jira updates 
> em-table version and makes changes to fully support the new feature.




[jira] [Closed] (TEZ-3816) Ability to automatically speculate single-task vertices

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3816.
-

> Ability to automatically speculate single-task vertices
> ---
>
> Key: TEZ-3816
> URL: https://issues.apache.org/jira/browse/TEZ-3816
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
> Fix For: 0.9.1
>
> Attachments: TEZ-3816.001.patch, TEZ-3816.002.patch, 
> TEZ-3816.003.patch
>
>
> When a single-task vertex is unlucky, it lands on a very slow node. 
> Speculation doesn't currently apply when there are no other tasks to compare 
> with. It would be good to have a configurable timeout after which the tasks 
> automatically speculate.
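
The timeout rule requested above could be expressed along the following lines. This is a sketch of the decision only; the class name, parameters, and the convention that a non-positive timeout disables the feature are assumptions.

```java
// Hypothetical sketch of a timeout-based speculation check for a vertex
// that has exactly one task.
final class SingleTaskSpeculation {
    /**
     * Returns true when a second attempt should be launched: the vertex has a
     * single task, only one attempt is live, the feature is enabled
     * (timeout > 0), and the attempt has run at least as long as the timeout.
     */
    static boolean shouldSpeculate(int tasksInVertex, int liveAttempts,
                                   long runningMillis, long timeoutMillis) {
        return tasksInVertex == 1
                && liveAttempts == 1
                && timeoutMillis > 0
                && runningMillis >= timeoutMillis;
    }
}
```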




[jira] [Closed] (TEZ-3807) InMemoryWriter is not tested with RLE enabled

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3807.
-

> InMemoryWriter is not tested with RLE enabled
> -
>
> Key: TEZ-3807
> URL: https://issues.apache.org/jira/browse/TEZ-3807
> Project: Apache Tez
>  Issue Type: Test
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
> Fix For: 0.9.1
>
> Attachments: TEZ-3807.001.patch, TEZ-3807.002.patch
>
>
> In TestIFile, a couple of test cases are supposed to test InMemoryWriter with 
> RLE enabled, but the RLE flag is turned off.




[jira] [Closed] (TEZ-3836) Tez UI task page sort does not work on RHEL7/Fedora

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3836.
-

> Tez UI task page sort does not work on RHEL7/Fedora
> ---
>
> Key: TEZ-3836
> URL: https://issues.apache.org/jira/browse/TEZ-3836
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Sreenath Somarajapuram
> Fix For: 0.9.1
>
> Attachments: TEZ-3836.1.patch
>
>
> Irrespective of the browser, Linux machines have trouble rendering the sort 
> arrows near the edge of the columns. Resizing the column does not solve the 
> problem either. 




[jira] [Closed] (TEZ-3801) Update version in master to 0.9.1

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3801.
-

> Update version in master to 0.9.1
> -
>
> Key: TEZ-3801
> URL: https://issues.apache.org/jira/browse/TEZ-3801
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3801.1.patch
>
>





[jira] [Closed] (TEZ-3831) Reduce Unordered memory needed for storing empty completed events

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3831.
-

> Reduce Unordered memory needed for storing empty completed events
> -
>
> Key: TEZ-3831
> URL: https://issues.apache.org/jira/browse/TEZ-3831
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: Screen Shot 2017-09-13 at 4.55.11 PM.png, 
> TEZ-3831.001-addendum.patch, TEZ-3831.001.patch
>
>
> The completedInputs blocking queue is used to store inputs for the 
> UnorderedKVReader to consume. With auto-reduce parallelism enabled and nearly 
> all inputs empty, the reader can't prune the empty events from the blocking 
> queue fast enough to keep up. In my scenario, an OOM occurred. 




[jira] [Closed] (TEZ-3839) Tez Shuffle Handler prints disk error stack traces for every read failure.

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3839.
-

> Tez Shuffle Handler prints disk error stack traces for every read failure.
> --
>
> Key: TEZ-3839
> URL: https://issues.apache.org/jira/browse/TEZ-3839
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 0.9.1
>
> Attachments: TEZ-3839.001.patch
>
>
> Do the equivalent of MAPREDUCE-6960 for the Tez Shuffle Handler. This will avoid 
> filling up the logs with disk error exceptions for every read.




[jira] [Closed] (TEZ-3827) TEZ Vertex status on DAG index page shows SUCCEEDED for SUCCEEDED_WITH_FAILURES final status

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3827.
-

> TEZ Vertex status on DAG index page shows SUCCEEDED for 
> SUCCEEDED_WITH_FAILURES final status
> 
>
> Key: TEZ-3827
> URL: https://issues.apache.org/jira/browse/TEZ-3827
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3827.001.patch
>
>
> The vertex details page has a more advanced final status, 
> SUCCEEDED_WITH_FAILURES. This adds that logic to the DAG details vertex table as well.




[jira] [Closed] (TEZ-3840) Tez should write TEZ_DAG_ID before TEZ_EXTRA_INFO

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3840.
-

> Tez should write TEZ_DAG_ID before TEZ_EXTRA_INFO
> -
>
> Key: TEZ-3840
> URL: https://issues.apache.org/jira/browse/TEZ-3840
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3840.001.addendum.patch, TEZ-3840.001.patch
>
>
> The relation from EXTRA_INFO to DAG_ID is added before DAG_ID is 
> written, which adds the relationship and auto-vivifies the DAG_ID entity. 
> Writing them in the other order is more natural.




[jira] [Closed] (TEZ-3724) Tez UI on HTTP "corrects" HTTPS REST calls to HTTP

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3724.
-

> Tez UI on HTTP "corrects" HTTPS REST calls to HTTP
> --
>
> Key: TEZ-3724
> URL: https://issues.apache.org/jira/browse/TEZ-3724
> Project: Apache Tez
>  Issue Type: Bug
>  Components: UI
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3724.1.patch, TEZ-3724.2.patch
>
>





[jira] [Closed] (TEZ-3848) Tez Local mode doesn't localize distributed cache files

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3848.
-

> Tez Local mode doesn't localize distributed cache files
> ---
>
> Key: TEZ-3848
> URL: https://issues.apache.org/jira/browse/TEZ-3848
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jacob Tolar
>Assignee: Jacob Tolar
> Fix For: 0.9.1
>
> Attachments: TEZ-3848.1.patch
>
>
> Tez doesn't symlink LocalResources into place in LocalContainerLauncher.
> In YARN mode, Yarn takes care of this when it launches the container. But in 
> local mode, if you're depending on a file existing in the distributed cache, 
> it's never symlinked into place (so you're out of luck).
> We test our Pig scripts in local mode and have some tools to set up the 
> distributed cache the same way it would work in production. This works fine 
> in MapReduce mode, but we are unable to use Pig + Tez local mode for testing due 
> to this problem.
> I have a fix working and will submit a PR once I rebase it.
> [~jeagles] [~wla...@yahoo-inc.com]




[jira] [Closed] (TEZ-3252) [Umbrella] Enable support for Hadoop-3.x

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3252.
-

> [Umbrella] Enable support for Hadoop-3.x 
> -
>
> Key: TEZ-3252
> URL: https://issues.apache.org/jira/browse/TEZ-3252
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
> Fix For: 0.9.1
>
> Attachments: TEZ-3252.patch
>
>
> Placeholder umbrella to track the various issues/tasks discovered to get full 
> stable functionality against hadoop-3.x once it is released in a stable form. 




[jira] [Closed] (TEZ-3856) API to access counters in InputInitializerContext

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3856.
-

> API to access counters in InputInitializerContext
> -
>
> Key: TEZ-3856
> URL: https://issues.apache.org/jira/browse/TEZ-3856
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 0.9.1
>
> Attachments: TEZ-3856.1.patch, TEZ-3856.2.patch, TEZ-3856.2.patch, 
> TEZ-3856.3.patch
>
>
> Hive would like to publish some counters related to input splits during split 
> generation. Tez doesn't expose TezCounters via InputInitializerContext. This 
> ticket is to expose TezCounters via InputInitializerContext so that counters 
> can be accessed during split generation.




[jira] [Closed] (TEZ-3830) HistoryEventTimelineConversion should not hard code the Task state.

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3830.
-

> HistoryEventTimelineConversion should not hard code the Task state.
> ---
>
> Key: TEZ-3830
> URL: https://issues.apache.org/jira/browse/TEZ-3830
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 0.9.1
>
> Attachments: TEZ-3830.001.patch
>
>
> TaskStartedEvent can have the state of the task so that the HistoryConversion 
> does not require task state to be hardcoded to SCHEDULED.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Closed] (TEZ-3853) Binary incompatibility caused by DEFAULT_LOG_LEVEL

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3853.
-

> Binary incompatibility caused by DEFAULT_LOG_LEVEL
> --
>
> Key: TEZ-3853
> URL: https://issues.apache.org/jira/browse/TEZ-3853
> Project: Apache Tez
>  Issue Type: Sub-task
>Affects Versions: 0.9.0
>Reporter: Aihua Xu
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3853.1.patch
>
>
> Hive is moving to support Hadoop 3.0 in HIVE-15016. We found that 
> Hadoop introduced some incompatible changes in 3.0, which requires Tez to 
> support Hadoop 3.0 as well in order for Hive to integrate with Tez. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Closed] (TEZ-3212) IFile throws NegativeArraySizeException for value sizes between 1GB and 2GB

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3212.
-

> IFile throws NegativeArraySizeException for value sizes between 1GB and 2GB
> ---
>
> Key: TEZ-3212
> URL: https://issues.apache.org/jira/browse/TEZ-3212
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Muhammad Samir Khan
> Fix For: 0.9.1
>
> Attachments: TEZ-3212.1.patch, tez-3212.002.patch, 
> tez-3212.003.patch, tez-3212.004.patch, tez-3212.005.patch
>
>
> This is not a regression with respect to MR, just an issue that was 
> encountered with a job whose IFile record values (which can be of max size 
> 2GB) can be successfully written but not successfully read.
> Failure while running task:java.lang.NegativeArraySizeException
>   at 
> org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.nextRawValue(IFile.java:765)
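
The trace alone does not show the root cause. One common way to hit a NegativeArraySizeException for lengths above 1GB is doubling an int length when growing a buffer, which overflows. The sketch below illustrates that general failure mode and a clamped alternative. It is an assumption for illustration, not the contents of IFile.java or of the attached patches, and all names in it are hypothetical.

```java
/** Hypothetical sketch of int overflow when doubling a buffer length. */
final class BufferGrowthSketch {
    // Conservative cap; some JVMs reject array lengths very close to Integer.MAX_VALUE.
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    /** Naive doubling: the result wraps to a negative int once required reaches 2^30. */
    static int naiveCapacity(int required) {
        return required << 1;
    }

    /** Doubles in long arithmetic, clamps to the cap, and never returns less than required. */
    static int safeCapacity(int required) {
        long doubled = (long) required << 1;
        return (int) Math.max(required, Math.min(doubled, MAX_ARRAY_LENGTH));
    }

    public static void main(String[] args) {
        int length = (1 << 30) + 1; // just over 1GB
        System.out.println(naiveCapacity(length)); // negative, so new byte[...] would throw
        System.out.println(safeCapacity(length));
    }
}
```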




[jira] [Closed] (TEZ-3852) Optimize ContainerContext.isSuperSet to speed container reuse decisions

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3852.
-

> Optimize ContainerContext.isSuperSet to speed container reuse decisions
> ---
>
> Key: TEZ-3852
> URL: https://issues.apache.org/jira/browse/TEZ-3852
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 0.9.1
>
> Attachments: TEZ-3852.001.patch, TEZ-3852.002.patch, 
> TEZ-3852.003.patch
>
>
> Found an AM that was consuming high CPU. The stack trace below shows that 
> container reuse compatibility check with a high number of local resources was 
> the culprit.
> {noformat:title=task scheduler compatibility check}
> "DelayedContainerManager" #112 prio=5 os_prio=0 tid=0x03b59800 
> nid=0x1edba runnable [0x7fe13c232000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.HashMap.putVal(HashMap.java:628)
>   at java.util.HashMap.putMapEntries(HashMap.java:514)
>   at java.util.HashMap.<init>(HashMap.java:489)
>   at 
> org.apache.tez.dag.app.ContainerContext.localResourcesCompatible(ContainerContext.java:132)
>   at 
> org.apache.tez.dag.app.ContainerContext.isSuperSet(ContainerContext.java:116)
>   at 
> org.apache.tez.dag.app.rm.container.ContainerContextMatcher.isSuperSet(ContainerContextMatcher.java:50)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService.canAssignTaskToContainer(YarnTaskSchedulerService.java:1543)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService.getMatchingRequestWithoutPriority(YarnTaskSchedulerService.java:1492)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$500(YarnTaskSchedulerService.java:85)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService$NodeLocalContainerAssigner.assignReUsedContainer(YarnTaskSchedulerService.java:1870)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignReUsedContainerWithLocation(YarnTaskSchedulerService.java:1754)
>   - locked <0x0006e0d12600> (a 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignReUsedContainersWithLocation(YarnTaskSchedulerService.java:1712)
>   - locked <0x0006e0d12600> (a 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService.tryAssignReUsedContainers(YarnTaskSchedulerService.java:578)
>   - locked <0x0006e0d12600> (a 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$800(YarnTaskSchedulerService.java:85)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.doAssignAll(YarnTaskSchedulerService.java:2103)
>   - locked <0x0006e0d12600> (a 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.mainLoop(YarnTaskSchedulerService.java:1984)
>   at 
> org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run(YarnTaskSchedulerService.java:1974)
> {noformat}
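
The trace shows the time going into a HashMap copy made during the compatibility check. A minimal sketch of a containment check that avoids the copy is below. It assumes a plain "every required entry is present and equal" rule, which may be simpler than what ContainerContext actually enforces, and it is not the code from the attached patches. The class and method names are hypothetical.

```java
import java.util.Map;

/** Hypothetical sketch: superset check over two maps without copying either one. */
final class SuperSetCheckSketch {
    /**
     * Returns true if every entry of required is present in available with an
     * equal value. Null values are treated as missing.
     */
    static <K, V> boolean isSuperSet(Map<K, V> available, Map<K, V> required) {
        if (required.size() > available.size()) {
            return false; // cheap early exit before any lookups
        }
        for (Map.Entry<K, V> entry : required.entrySet()) {
            V have = available.get(entry.getKey());
            if (have == null || !have.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

Iterating the required map and probing the available one keeps the cost proportional to the smaller side and allocates nothing per check.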




[jira] [Closed] (TEZ-3868) Update website to factor in the TEZ trademark registration

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3868.
-

> Update website to factor in the TEZ trademark registration
> --
>
> Key: TEZ-3868
> URL: https://issues.apache.org/jira/browse/TEZ-3868
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.9.1
>
> Attachments: TEZ-3868.01.patch, TEZ-3868.02.patch
>
>





[jira] [Closed] (TEZ-3867) testSendCustomProcessorEvent try to get array out of read only ByteBuffer

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3867.
-

> testSendCustomProcessorEvent try to get array out of read only ByteBuffer
> -
>
> Key: TEZ-3867
> URL: https://issues.apache.org/jira/browse/TEZ-3867
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3867.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Closed] (TEZ-3849) Combiner+PipelinedSorter silently drops records

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang closed TEZ-3849.
-

> Combiner+PipelinedSorter silently drops records
> ---
>
> Key: TEZ-3849
> URL: https://issues.apache.org/jira/browse/TEZ-3849
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jacob Tolar
>Assignee: Jacob Tolar
> Fix For: 0.9.1
>
> Attachments: TEZ-3849.1.patch, TEZ-3849.2.patch, TEZ-3849.3.patch, 
> TEZ-3849.4.patch, TEZ-3849.5.patch, TEZ-3849.6.patch
>
>
> This bug was introduced in 
> https://github.com/apache/tez/commit/a47e8fcbea5eeab5a7cf812271d329524cc02dba?diff=split
>  
> When combiner != null, the change in this commit passes kvIter with next() 
> having already been called. This ends up (silently) dropping the first record 
> in the partition. 
> Will submit PR and attach patch. [~jeagles], not sure if this is the way you 
> want to fix or not but it does fix my tests.
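
The failure mode described above can be reproduced in miniature with java.util.Iterator as a stand-in for the Tez key/value iterator (an assumption; the real iterator has a different interface). The consumer advances the iterator itself, so a caller that has already called next() loses the first record without any error. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: handing over an already-advanced iterator drops a record. */
final class AdvancedIteratorSketch {
    /** Stand-in for a combiner that advances the iterator before reading each record. */
    static List<String> consume(Iterator<String> it) {
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3");

        Iterator<String> advanced = records.iterator();
        advanced.next(); // caller peeks at the first record, then hands the iterator over
        System.out.println(consume(advanced));           // first record is missing

        System.out.println(consume(records.iterator())); // fresh iterator sees everything
    }
}
```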



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (TEZ-3883) Update version in master to 0.9.2

2018-01-04 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312180#comment-16312180
 ] 

Zhiyuan Yang commented on TEZ-3883:
---

Patch committed to master

> Update version in master to 0.9.2
> -
>
> Key: TEZ-3883
> URL: https://issues.apache.org/jira/browse/TEZ-3883
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3883.1.patch
>
>





[jira] [Resolved] (TEZ-3883) Update version in master to 0.9.2

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang resolved TEZ-3883.
---
Resolution: Fixed

> Update version in master to 0.9.2
> -
>
> Key: TEZ-3883
> URL: https://issues.apache.org/jira/browse/TEZ-3883
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3883.1.patch
>
>





[jira] [Updated] (TEZ-3883) Update version in master to 0.9.2

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3883:
--
Attachment: TEZ-3883.1.patch

> Update version in master to 0.9.2
> -
>
> Key: TEZ-3883
> URL: https://issues.apache.org/jira/browse/TEZ-3883
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3883.1.patch
>
>





[jira] [Created] (TEZ-3883) Update version in master to 0.9.2

2018-01-04 Thread Zhiyuan Yang (JIRA)

Zhiyuan Yang created TEZ-3883:
-

 Summary: Update version in master to 0.9.2
 Key: TEZ-3883
 URL: https://issues.apache.org/jira/browse/TEZ-3883
 Project: Apache Tez
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang







[jira] [Resolved] (TEZ-3882) Changes for 0.9.1 release

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang resolved TEZ-3882.
---
Resolution: Fixed

> Changes for 0.9.1 release
> -
>
> Key: TEZ-3882
> URL: https://issues.apache.org/jira/browse/TEZ-3882
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3882.1.patch
>
>





[jira] [Commented] (TEZ-3882) Changes for 0.9.1 release

2018-01-04 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312145#comment-16312145
 ] 

Zhiyuan Yang commented on TEZ-3882:
---

Patch committed to master branch.

> Changes for 0.9.1 release
> -
>
> Key: TEZ-3882
> URL: https://issues.apache.org/jira/browse/TEZ-3882
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3882.1.patch
>
>





[jira] [Updated] (TEZ-3882) Changes for 0.9.1 release

2018-01-04 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3882:
--
Attachment: TEZ-3882.1.patch

> Changes for 0.9.1 release
> -
>
> Key: TEZ-3882
> URL: https://issues.apache.org/jira/browse/TEZ-3882
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3882.1.patch
>
>





[jira] [Created] (TEZ-3882) Changes for 0.9.1 release

2018-01-04 Thread Zhiyuan Yang (JIRA)

Zhiyuan Yang created TEZ-3882:
-

 Summary: Changes for 0.9.1 release
 Key: TEZ-3882
 URL: https://issues.apache.org/jira/browse/TEZ-3882
 Project: Apache Tez
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang







[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase

2017-12-13 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290112#comment-16290112
 ] 

Zhiyuan Yang commented on TEZ-3810:
---

I think this may not be necessary
{code}
} else if (idleStartTime != 0) {
  shuffleIdleTime.increment(Time.monotonicNow() - idleStartTime);
  idleStartTime = 0;
}
{code}
since the number of fetchers won't increase within this loop anyway.
{code}
while ((runningFetchers.size() >= numFetchers || pendingHosts.isEmpty())
  && numCompletedInputs.get() < numInputs) {
{code}

Also, the test makes this counter look like a timestamp, although the code works.
{code}
long startTime = inputContext.getCounters().findCounter(TaskCounter.SHUFFLE_IDLE_TIME).getValue();
long endTime = inputContext.getCounters().findCounter(TaskCounter.SHUFFLE_IDLE_TIME).getValue();
assertTrue("ShuffleIdleTime counter was: " + (endTime - startTime) + "ms",
    endTime - startTime >= 5000);
{code}
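
For context on both remarks, here is a minimal sketch of accumulating idle time as a duration rather than a timestamp. The class and its methods are hypothetical and are not taken from the patch. Times are passed in explicitly to keep the example deterministic; real code would read a monotonic clock.

```java
/** Hypothetical sketch: accumulate idle time as a duration in milliseconds. */
final class IdleTimeSketch {
    private boolean idle;        // explicit flag, so no timestamp value doubles as a sentinel
    private long idleStartMillis;
    private long totalIdleMillis;

    /** Marks the start of an idle period; repeated calls while idle are ignored. */
    void markIdle(long nowMillis) {
        if (!idle) {
            idle = true;
            idleStartMillis = nowMillis;
        }
    }

    /** Ends the current idle period, if any, and adds its length to the total. */
    void markBusy(long nowMillis) {
        if (idle) {
            totalIdleMillis += nowMillis - idleStartMillis;
            idle = false;
        }
    }

    long totalIdleMillis() {
        return totalIdleMillis;
    }
}
```

A test written against such a counter can read it before and after the wait and assert on the difference, naming the two readings as idle totals (for example idleBefore and idleAfter) so they are not mistaken for wall-clock times.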

> TezCounter for idle time in shuffle phase
> -
>
> Key: TEZ-3810
> URL: https://issues.apache.org/jira/browse/TEZ-3810
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Ashwin Ramesh
> Attachments: TEZ-3810-001.patch, TEZ-3810.002.patch, 
> TEZ-3810.003.patch, TEZ-3810.004.patch
>
>
>  A task attempt counter that tracks how much time was spent waiting for 
> inputs in the shuffle phase. We can use this to quickly identify jobs that 
> are wasting a lot of time on the grid with idle reducer tasks instead of 
> shuffling/merging.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster

2017-12-13 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289985#comment-16289985
 ] 

Zhiyuan Yang commented on TEZ-3694:
---

This cannot be done unless we raise the Hadoop version to 2.7.2. Before 2.7.2, 
MiniYarnCluster only reads the params and doesn't recognize the conf at all.

> Adopt YARN-5007 in MiniTezCluster
> -
>
> Key: TEZ-3694
> URL: https://issues.apache.org/jira/browse/TEZ-3694
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3694.1.patch
>
>
> Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS 
> param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt 
> the change and use config to enable timeline service.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (TEZ-3869) Analyzer: Fix VertexInfo::getLastTaskToFinish comparison

2017-12-12 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3869:
--
Fix Version/s: (was: 0.9.next)
   0.9.1

> Analyzer: Fix VertexInfo::getLastTaskToFinish comparison
> 
>
> Key: TEZ-3869
> URL: https://issues.apache.org/jira/browse/TEZ-3869
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 0.9.1
>
> Attachments: TEZ-3869.1.patch, TEZ-3869.2.patch
>
>
> {{VertexInfo::getLastTaskToFinish}} incorrectly compares with 
> getStartTimeInterval. This needs to be fixed. Observed TimSort exceptions 
> when analyzing some DAG zips. 
> {code}
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>   at java.util.TimSort.mergeHi(TimSort.java:895)
>   at java.util.TimSort.mergeAt(TimSort.java:512)
>   at java.util.TimSort.mergeForceCollapse(TimSort.java:453)
>   at java.util.TimSort.sort(TimSort.java:250)
>   at java.util.Arrays.sort(Arrays.java:1435)
>   at java.util.Collections.sort(Collections.java:230)
>   at 
> org.apache.tez.history.parser.datamodel.VertexInfo.getLastTaskToFinish(VertexInfo.java:542)
> {code}
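
This exception is thrown when a comparator gives inconsistent answers, for example by comparing one element's finish time against another element's start time. A minimal sketch of a consistent ordering is below. The Task class and its fields are hypothetical stand-ins, not the Tez TaskInfo API, and this is not the code from the attached patches.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: pick the last task to finish with a consistent comparator. */
final class LastTaskToFinishSketch {
    static final class Task {
        final String name;
        final long startTime;
        final long finishTime;

        Task(String name, long startTime, long finishTime) {
            this.name = name;
            this.startTime = startTime;
            this.finishTime = finishTime;
        }
    }

    // Both sides of every comparison use the same field, so the ordering is transitive.
    static final Comparator<Task> BY_FINISH_TIME =
        Comparator.comparingLong(task -> task.finishTime);

    /** Returns the task with the greatest finish time, or null for an empty list. */
    static Task lastToFinish(List<Task> tasks) {
        return tasks.isEmpty() ? null : Collections.max(tasks, BY_FINISH_TIME);
    }
}
```

Using Collections.max also avoids sorting the whole list when only the last element is needed.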



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (TEZ-3868) Update website to factor in the TEZ trademark registration

2017-12-12 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3868:
--
Fix Version/s: 0.9.1

> Update website to factor in the TEZ trademark registration
> --
>
> Key: TEZ-3868
> URL: https://issues.apache.org/jira/browse/TEZ-3868
> Project: Apache Tez
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 0.9.1
>
> Attachments: TEZ-3868.01.patch, TEZ-3868.02.patch
>
>





[jira] [Resolved] (TEZ-3252) [Umbrella] Enable support for Hadoop-3.x

2017-12-12 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang resolved TEZ-3252.
---
Resolution: Fixed

> [Umbrella] Enable support for Hadoop-3.x 
> -
>
> Key: TEZ-3252
> URL: https://issues.apache.org/jira/browse/TEZ-3252
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
> Fix For: 0.9.1
>
> Attachments: TEZ-3252.patch
>
>
> Placeholder umbrella to track the various issues/tasks discovered to get full 
> stable functionality against hadoop-3.x once it is released in a stable form. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster

2017-12-12 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3694:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: TEZ-3252)

> Adopt YARN-5007 in MiniTezCluster
> -
>
> Key: TEZ-3694
> URL: https://issues.apache.org/jira/browse/TEZ-3694
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3694.1.patch
>
>
> Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS 
> param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt 
> the change and use config to enable timeline service.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster

2017-12-12 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288483#comment-16288483
 ] 

Zhiyuan Yang commented on TEZ-3694:
---

Test failed... Moving this out of the TEZ-3252 umbrella JIRA since this works fine 
with Hadoop 3. This can be fixed later.

> Adopt YARN-5007 in MiniTezCluster
> -
>
> Key: TEZ-3694
> URL: https://issues.apache.org/jira/browse/TEZ-3694
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3694.1.patch
>
>
> Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS 
> param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt 
> the change and use config to enable timeline service.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (TEZ-3874) NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in Configuration

2017-12-12 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3874:
--
Fix Version/s: (was: 0.9.1)

> NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in 
> Configuration
> 
>
> Key: TEZ-3874
> URL: https://issues.apache.org/jira/browse/TEZ-3874
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Eric Wohlstadter
>Priority: Blocker
> Attachments: TEZ-3874.1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> "yarn.resourcemanager.zk-address" is deprecated in favor of 
> "hadoop.zk.address" for Hadoop 2.9+.
> The Configuration base class doesn't auto-translate the deprecation. Only 
> YarnConfiguration applies the translation.
> In TezClientUtils.createFinalConfProtoForApp, an NPE is thrown if 
> "yarn.resourcemanager.zk-address" is present in the Configuration.
> {code}
> for (Entry<String, String> entry : amConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(amConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
> Even though Tez is not specifically looking for the deprecated property, 
> {{amConf.get(entry.getKey())}} will find it during the iteration, if it is in 
> any of the merged xml property resources. 
> {{amConf.get(entry.getKey())}} will return null, and {{kvp.setValue(null)}} 
> will trigger NPE.
> Suggested solution is to change to: 
> {code}
> YarnConfiguration wrappedConf = new YarnConfiguration(amConf);
> for (Entry<String, String> entry : wrappedConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(wrappedConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
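
Separately from the fix suggested in the ticket, a defensive variant is to skip any key whose lookup returns null instead of passing null on to a builder. The sketch below shows that idea with plain Java collections standing in for the Hadoop Configuration and the protobuf builder. That substitution is an assumption for illustration, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: copy key/value pairs while skipping null lookups. */
final class NullSafeCopySketch {
    /**
     * Looks up each key and keeps only the pairs with a non-null value, so a key
     * that is listed but cannot be resolved never reaches a null-hostile setter.
     */
    static Map<String, String> copyNonNull(Iterable<String> keys,
                                           Function<String, String> lookup) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : keys) {
            String value = lookup.apply(key);
            if (value != null) {
                out.put(key, value);
            }
        }
        return out;
    }
}
```

Skipping silently hides the unresolved key, so logging it would be a reasonable addition. The ticket's suggestion of wrapping the conf so that the deprecation is translated addresses the cause rather than the symptom.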



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (TEZ-3874) NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in Configuration

2017-12-12 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3874:
--
Target Version/s: 0.9.next  (was: 0.9.1)

> NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in 
> Configuration
> 
>
> Key: TEZ-3874
> URL: https://issues.apache.org/jira/browse/TEZ-3874
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Eric Wohlstadter
>Priority: Blocker
> Attachments: TEZ-3874.1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> "yarn.resourcemanager.zk-address" is deprecated in favor of 
> "hadoop.zk.address" for Hadoop 2.9+.
> The Configuration base class doesn't auto-translate the deprecation. Only 
> YarnConfiguration applies the translation.
> In TezClientUtils.createFinalConfProtoForApp, an NPE is thrown if 
> "yarn.resourcemanager.zk-address" is present in the Configuration.
> {code}
> for (Entry<String, String> entry : amConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(amConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
> Even though Tez is not specifically looking for the deprecated property, 
> {{amConf.get(entry.getKey())}} will find it during the iteration, if it is in 
> any of the merged xml property resources. 
> {{amConf.get(entry.getKey())}} will return null, and {{kvp.setValue(null)}} 
> will trigger NPE.
> Suggested solution is to change to: 
> {code}
> YarnConfiguration wrappedConf = new YarnConfiguration(amConf);
> for (Entry<String, String> entry : wrappedConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(wrappedConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
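
A minimal, self-contained sketch of the failure mode described above and of one defensive alternative. It does not use Hadoop or Tez classes: `lookup` is a hypothetical stand-in for `Configuration.get`, and the returned map stands in for the key/value pairs added to the protobuf builder. The assumption, taken from the description, is that iteration can yield a key whose lookup returns null. Skipping such keys is a different technique from the suggested `YarnConfiguration` wrapping; it only avoids handing null to a setter.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ConfCopySketch {

    // Copies every key whose looked-up value is non-null. Keys that resolve to
    // null (as the deprecated key does in the report) are skipped.
    static Map<String, String> copyNonNull(Iterable<String> keys,
                                           Function<String, String> lookup) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : keys) {
            String value = lookup.apply(key); // stand-in for conf.get(key)
            if (value != null) {
                out.put(key, value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical data: the deprecated key is iterated but resolves to null.
        Map<String, String> backing = new LinkedHashMap<>();
        backing.put("tez.queue.name", "default");
        List<String> keys =
            Arrays.asList("tez.queue.name", "yarn.resourcemanager.zk-address");
        System.out.println(copyNonNull(keys, backing::get));
    }
}
```

Whether silently dropping a key is acceptable depends on the caller; the wrapping approach in the description keeps the translated value instead of discarding it.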




[jira] [Commented] (TEZ-3874) NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in Configuration

2017-12-12 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288349#comment-16288349
 ] 

Zhiyuan Yang commented on TEZ-3874:
---

Moved this to 0.9.2 per offline discussion with [~ewohlstadter].

> NPE in TezClientUtils when "yarn.resourcemanager.zk-address" is present in 
> Configuration
> 
>
> Key: TEZ-3874
> URL: https://issues.apache.org/jira/browse/TEZ-3874
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.1
>Reporter: Eric Wohlstadter
>Priority: Blocker
> Attachments: TEZ-3874.1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> "yarn.resourcemanager.zk-address" is deprecated in favor of 
> "hadoop.zk.address" for Hadoop 2.9+.
> Configuration base class doesn't auto-translate the deprecation. Only 
> YarnConfiguration applies the translation.
> In TezClientUtils.createFinalConfProtoForApp, an NPE is thrown if 
> "yarn.resourcemanager.zk-address" is present in the Configuration.
> {code}
> for (Entry<String, String> entry : amConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(amConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}
> Even though Tez is not specifically looking for the deprecated property, 
> {{amConf.get(entry.getKey())}} will find it during the iteration, if it is in 
> any of the merged xml property resources. 
> {{amConf.get(entry.getKey())}} will return null, and {{kvp.setValue(null)}} 
> will trigger NPE.
> Suggested solution is to change to: 
> {code}
> YarnConfiguration wrappedConf = new YarnConfiguration(amConf);
> for (Entry<String, String> entry : wrappedConf) {
>   PlanKeyValuePair.Builder kvp = PlanKeyValuePair.newBuilder();
>   kvp.setKey(entry.getKey());
>   kvp.setValue(wrappedConf.get(entry.getKey()));
>   builder.addConfKeyValues(kvp);
> }
> {code}




[jira] [Commented] (TEZ-3694) Adopt YARN-5007 in MiniTezCluster

2017-12-12 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288313#comment-16288313
 ] 

Zhiyuan Yang commented on TEZ-3694:
---

I'm going to get another Jenkins run and commit this if things go smoothly. The 
constructor is deprecated anyway and we should adopt the change.

> Adopt YARN-5007 in MiniTezCluster
> -
>
> Key: TEZ-3694
> URL: https://issues.apache.org/jira/browse/TEZ-3694
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3694.1.patch
>
>
> Master branch won't build on hadoop trunk because YARN-5007 removed enableAHS 
> param from MiniYarnCluster ctor, which breaks MiniTezCluster. We should adopt 
> the change and use config to enable timeline service.




[jira] [Resolved] (TEZ-3855) Allow vertex manager to send event to processor

2017-12-12 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang resolved TEZ-3855.
---
  Resolution: Fixed
Release Note: Issue was addressed in TEZ-3867.

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Blocker
> Fix For: 0.9.1
>
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, 
> TEZ-3855.addendum.patch, TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Commented] (TEZ-3867) testSendCustomProcessorEvent try to get array out of read only ByteBuffer

2017-12-12 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288282#comment-16288282
 ] 

Zhiyuan Yang commented on TEZ-3867:
---

Thanks [~kshukla] for review! I'll add this to 0.9.1 release.

> testSendCustomProcessorEvent try to get array out of read only ByteBuffer
> -
>
> Key: TEZ-3867
> URL: https://issues.apache.org/jira/browse/TEZ-3867
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3867.1.patch
>
>





[jira] [Commented] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-17 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257410#comment-16257410
 ] 

Zhiyuan Yang commented on TEZ-3855:
---

[~jeagles] Sorry, my bad. I should have gotten a Jenkins run for the addendum patch... 
I've made a patch and am using TEZ-3867 to get a Jenkins run.

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Blocker
> Fix For: 0.9.1
>
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, 
> TEZ-3855.addendum.patch, TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Updated] (TEZ-3867) testSendCustomProcessorEvent try to get array out of read only ByteBuffer

2017-11-17 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3867:
--
Attachment: TEZ-3867.1.patch

> testSendCustomProcessorEvent try to get array out of read only ByteBuffer
> -
>
> Key: TEZ-3867
> URL: https://issues.apache.org/jira/browse/TEZ-3867
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3867.1.patch
>
>





[jira] [Created] (TEZ-3867) testSendCustomProcessorEvent try to get array out of read only ByteBuffer

2017-11-17 Thread Zhiyuan Yang (JIRA)

Zhiyuan Yang created TEZ-3867:
-

 Summary: testSendCustomProcessorEvent try to get array out of read 
only ByteBuffer
 Key: TEZ-3867
 URL: https://issues.apache.org/jira/browse/TEZ-3867
 Project: Apache Tez
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang
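
The summary refers to a general `java.nio` behavior: a read-only `ByteBuffer` does not expose its backing array, so `array()` throws `ReadOnlyBufferException`. Below is a small sketch of one way to copy the bytes out instead. `toBytes` and `arrayThrows` are hypothetical helper names for illustration, not taken from the Tez patch or test.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.charset.StandardCharsets;

final class ReadOnlyBufferSketch {

    // Copies the remaining bytes without calling array(). The duplicate has its
    // own position, so the caller's buffer is left untouched.
    static byte[] toBytes(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return bytes;
    }

    // Returns true if array() is rejected because the buffer is read-only.
    static boolean arrayThrows(ByteBuffer buffer) {
        try {
            buffer.array();
            return false;
        } catch (ReadOnlyBufferException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer readOnly = ByteBuffer
            .wrap("payload".getBytes(StandardCharsets.UTF_8))
            .asReadOnlyBuffer();
        System.out.println(arrayThrows(readOnly));
        System.out.println(new String(toBytes(readOnly), StandardCharsets.UTF_8));
    }
}
```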







[jira] [Comment Edited] (TEZ-3846) Tez AM may not clean up properly on an internal error

2017-11-16 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256030#comment-16256030
 ] 

Zhiyuan Yang edited comment on TEZ-3846 at 11/16/17 10:03 PM:
--

[~ewohlstadter] It's done in TEZ-3858. Do you want to investigate this one? 
If so, feel free to take it over.


was (Author: aplusplus):
[~ewohlstadter] It's done in TEZ-3858.

> Tez AM may not clean up properly on an internal error
> -
>
> Key: TEZ-3846
> URL: https://issues.apache.org/jira/browse/TEZ-3846
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Zhiyuan Yang
>
> Normally, in Hive we blindly reopen the session on any submit error; however 
> I accidentally broke that, and while investigating noticed a new error before 
> reopen that claims that session where a DAG has failed is still running a 
> DAG. Looks like it should either clean up, or if we assume OOM is not 
> clean-up-able, die completely.
> {noformat}
> 2017-09-28T01:07:12,352  INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> client.TezClient: Submitted dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, 
> dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM (
> ...
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Status: Failed
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Vertex failed, vertexName=Map 61, 
> vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex 
> vertex_1506585924598_0001_53_01 [Map 61] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
> vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: 
> GC overhead limit exceeded
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Invalid event V_INTERNAL_ERROR on Vertex 
> vertex_1506585924598_0001_53_00 [Map 60]
> 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> log.PerfLogger:  end=1506586045787 duration=13435 
> from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor>
> ... [reuse]
> 2017-09-28T01:07:28,459  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> client.TezClient: Submitting dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, dagName=insert overwrite table 
> orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, 
> callerType=HIVE_QUERY_ID, 
> callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 }
> 2017-09-28T01:07:35,259  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> exec.Task: Dag submit failed due to App master already running a DAG
> {noformat}
> Session continues living and failing like that multiple times.




[jira] [Comment Edited] (TEZ-3846) Tez AM may not clean up properly on an internal error

2017-11-16 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256030#comment-16256030
 ] 

Zhiyuan Yang edited comment on TEZ-3846 at 11/16/17 10:01 PM:
--

[~ewohlstadter] It's done in TEZ-3858.


was (Author: aplusplus):
[~EricWohlstadter] It's done in TEZ-3858.

> Tez AM may not clean up properly on an internal error
> -
>
> Key: TEZ-3846
> URL: https://issues.apache.org/jira/browse/TEZ-3846
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Zhiyuan Yang
>
> Normally, in Hive we blindly reopen the session on any submit error; however 
> I accidentally broke that, and while investigating noticed a new error before 
> reopen that claims that session where a DAG has failed is still running a 
> DAG. Looks like it should either clean up, or if we assume OOM is not 
> clean-up-able, die completely.
> {noformat}
> 2017-09-28T01:07:12,352  INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> client.TezClient: Submitted dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, 
> dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM (
> ...
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Status: Failed
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Vertex failed, vertexName=Map 61, 
> vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex 
> vertex_1506585924598_0001_53_01 [Map 61] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
> vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: 
> GC overhead limit exceeded
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Invalid event V_INTERNAL_ERROR on Vertex 
> vertex_1506585924598_0001_53_00 [Map 60]
> 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> log.PerfLogger:  end=1506586045787 duration=13435 
> from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor>
> ... [reuse]
> 2017-09-28T01:07:28,459  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> client.TezClient: Submitting dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, dagName=insert overwrite table 
> orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, 
> callerType=HIVE_QUERY_ID, 
> callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 }
> 2017-09-28T01:07:35,259  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> exec.Task: Dag submit failed due to App master already running a DAG
> {noformat}
> Session continues living and failing like that multiple times.




[jira] [Commented] (TEZ-3846) Tez AM may not clean up properly on an internal error

2017-11-16 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256030#comment-16256030
 ] 

Zhiyuan Yang commented on TEZ-3846:
---

[~EricWohlstadter] It's done in TEZ-3858.

> Tez AM may not clean up properly on an internal error
> -
>
> Key: TEZ-3846
> URL: https://issues.apache.org/jira/browse/TEZ-3846
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Zhiyuan Yang
>
> Normally, in Hive we blindly reopen the session on any submit error; however 
> I accidentally broke that, and while investigating noticed a new error before 
> reopen that claims that session where a DAG has failed is still running a 
> DAG. Looks like it should either clean up, or if we assume OOM is not 
> clean-up-able, die completely.
> {noformat}
> 2017-09-28T01:07:12,352  INFO [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> client.TezClient: Submitted dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, 
> dagId=dag_1506585924598_0001_53, dagName=SELECT count(1) FROM (
> ...
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Status: Failed
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Vertex failed, vertexName=Map 61, 
> vertexId=vertex_1506585924598_0001_53_01, diagnostics=[Vertex 
> vertex_1506585924598_0001_53_01 [Map 61] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: src initializer failed, 
> vertex=vertex_1506585924598_0001_53_01 [Map 61], java.lang.OutOfMemoryError: 
> GC overhead limit exceeded
> 2017-09-28T01:07:25,787 ERROR [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> SessionState: Invalid event V_INTERNAL_ERROR on Vertex 
> vertex_1506585924598_0001_53_00 [Map 60]
> 2017-09-28T01:07:25,787 DEBUG [3d4e3f44-40c5-431a-b3de-801d60c1c579 main] 
> log.PerfLogger:  end=1506586045787 duration=13435 
> from=org.apache.hadoop.hive.ql.exec.tez.monitoring.TezJobMonitor>
> ... [reuse]
> 2017-09-28T01:07:28,459  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> client.TezClient: Submitting dag to TezSession, 
> sessionName=HIVE-35a0e5c9-ce27-4b27-824c-ce9bc0fe104d, 
> applicationId=application_1506585924598_0001, dagName=insert overwrite table 
> orc_ppd_staging s...s(Stage-1), callerContext={ context=HIVE, 
> callerType=HIVE_QUERY_ID, 
> callerId=hiveptest_20170928010728_58f19d98-85da-4fad-83a7-7bf3aa0252a7 }
> 2017-09-28T01:07:35,259  INFO [11108166-069e-43d7-9e21-25b9214d01a4 main] 
> exec.Task: Dag submit failed due to App master already running a DAG
> {noformat}
> Session continues living and failing like that multiple times.




[jira] [Commented] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-15 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254264#comment-16254264
 ] 

Zhiyuan Yang commented on TEZ-3855:
---

[~gopalv] It is sufficient. With the addendum patch, the buffer object is kept intact 
within the event. It can be read multiple times, either for heartbeat or 
processor. Thanks for the review! I'll commit this soon.

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Blocker
> Fix For: 0.9.1
>
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, 
> TEZ-3855.addendum.patch, TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling

2017-11-14 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3837:
--
Priority: Major  (was: Blocker)

> Parallel sorting with inline sampling
> -
>
> Key: TEZ-3837
> URL: https://issues.apache.org/jira/browse/TEZ-3837
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, 
> TEZ-3837.2.patch, TEZ-3837.3.patch, TEZ-3837.4.patch, TEZ-3837.5.patch
>
>





[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-14 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3855:
--
Priority: Blocker  (was: Major)

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Blocker
> Fix For: 0.9.1
>
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, 
> TEZ-3855.addendum.patch, TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling

2017-11-14 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3837:
--
Priority: Blocker  (was: Major)

> Parallel sorting with inline sampling
> -
>
> Key: TEZ-3837
> URL: https://issues.apache.org/jira/browse/TEZ-3837
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Blocker
> Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, 
> TEZ-3837.2.patch, TEZ-3837.3.patch, TEZ-3837.4.patch, TEZ-3837.5.patch
>
>





[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-14 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3855:
--
Attachment: TEZ-3855.addendum.patch

Just worked with [~djaiswal] and found out that this new event itself broke fault 
tolerance. Previously the code returned the original ByteBuffer for consuming, leaving 
an empty buffer for the next read. Attached an addendum patch to fix the issue. [~gopalv] 
Can you help review?
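
A rough illustration of the problem described in this comment, using only `java.nio`. `PayloadHolderSketch` is a hypothetical stand-in for an event that carries a `ByteBuffer`; it is not the Tez event class, and the read-only view shown here is an assumption about one possible fix, not a copy of the addendum patch.

```java
import java.nio.ByteBuffer;

final class PayloadHolderSketch {
    private final ByteBuffer payload;

    PayloadHolderSketch(byte[] data) {
        this.payload = ByteBuffer.wrap(data);
    }

    // Problematic variant: all callers share one position, so relative reads
    // by the first consumer leave nothing for the next one.
    ByteBuffer getOriginal() {
        return payload;
    }

    // Safer variant: each call returns a read-only view with its own position.
    ByteBuffer getView() {
        return payload.asReadOnlyBuffer();
    }

    // Drains the buffer with relative reads and reports how many bytes it saw.
    static int consume(ByteBuffer buffer) {
        int count = 0;
        while (buffer.hasRemaining()) {
            buffer.get();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        PayloadHolderSketch shared = new PayloadHolderSketch(new byte[] {1, 2, 3});
        System.out.println(consume(shared.getOriginal()) + " then "
            + consume(shared.getOriginal()));

        PayloadHolderSketch viewed = new PayloadHolderSketch(new byte[] {1, 2, 3});
        System.out.println(consume(viewed.getView()) + " then "
            + consume(viewed.getView()));
    }
}
```

The second read matters when an event is delivered again, for example after a task re-run, which is where a drained buffer would show up as a fault-tolerance problem.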

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, 
> TEZ-3855.addendum.patch, TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Reopened] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-14 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reopened TEZ-3855:
---

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, 
> TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling

2017-11-13 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3837:
--
Attachment: TEZ-3837.5.patch

> Parallel sorting with inline sampling
> -
>
> Key: TEZ-3837
> URL: https://issues.apache.org/jira/browse/TEZ-3837
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, 
> TEZ-3837.2.patch, TEZ-3837.3.patch, TEZ-3837.4.patch, TEZ-3837.5.patch
>
>





[jira] [Commented] (TEZ-3864) Tez failed to intergrate with hadoop(2.8.2)

2017-11-13 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250012#comment-16250012
 ] 

Zhiyuan Yang commented on TEZ-3864:
---

Interruption probably wasn't the culprit. You may want to find out who sent the 
interruption and why.

> Tez failed to intergrate with hadoop(2.8.2)
> ---
>
> Key: TEZ-3864
> URL: https://issues.apache.org/jira/browse/TEZ-3864
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Shen Yinjie
>
> When I integrated Tez (0.9.0) with Hadoop (2.8.2), the Tez service check 
> orderedwordcount always failed to run: 
>  "hadoop --config /etc/hadoop/conf jar /usr/lib/tez/tez-examples*.jar 
> orderedwordcount  
> /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/"
> None of the containers ran successfully; the container logs just 
> print exceptions like the following:
> "TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) 
> : java.lang.RuntimeException: java.io.IOException: Failed on local exception: 
> java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
> "wjf1-hc/xx.xx.xx.xx"; destination host is: "wjf1-hc":8020; 
>  at 
> org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:209)
>  at 
> org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initialize(TezGroupedSplitsInputFormat.java:156)
>  at 
> org.apache.tez.mapreduce.lib.MRReaderMapReduce.setupNewRecordReader(MRReaderMapReduce.java:157)
>  at 
> org.apache.tez.mapreduce.lib.MRReaderMapReduce.setSplit(MRReaderMapReduce.java:88)
>  at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
>  at org.apache.tez.mapreduce.input.MRInput.processSplitEvent(MRInput.java:631)
>  at org.apache.tez.mapreduce.input.MRInput.handleEvents(MRInput.java:590)
>  at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:719)
>  at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:106)
>  at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:796)
>  at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Failed on local exception: 
> java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
> "wjf1-hc/xx.xx.xx.xx"; destination host is: "wjf1-hc":8020; 
>  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:785)
>  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:235)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>  at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:259)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
>  at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
>  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:830)
>  at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:819)
>  at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:808)
>  at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:319)
>  at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:281)
>  at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:270)
>  at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1119)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:343)
>  at 
> org.ap

[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-10 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3855:
--
Fix Version/s: 0.9.1

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, 
> TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Commented] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-10 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248236#comment-16248236
 ] 

Zhiyuan Yang commented on TEZ-3855:
---

Thanks [~gopalv] for review! Patch was committed to master branch.
{code}
commit b96f79fa75dc6cf47e4d648b028ccb12f02308a6
Author: Zhiyuan Yang 
Date:   Fri Nov 10 16:44:29 2017 -0800

TEZ-3855. Allow vertex manager to send event to processor (zhiyuany)
{code}

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, 
> TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-09 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3855:
--
Attachment: TEZ-3855.3.patch

Added a way to trace an event back to the sender AM, per [~gopalv]'s 
offline comments. Events now carry the app attempt id in the version field.

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, TEZ-3855.3.patch, 
> TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Commented] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-08 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244595#comment-16244595
 ] 

Zhiyuan Yang commented on TEZ-3855:
---

Thanks [~gopalv]! AFAIK the processor event framework has been there for a long time 
but was never used.

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, 
> TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from vertex manager to processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Commented] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-08 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244566#comment-16244566
 ] 

Zhiyuan Yang commented on TEZ-3855:
---

Ping [~rajesh.balamohan], [~gopalv] for review. Hive needs to have this in the 0.9.1 
release.

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, 
> TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from the vertex manager to the processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Updated] (TEZ-3805) Analyzer: Add an analyzer to find out scheduling misses in 1:1 edges

2017-11-08 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3805:
--
Fix Version/s: (was: 0.9.next)
   0.9.1

> Analyzer: Add an analyzer to find out scheduling misses in 1:1 edges
> 
>
> Key: TEZ-3805
> URL: https://issues.apache.org/jira/browse/TEZ-3805
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 0.9.1
>
> Attachments: TEZ-3805.1.patch
>
>
> When a 1:1 edge is used, it would be helpful to find out whether downstream 
> tasks ran on the same location provided in the hints by the runtime. 
> One of the recent features in an upstream project (Hive) used a 1:1 edge. Instead 
> of checking the logs, it would be useful to have an analyzer to churn out the 
> details.
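
The check described above could be sketched roughly as follows. This is only an illustration, not the analyzer from the attached patch: the class name, the method name, and the idea of feeding it two maps from task index to node name are all assumptions, and it further assumes that the location hint for a downstream task on a 1:1 edge is the node where the source task with the same index ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from Tez.
final class OneToOneEdgeMissSketch {

    /**
     * Returns the task indexes, in ascending order, whose downstream task
     * ran on a different node than the source task with the same index.
     * Indexes with no recorded downstream placement are skipped.
     */
    static List<Integer> findMisses(Map<Integer, String> sourceNodes,
                                    Map<Integer, String> downstreamNodes) {
        List<Integer> misses = new ArrayList<>();
        // TreeMap gives a stable, sorted iteration order over task indexes.
        for (Map.Entry<Integer, String> e : new TreeMap<>(sourceNodes).entrySet()) {
            String actual = downstreamNodes.get(e.getKey());
            if (actual != null && !actual.equals(e.getValue())) {
                misses.add(e.getKey());
            }
        }
        return misses;
    }
}
```

A real analyzer would presumably read the placements from the DAG history data rather than from hand-built maps.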




[jira] [Updated] (TEZ-3858) Misleading dag level diagnostics in case of invalid vertex event

2017-11-08 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3858:
--
Fix Version/s: 0.9.1

> Misleading dag level diagnostics in case of invalid vertex event
> 
>
> Key: TEZ-3858
> URL: https://issues.apache.org/jira/browse/TEZ-3858
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3858.1.patch, TEZ-3858.2.patch
>
>
> When a vertex gets an invalid event, the state machine is transitioned by 
> InternalErrorTransition. This transition logs the following message and adds it to the DAG 
> diagnostics: 
> {code}
> ("Invalid event " + event.getType() + " on Vertex " + 
> vertex.getLogIdentifier()
> {code}
> But the variable event here is the V_INTERNAL_ERROR event, not the event that 
> caused V_INTERNAL_ERROR. V_INTERNAL_ERROR is not the invalid event; the 
> original event is.
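
To make the problem concrete, here is a minimal, self-contained sketch. It does not use the Tez classes and does not claim to show how the attached patches fix the issue: the Event class, its cause field, and both helper methods are hypothetical. It only contrasts a message built from the internal-error event with one built from the event that triggered it.

```java
// Hypothetical, simplified stand-ins; these are not the Tez classes.
final class InvalidEventDiagnosticsSketch {

    enum EventType { V_START, V_TASK_COMPLETED, V_INTERNAL_ERROR }

    static final class Event {
        final EventType type;
        final Event cause; // the event that led to this one, or null

        Event(EventType type, Event cause) {
            this.type = type;
            this.cause = cause;
        }
    }

    // Misleading: the error transition only ever sees the internal-error
    // event, so the message names V_INTERNAL_ERROR every time.
    static String misleadingDiagnostic(Event event, String vertexId) {
        return "Invalid event " + event.type + " on Vertex " + vertexId;
    }

    // Assumed alternative: report the causing event when one is attached,
    // falling back to the event itself otherwise.
    static String accurateDiagnostic(Event event, String vertexId) {
        Event original = (event.cause != null) ? event.cause : event;
        return "Invalid event " + original.type + " on Vertex " + vertexId;
    }
}
```

With an internal-error event whose cause is V_TASK_COMPLETED, the first helper reports V_INTERNAL_ERROR while the second reports V_TASK_COMPLETED, which is the event a reader of the diagnostics would actually want to see.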




[jira] [Commented] (TEZ-3858) Misleading dag level diagnostics in case of invalid vertex event

2017-11-08 Thread Zhiyuan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244554#comment-16244554
 ] 

Zhiyuan Yang commented on TEZ-3858:
---

Thanks [~kshukla] for reviewing and committing this!

> Misleading dag level diagnostics in case of invalid vertex event
> 
>
> Key: TEZ-3858
> URL: https://issues.apache.org/jira/browse/TEZ-3858
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 0.9.1
>
> Attachments: TEZ-3858.1.patch, TEZ-3858.2.patch
>
>
> When a vertex gets an invalid event, the state machine is transitioned by 
> InternalErrorTransition. This transition logs the following message and adds it to the DAG 
> diagnostics: 
> {code}
> ("Invalid event " + event.getType() + " on Vertex " + 
> vertex.getLogIdentifier()
> {code}
> But the variable event here is the V_INTERNAL_ERROR event, not the event that 
> caused V_INTERNAL_ERROR. V_INTERNAL_ERROR is not the invalid event; the 
> original event is.




[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling

2017-11-07 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3837:
--
Attachment: TEZ-3837.4.patch

Fixed some event-related SerDe issues.

> Parallel sorting with inline sampling
> -
>
> Key: TEZ-3837
> URL: https://issues.apache.org/jira/browse/TEZ-3837
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, 
> TEZ-3837.2.patch, TEZ-3837.3.patch, TEZ-3837.4.patch
>
>





[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling

2017-11-06 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3837:
--
Attachment: TEZ-3837.3.patch

Fixed a javadoc warning.

> Parallel sorting with inline sampling
> -
>
> Key: TEZ-3837
> URL: https://issues.apache.org/jira/browse/TEZ-3837
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, 
> TEZ-3837.2.patch, TEZ-3837.3.patch
>
>





[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling

2017-11-03 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3837:
--
Attachment: TEZ-3837.2.patch

> Parallel sorting with inline sampling
> -
>
> Key: TEZ-3837
> URL: https://issues.apache.org/jira/browse/TEZ-3837
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Major
> Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, 
> TEZ-3837.2.patch
>
>





[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-03 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3855:
--
Attachment: (was: TEZ-3855.2.patch)

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Major
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, 
> TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from the vertex manager to the processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-03 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3855:
--
Attachment: TEZ-3855.2.patch

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Major
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, 
> TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from the vertex manager to the processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Updated] (TEZ-3855) Allow vertex manager to send event to processor

2017-11-03 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3855:
--
Attachment: TEZ-3855.2.patch

[~rajesh.balamohan] Thanks for taking a look! That task attempt id won't be 
used anyway since this is a task-level event. I've changed it to -1 and added 
comments in the new patch.

> Allow vertex manager to send event to processor
> ---
>
> Key: TEZ-3855
> URL: https://issues.apache.org/jira/browse/TEZ-3855
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Major
> Attachments: TEZ-3855.1.patch, TEZ-3855.2.patch, 
> TEZ-3855.prototype.patch
>
>
> Hive is trying to propagate some info from the vertex manager to the processor. The 
> task framework supports processor events, but there is no interface for the VM to 
> send events out.




[jira] [Updated] (TEZ-3837) Parallel sorting with inline sampling

2017-11-02 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated TEZ-3837:
--
Attachment: (was: TEZ-3837.1.patch.example)

> Parallel sorting with inline sampling
> -
>
> Key: TEZ-3837
> URL: https://issues.apache.org/jira/browse/TEZ-3837
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Major
> Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch
>
>





[jira] [Resolved] (TEZ-3838) API for enabling sampler and specify configuration

2017-11-02 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang resolved TEZ-3838.
---
Resolution: Won't Fix

Closing since it's included in the TEZ-3837 patch.

> API for enabling sampler and specify configuration
> --
>
> Key: TEZ-3838
> URL: https://issues.apache.org/jira/browse/TEZ-3838
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Major
>





[jira] [Assigned] (TEZ-3837) Parallel sorting with inline sampling

2017-11-02 Thread Zhiyuan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned TEZ-3837:
-

Assignee: Zhiyuan Yang

> Parallel sorting with inline sampling
> -
>
> Key: TEZ-3837
> URL: https://issues.apache.org/jira/browse/TEZ-3837
> Project: Apache Tez
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>Priority: Major
> Attachments: Parallel Sorting In Tez.pdf, TEZ-3837.1.patch, 
> TEZ-3837.1.patch.example
>
>




