[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-12-21 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.13.patch

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Now that the cartesian product edge is available in Tez (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows cross-product queries to use more 
> than one reducer.
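
To illustrate why a cartesian product edge permits more than one reducer, here is a minimal sketch. It is not the Tez API: the class CrossProductSketch, its methods, and the chunking scheme are assumptions made for illustration only. The idea shown is that if each input is divided into chunks, every pair of chunks can be handled by a separate reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names, NOT the Tez cartesian product edge API.
final class CrossProductSketch {

    // Maps a reducer index to the (left chunk, right chunk) pair it handles,
    // assuming leftChunks * rightChunks reducers in total.
    static int[] chunkPairFor(int reducer, int leftChunks, int rightChunks) {
        if (reducer < 0 || reducer >= leftChunks * rightChunks) {
            throw new IllegalArgumentException("reducer out of range: " + reducer);
        }
        return new int[] {reducer / rightChunks, reducer % rightChunks};
    }

    // Computes the slice of the cross product that one reducer is responsible for.
    static List<String> reduce(int reducer, List<List<String>> left, List<List<String>> right) {
        int[] pair = chunkPairFor(reducer, left.size(), right.size());
        List<String> out = new ArrayList<>();
        for (String l : left.get(pair[0])) {
            for (String r : right.get(pair[1])) {
                out.add(l + "," + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> left = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        List<List<String>> right = Arrays.asList(Arrays.asList("1"), Arrays.asList("2", "3"));
        int reducers = left.size() * right.size();
        for (int r = 0; r < reducers; r++) {
            System.out.println("reducer " + r + " -> " + reduce(r, left, right));
        }
    }
}
```

Taken together, the reducers cover every (left row, right row) pair exactly once, so the work is no longer funnelled through a single reducer.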



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-12-21 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768148#comment-15768148
 ] 

Zhiyuan Yang commented on HIVE-14731:
-

Thanks [~hagleitn] for the review! The final patch has been uploaded.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch


[jira] [Assigned] (HIVE-15570) LLAP: Exception in HostAffinitySplitLocationProvider when running in container mode

2017-01-20 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-15570:
---

Assignee: Zhiyuan Yang

> LLAP: Exception in HostAffinitySplitLocationProvider when running in 
> container mode
> ---
>
> Key: HIVE-15570
> URL: https://issues.apache.org/jira/browse/HIVE-15570
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>Priority: Minor
> Attachments: HIVE-15570.1.patch
>
>
> Sometimes a user might prefer to run in container mode 
> ("hive.execution.mode=container") when LLAP is stopped. If the client-side 
> Hive config for LLAP has "hive.llap.client.consistent.splits=true", this ends 
> up throwing the following exception in {{Utils.java}}.
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
>     ... 25 more
> Caused by: java.lang.IllegalStateException: org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider needs at least 1 location to function
>     at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
>     at org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider.<init>(HostAffinitySplitLocationProvider.java:52)
>     at org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:54)
>     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:121)
>     ... 30 more
> {noformat}
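
The failure in the trace above comes from a constructor-time precondition. The following sketch reproduces that pattern in plain Java. AffinityProviderSketch and its methods are hypothetical stand-ins, not the Hive class, and the hash-based selection is only an assumed way to pick a location consistently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a host-affinity location provider; not the Hive class.
final class AffinityProviderSketch {
    private final List<String> locations;

    AffinityProviderSketch(List<String> knownLocations) {
        // Same kind of guard as the checkState call in the trace:
        // an empty location list is rejected at construction time.
        if (knownLocations == null || knownLocations.isEmpty()) {
            throw new IllegalStateException(
                "AffinityProviderSketch needs at least 1 location to function");
        }
        this.locations = knownLocations;
    }

    // Picks a location deterministically from the split identifier (assumed scheme).
    String locationFor(String splitId) {
        int idx = Math.floorMod(splitId.hashCode(), locations.size());
        return locations.get(idx);
    }

    public static void main(String[] args) {
        AffinityProviderSketch ok = new AffinityProviderSketch(Arrays.asList("host1", "host2"));
        System.out.println("split-0 -> " + ok.locationFor("split-0"));
        try {
            new AffinityProviderSketch(Collections.<String>emptyList());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

When no LLAP daemon is registered, the location list is empty, so a provider of this shape fails before any split is assigned.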





[jira] [Updated] (HIVE-15570) LLAP: Exception in HostAffinitySplitLocationProvider when running in container mode

2017-01-20 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-15570:

Attachment: HIVE-15570.2.patch

The same exception may be thrown in the following cases, and it is hard to debug:
1. LLAP mode is used but there is no LLAP daemon.
2. LLAP mode is used but the LLAP daemon is in recovery.
3. Container mode is used but hive.llap.client.consistent.splits is true.

In the new patch, hive.llap.client.consistent.splits has no effect if 
container mode is used. If LLAP mode is used but there is no running daemon, we 
fall back to the locations provided by the splits. Then, if there is no LLAP 
daemon at all, LlapTaskSchedulerService will detect this and report "No LLAP 
Daemons are running"; if the LLAP daemon finishes recovery, the query can still 
succeed.

> LLAP: Exception in HostAffinitySplitLocationProvider when running in 
> container mode
> ---
>
> Key: HIVE-15570
> URL: https://issues.apache.org/jira/browse/HIVE-15570
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>Priority: Minor
> Attachments: HIVE-15570.1.patch, HIVE-15570.2.patch


[jira] [Updated] (HIVE-15570) LLAP: Exception in HostAffinitySplitLocationProvider when running in container mode

2017-02-16 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-15570:

Attachment: HIVE-15570.3.patch

Thanks [~sseth] and [~sershe]! I've revised the patch according to your 
comments. Now it simply ignores 'hive.llap.client.consistent.splits' in 
container mode. If it's in llap mode but no daemon is available, an error will 
be thrown.
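
The behaviour described above can be sketched as a small decision function. This is not the code from the patch: SplitLocationChoiceSketch, the Source enum, and the choose method are hypothetical, and the handling of llap mode with consistent splits disabled is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the selection logic; names are not taken from the patch.
final class SplitLocationChoiceSketch {

    enum Source { SPLIT_LOCATIONS, HOST_AFFINITY }

    static Source choose(String executionMode, boolean consistentSplits, List<String> llapDaemons) {
        boolean llapMode = "llap".equalsIgnoreCase(executionMode);
        if (!llapMode) {
            // Container mode: hive.llap.client.consistent.splits is simply ignored.
            return Source.SPLIT_LOCATIONS;
        }
        if (!consistentSplits) {
            // Assumption: without consistent splits, llap mode also uses split locations.
            return Source.SPLIT_LOCATIONS;
        }
        if (llapDaemons.isEmpty()) {
            // llap mode but no daemon available: fail with an explicit error.
            throw new IllegalStateException("No LLAP daemons are available");
        }
        return Source.HOST_AFFINITY;
    }

    public static void main(String[] args) {
        System.out.println(choose("container", true, Collections.<String>emptyList()));
        System.out.println(choose("llap", true, Arrays.asList("daemon1")));
    }
}
```

Failing explicitly in the llap case keeps the error close to its cause instead of surfacing as a reflection failure deep in split generation.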

> LLAP: Exception in HostAffinitySplitLocationProvider when running in 
> container mode
> ---
>
> Key: HIVE-15570
> URL: https://issues.apache.org/jira/browse/HIVE-15570
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Zhiyuan Yang
>Priority: Minor
> Attachments: HIVE-15570.1.patch, HIVE-15570.2.patch, 
> HIVE-15570.3.patch


[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-02-23 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.14.patch

Patch rebased. Now we're detecting both SIMPLE_EDGE and CUSTOM_SIMPLE_EDGE in 
CrossProductHandler. [~hagleitn]

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.1.patch, HIVE-14731.2.patch, HIVE-14731.3.patch, 
> HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, 
> HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch


[jira] [Assigned] (HIVE-16082) Allow user to change number of listener thread in LlapTaskCommunicator

2017-03-01 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-16082:
---


> Allow user to change number of listener thread in LlapTaskCommunicator
> --
>
> Key: HIVE-16082
> URL: https://issues.apache.org/jira/browse/HIVE-16082
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> Currently LlapTaskCommunicator always has the same number of RPC listener 
> threads as TezTaskCommunicatorImpl. There are scenarios where we want them to 
> differ: for example, in LLAP-only mode, we want fewer listener threads for 
> TezTaskCommunicatorImpl to reduce off-heap memory usage.
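
One way to decouple the two thread counts is a dedicated setting that falls back to the shared one. The sketch below uses java.util.Properties for brevity. Both property keys and the class ListenerThreadConfigSketch are hypothetical placeholders, not real Hive or Tez configuration names.

```java
import java.util.Properties;

// Hypothetical sketch; the keys below are placeholders, not real Hive or Tez properties.
final class ListenerThreadConfigSketch {
    static final String SHARED_KEY = "example.task.listener.thread-count";
    static final String LLAP_KEY = "example.llap.task.communicator.listener.thread-count";

    // Resolves the LLAP communicator's listener thread count:
    // dedicated key first, then the shared key, then the supplied default.
    static int llapListenerThreads(Properties conf, int defaultThreads) {
        String shared = conf.getProperty(SHARED_KEY, Integer.toString(defaultThreads));
        String llap = conf.getProperty(LLAP_KEY, shared);
        int threads = Integer.parseInt(llap.trim());
        if (threads < 1) {
            throw new IllegalArgumentException("listener thread count must be >= 1: " + threads);
        }
        return threads;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SHARED_KEY, "4");   // small count for the Tez communicator
        conf.setProperty(LLAP_KEY, "30");    // larger count for the LLAP communicator
        System.out.println("LLAP listener threads: " + llapListenerThreads(conf, 30));
    }
}
```

With a fallback of this kind, existing deployments that set only the shared value keep their current behaviour.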





[jira] [Updated] (HIVE-16082) Allow user to change number of listener thread in LlapTaskCommunicator

2017-03-01 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16082:

Attachment: HIVE-16082.1.patch

Please help review @[~sseth], [~rajesh.balamohan].

> Allow user to change number of listener thread in LlapTaskCommunicator
> --
>
> Key: HIVE-16082
> URL: https://issues.apache.org/jira/browse/HIVE-16082
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-16082.1.patch


[jira] [Updated] (HIVE-16082) Allow user to change number of listener thread in LlapTaskCommunicator

2017-03-01 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16082:

Attachment: (was: HIVE-16082.1.patch)

> Allow user to change number of listener thread in LlapTaskCommunicator
> --
>
> Key: HIVE-16082
> URL: https://issues.apache.org/jira/browse/HIVE-16082
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang


[jira] [Updated] (HIVE-16082) Allow user to change number of listener thread in LlapTaskCommunicator

2017-03-01 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16082:

Attachment: HIVE-16082.1.patch

> Allow user to change number of listener thread in LlapTaskCommunicator
> --
>
> Key: HIVE-16082
> URL: https://issues.apache.org/jira/browse/HIVE-16082
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-16082.1.patch


[jira] [Updated] (HIVE-16082) Allow user to change number of listener thread in LlapTaskCommunicator

2017-03-01 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16082:

Status: Patch Available  (was: Open)

> Allow user to change number of listener thread in LlapTaskCommunicator
> --
>
> Key: HIVE-16082
> URL: https://issues.apache.org/jira/browse/HIVE-16082
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-16082.1.patch


[jira] [Assigned] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-17641:
---


> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> Task.done is not volatile. In parallel execution, the TaskRunner thread 
> sets this value, and the Driver thread reads it when it determines whether 
> a child task is runnable.
> DriverContext.java
> {code}
> public static boolean isLaunchable(Task tsk) {
>   return !tsk.getQueued() && !tsk.getInitialized() && tsk.isRunnable();
> {code}
> Task.java
> {code}
> public boolean isRunnable() {
>   boolean isrunnable = true;
>   if (parentTasks != null) {
>     for (Task parent : parentTasks) {
>       if (!parent.done()) {
> {code}
> This happens without any synchronization, so a child can appear not runnable 
> even after all of its parents have finished.
> To make it worse, Driver considers the query successful when there is no 
> running or runnable task, so a query may finish without executing some stages.
> Driver.java
> {code}
> while (!destroyed && driverCxt.isRunning()) {
> {code}
> DriverContext.java
> {code}
> public synchronized boolean isRunning() {
>   return !shutdown && (!running.isEmpty() || !runnable.isEmpty());
> {code}
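
A minimal sketch of one possible remedy, assuming the flag is simply declared volatile, is shown below. The classes are simplified stand-ins, not the Hive Task and Driver, and the attached patch may address the problem differently.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration; these are not the Hive classes.
final class TaskVisibilitySketch {

    static final class Task {
        // volatile makes the runner thread's write visible to the driver thread.
        private volatile boolean done;
        private final List<Task> parentTasks = new ArrayList<>();

        void addParent(Task parent) { parentTasks.add(parent); }
        void setDone() { done = true; }
        boolean done() { return done; }

        // A task is runnable only when every parent has finished.
        boolean isRunnable() {
            for (Task parent : parentTasks) {
                if (!parent.done()) {
                    return false;
                }
            }
            return true;
        }
    }

    // Returns true if the driver-side poll observed the parent's completion before the deadline.
    static boolean childBecomesRunnable(long timeoutMillis) {
        Task parent = new Task();
        Task child = new Task();
        child.addParent(parent);

        // Plays the role of the TaskRunner thread: finishes the parent task.
        Thread runner = new Thread(parent::setDone, "task-runner");
        runner.start();

        // Plays the role of the Driver thread: polls without any locking.
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!child.isRunnable() && System.nanoTime() < deadline) {
            Thread.yield();
        }
        boolean observed = child.isRunnable();

        try {
            runner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("child runnable: " + childBecomesRunnable(5_000));
    }
}
```

Without the volatile modifier, the Java memory model does not guarantee that the polling thread ever sees the updated flag, which matches the skipped-stage symptom described in the issue.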





[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Attachment: HIVE-17641.1.patch

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 1.2.1
>
> Attachments: HIVE-17641.1.patch


[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Attachment: (was: HIVE-17641.1.patch)

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 1.2.1
>
> Attachments: HIVE-17641.1.patch


[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Attachment: HIVE-17641.1.patch

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17641.1.patch


[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Fix Version/s: 1.2.1

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 1.2.1
>
> Attachments: HIVE-17641.1.patch


[jira] [Work started] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17641 started by Zhiyuan Yang.
---
> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 1.2.1
>
> Attachments: HIVE-17641.1.patch


[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Attachment: (was: HIVE-17641.1.patch)

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 1.2.1


[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Fix Version/s: (was: 1.2.1)

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang


[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Affects Version/s: (was: 1.2.1)
   0.14.1

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang


[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Status: Patch Available  (was: In Progress)

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17641.1-branch-0.14.patch
>
>
> Task.done is not volatile. In case of parallel execution, TaskRunner thread 
> set this value, and Driver thread read this value when it determines whether 
> a child task is runnable
> DriverContext.java
> {code}
> public static boolean isLaunchable(Task tsk) {
> return !tsk.getQueued() && !tsk.getInitialized() && tsk.isRunnable();
> {code}
> Task.java
> {code}
> public boolean isRunnable() {
> boolean isrunnable = true;
> if (parentTasks != null) {
>   for (Task parent : parentTasks) {
> if (!parent.done()) {
> {code}
> This happens without any synchronization, so a child can appear not runnable 
> even after all its parents have finished.
> To make it worse, the Driver considers the query successful when there is no 
> running or runnable task, so a query may finish without executing some stages.
> Driver.java
> {code}
> while (!destroyed && driverCxt.isRunning()) {
> {code}
> DriverContext.java
> {code}
> public synchronized boolean isRunning() {
> return !shutdown && (!running.isEmpty() || !runnable.isEmpty());
> {code}
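
A minimal sketch of the visibility problem described above, not the actual Hive code. SketchTask is a hypothetical stand-in for Task, and declaring the flag volatile is assumed here to be the kind of single-field change that makes the TaskRunner thread's write visible to the Driver thread:

```java
import java.util.ArrayList;
import java.util.List;

public class TaskVisibilitySketch {
    /** Hypothetical, simplified stand-in for Hive's Task. */
    static class SketchTask {
        // volatile: a write by one thread is guaranteed to be seen by later reads in other threads.
        private volatile boolean done = false;
        private final List<SketchTask> parentTasks = new ArrayList<>();

        void addParent(SketchTask parent) { parentTasks.add(parent); }
        void setDone() { done = true; }   // called by the runner thread
        boolean done() { return done; }   // polled by the driver thread

        /** A task is runnable only once every parent reports done. */
        boolean isRunnable() {
            for (SketchTask parent : parentTasks) {
                if (!parent.done()) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SketchTask parent = new SketchTask();
        SketchTask child = new SketchTask();
        child.addParent(parent);

        Thread runner = new Thread(parent::setDone);
        runner.start();
        // join() only makes this demo deterministic; a polling driver loop has no
        // such join, which is where the volatile read matters.
        runner.join();

        System.out.println(child.isRunnable()); // prints true
    }
}
```

Without volatile (or some other synchronization), the Java memory model does not guarantee that the polling thread ever observes the updated flag, which matches the symptom of a child that never becomes runnable.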





[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Attachment: HIVE-17641.1-branch-0.14.patch

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17641.1-branch-0.14.patch
>
>
> Task.done is not volatile. In parallel execution, the TaskRunner thread 
> sets this value, and the Driver thread reads it when it determines whether 
> a child task is runnable:
> DriverContext.java
> {code}
> public static boolean isLaunchable(Task tsk) {
> return !tsk.getQueued() && !tsk.getInitialized() && tsk.isRunnable();
> {code}
> Task.java
> {code}
> public boolean isRunnable() {
> boolean isrunnable = true;
> if (parentTasks != null) {
>   for (Task parent : parentTasks) {
> if (!parent.done()) {
> {code}
> This happens without any synchronization, so a child can appear not runnable 
> even after all its parents have finished.
> To make it worse, the Driver considers the query successful when there is no 
> running or runnable task, so a query may finish without executing some stages.
> Driver.java
> {code}
> while (!destroyed && driverCxt.isRunning()) {
> {code}
> DriverContext.java
> {code}
> public synchronized boolean isRunning() {
> return !shutdown && (!running.isEmpty() || !runnable.isEmpty());
> {code}





[jira] [Commented] (HIVE-17643) recent WM changes broke reopen due to spurious overloads

2017-09-28 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185104#comment-16185104
 ] 

Zhiyuan Yang commented on HIVE-17643:
-

Oops, sorry I didn't catch that in the previous review... Will take a look.

> recent WM changes broke reopen due to spurious overloads
> 
>
> Key: HIVE-17643
> URL: https://issues.apache.org/jira/browse/HIVE-17643
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17643.patch
>
>






[jira] [Commented] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-29 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186192#comment-16186192
 ] 

Zhiyuan Yang commented on HIVE-17641:
-

[~ashutoshc] Can you help review this patch? It's a one-line fix with extra 
logging. Thanks!

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17641.1-branch-0.14.patch
>
>
> Task.done is not volatile. In parallel execution, the TaskRunner thread 
> sets this value, and the Driver thread reads it when it determines whether 
> a child task is runnable:
> DriverContext.java
> {code}
> public static boolean isLaunchable(Task tsk) {
> return !tsk.getQueued() && !tsk.getInitialized() && tsk.isRunnable();
> {code}
> Task.java
> {code}
> public boolean isRunnable() {
> boolean isrunnable = true;
> if (parentTasks != null) {
>   for (Task parent : parentTasks) {
> if (!parent.done()) {
> {code}
> This happens without any synchronization, so a child can appear not runnable 
> even after all its parents have finished.
> To make it worse, the Driver considers the query successful when there is no 
> running or runnable task, so a query may finish without executing some stages.
> Driver.java
> {code}
> while (!destroyed && driverCxt.isRunning()) {
> {code}
> DriverContext.java
> {code}
> public synchronized boolean isRunning() {
> return !shutdown && (!running.isEmpty() || !runnable.isEmpty());
> {code}





[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-10-02 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Attachment: HIVE-17641.2-branch-0.14.patch

Thanks [~ashutoshc] for the review! I've attached a new patch to address the 
comments. Actually, that exception message can be omitted because another place 
already prints how many tasks there are in total and how many have been 
executed. LOG.info won't be noisy, since that function is not called often. 
Anyway, I fixed all of them in the new patch.

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17641.1-branch-0.14.patch, 
> HIVE-17641.2-branch-0.14.patch
>
>
> Task.done is not volatile. In parallel execution, the TaskRunner thread 
> sets this value, and the Driver thread reads it when it determines whether 
> a child task is runnable:
> DriverContext.java
> {code}
> public static boolean isLaunchable(Task tsk) {
> return !tsk.getQueued() && !tsk.getInitialized() && tsk.isRunnable();
> {code}
> Task.java
> {code}
> public boolean isRunnable() {
> boolean isrunnable = true;
> if (parentTasks != null) {
>   for (Task parent : parentTasks) {
> if (!parent.done()) {
> {code}
> This happens without any synchronization, so a child can appear not runnable 
> even after all its parents have finished.
> To make it worse, the Driver considers the query successful when there is no 
> running or runnable task, so a query may finish without executing some stages.
> Driver.java
> {code}
> while (!destroyed && driverCxt.isRunning()) {
> {code}
> DriverContext.java
> {code}
> public synchronized boolean isRunning() {
> return !shutdown && (!running.isEmpty() || !runnable.isEmpty());
> {code}





[jira] [Commented] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-10-02 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189069#comment-16189069
 ] 

Zhiyuan Yang commented on HIVE-17641:
-

Thanks [~ashutoshc]!

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17641.1-branch-0.14.patch, 
> HIVE-17641.2-branch-0.14.patch
>
>
> Task.done is not volatile. In parallel execution, the TaskRunner thread 
> sets this value, and the Driver thread reads it when it determines whether 
> a child task is runnable:
> DriverContext.java
> {code}
> public static boolean isLaunchable(Task tsk) {
> return !tsk.getQueued() && !tsk.getInitialized() && tsk.isRunnable();
> {code}
> Task.java
> {code}
> public boolean isRunnable() {
> boolean isrunnable = true;
> if (parentTasks != null) {
>   for (Task parent : parentTasks) {
> if (!parent.done()) {
> {code}
> This happens without any synchronization, so a child can appear not runnable 
> even after all its parents have finished.
> To make it worse, the Driver considers the query successful when there is no 
> running or runnable task, so a query may finish without executing some stages.
> Driver.java
> {code}
> while (!destroyed && driverCxt.isRunning()) {
> {code}
> DriverContext.java
> {code}
> public synchronized boolean isRunning() {
> return !shutdown && (!running.isEmpty() || !runnable.isEmpty());
> {code}





[jira] [Updated] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-10-04 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17641:

Attachment: HIVE-17641.3-branch-0.14.patch

Removed the check in the Driver class, since we have no way to tell how many 
stages will run (because of conditional tasks).

> Visibility issue of Task.done cause Driver skip stages in parallel execution
> 
>
> Key: HIVE-17641
> URL: https://issues.apache.org/jira/browse/HIVE-17641
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.1
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17641.1-branch-0.14.patch, 
> HIVE-17641.2-branch-0.14.patch, HIVE-17641.3-branch-0.14.patch
>
>
> Task.done is not volatile. In parallel execution, the TaskRunner thread 
> sets this value, and the Driver thread reads it when it determines whether 
> a child task is runnable:
> DriverContext.java
> {code}
> public static boolean isLaunchable(Task tsk) {
> return !tsk.getQueued() && !tsk.getInitialized() && tsk.isRunnable();
> {code}
> Task.java
> {code}
> public boolean isRunnable() {
> boolean isrunnable = true;
> if (parentTasks != null) {
>   for (Task parent : parentTasks) {
> if (!parent.done()) {
> {code}
> This happens without any synchronization, so a child can appear not runnable 
> even after all its parents have finished.
> To make it worse, the Driver considers the query successful when there is no 
> running or runnable task, so a query may finish without executing some stages.
> Driver.java
> {code}
> while (!destroyed && driverCxt.isRunning()) {
> {code}
> DriverContext.java
> {code}
> public synchronized boolean isRunning() {
> return !shutdown && (!running.isEmpty() || !runnable.isEmpty());
> {code}





[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-10-12 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.21.patch

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.14.patch, HIVE-14731.15.patch, HIVE-14731.16.patch, 
> HIVE-14731.17.patch, HIVE-14731.18.patch, HIVE-14731.19.patch, 
> HIVE-14731.2.patch, HIVE-14731.20.patch, HIVE-14731.21.patch, 
> HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, 
> HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-10-13 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.22.patch

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.14.patch, HIVE-14731.15.patch, HIVE-14731.16.patch, 
> HIVE-14731.17.patch, HIVE-14731.18.patch, HIVE-14731.19.patch, 
> HIVE-14731.2.patch, HIVE-14731.20.patch, HIVE-14731.21.patch, 
> HIVE-14731.22.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-10-15 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205373#comment-16205373
 ] 

Zhiyuan Yang commented on HIVE-14731:
-

Rebased the patch again. Tests were clean except for flaky or irrelevant tests. 
Please review. CC [~hagleitn]

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.14.patch, HIVE-14731.15.patch, HIVE-14731.16.patch, 
> HIVE-14731.17.patch, HIVE-14731.18.patch, HIVE-14731.19.patch, 
> HIVE-14731.2.patch, HIVE-14731.20.patch, HIVE-14731.21.patch, 
> HIVE-14731.22.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-10-23 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.23.patch

Rebased again...

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.14.patch, HIVE-14731.15.patch, HIVE-14731.16.patch, 
> HIVE-14731.17.patch, HIVE-14731.18.patch, HIVE-14731.19.patch, 
> HIVE-14731.2.patch, HIVE-14731.20.patch, HIVE-14731.21.patch, 
> HIVE-14731.22.patch, HIVE-14731.23.patch, HIVE-14731.3.patch, 
> HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, 
> HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Commented] (HIVE-17823) Fix subquery Qtest of Hive on Spark

2017-10-25 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219646#comment-16219646
 ] 

Zhiyuan Yang commented on HIVE-17823:
-

subquery_multi on TestSparkCliDriver seems to produce a different output order 
on different machines. Specifically, this query:
{code}
select * from part_null where p_size IN (select p_size from part_null) AND 
p_brand IN (select p_brand from part_null)
{code}

It failed on my local machine (before HIVE-14731 was committed) like this:
{code}
237d236
< 78487  NULL  Manufacturer#6  Brand#52  LARGE BRUSHED BRASS  23  MED BAG  1464.48  hely blith
238a238
> 78487  NULL  Manufacturer#6  Brand#52  LARGE BRUSHED BRASS  23  MED BAG  1464.48  hely blith
{code}
After I overwrote the expected output with my local result, it failed on Apache Jenkins.
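
One way to keep such a comparison stable, sketched here under the assumption that row order is not part of what the test verifies, is to compare the two outputs as sorted lists. The class and method names are illustrative and are not part of Hive's test framework:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderInsensitiveCompareSketch {
    /** Returns true when both outputs contain the same rows, ignoring their order. */
    static boolean sameRows(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected); // copy so the inputs are not mutated
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        List<String> local = Arrays.asList("row-a", "row-b", "row-c");
        List<String> jenkins = Arrays.asList("row-b", "row-a", "row-c");
        // Same rows in a different order: a line-by-line diff fails, this check passes.
        System.out.println(sameRows(local, jenkins)); // prints true
    }
}
```

This only hides ordering differences; it does not explain why the order differs between machines.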

> Fix subquery Qtest of Hive on Spark
> ---
>
> Key: HIVE-17823
> URL: https://issues.apache.org/jira/browse/HIVE-17823
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-17823.001.patch
>
>
> This JIRA aims to fix the Qtest file failures of HoS caused by the subquery 
> fix introduced in HIVE-17726.





[jira] [Comment Edited] (HIVE-17823) Fix subquery Qtest of Hive on Spark

2017-10-25 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219646#comment-16219646
 ] 

Zhiyuan Yang edited comment on HIVE-17823 at 10/25/17 10:16 PM:


subquery_multi on TestSparkCliDriver seems to produce a different output order 
on different machines. Specifically, this query:
{code}
select * from part_null where p_size IN (select p_size from part_null) AND 
p_brand IN (select p_brand from part_null)
{code}

It failed on my local machine (before HIVE-14731 was committed) like this:
{code}
237d236
< 78487  NULL  Manufacturer#6  Brand#52  LARGE BRUSHED BRASS  23  MED BAG  1464.48  hely blith
238a238
> 78487  NULL  Manufacturer#6  Brand#52  LARGE BRUSHED BRASS  23  MED BAG  1464.48  hely blith
{code}
After I overwrote the expected output with my local result, it failed on Apache 
Jenkins with a similar diff.


was (Author: aplusplus):
subquery_multi on TestSparkCliDriver seems to produce a different output order 
on different machines. Specifically, this query:
{code}
select * from part_null where p_size IN (select p_size from part_null) AND 
p_brand IN (select p_brand from part_null)
{code}

It failed on my local machine (before HIVE-14731 was committed) like this:
{code}
237d236
< 78487  NULL  Manufacturer#6  Brand#52  LARGE BRUSHED BRASS  23  MED BAG  1464.48  hely blith
238a238
> 78487  NULL  Manufacturer#6  Brand#52  LARGE BRUSHED BRASS  23  MED BAG  1464.48  hely blith
{code}
After I overwrote the expected output with my local result, it failed on Apache Jenkins.

> Fix subquery Qtest of Hive on Spark
> ---
>
> Key: HIVE-17823
> URL: https://issues.apache.org/jira/browse/HIVE-17823
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-17823.001.patch
>
>
> This JIRA aims to fix the Qtest file failures of HoS caused by the subquery 
> fix introduced in HIVE-17726.





[jira] [Reopened] (HIVE-17823) Fix subquery Qtest of Hive on Spark

2017-10-25 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reopened HIVE-17823:
-

I'll revert the relevant changes in HIVE-14731 to unblock the Jenkins run, but 
this still needs investigating.

> Fix subquery Qtest of Hive on Spark
> ---
>
> Key: HIVE-17823
> URL: https://issues.apache.org/jira/browse/HIVE-17823
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-17823.001.patch
>
>
> This JIRA aims to fix the Qtest file failures of HoS caused by the subquery 
> fix introduced in HIVE-17726.





[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-10-25 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.addendum.patch

The previous patch breaks qtest subquery_multi on SparkCliDriver, but the same 
test failed on my local machine before the patch was committed. There seems to 
be some non-determinism in this test on SparkCliDriver (reopened HIVE-17823 for 
this). Let's revert the relevant change here to unblock the Jenkins run. 
CC [~hagleitn]

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.14.patch, HIVE-14731.15.patch, HIVE-14731.16.patch, 
> HIVE-14731.17.patch, HIVE-14731.18.patch, HIVE-14731.19.patch, 
> HIVE-14731.2.patch, HIVE-14731.20.patch, HIVE-14731.21.patch, 
> HIVE-14731.22.patch, HIVE-14731.23.patch, HIVE-14731.3.patch, 
> HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, 
> HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch, 
> HIVE-14731.addendum.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-10-26 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220885#comment-16220885
 ] 

Zhiyuan Yang commented on HIVE-14731:
-

[~kgyrtkirk] The main reason is that mapjoin decides parallelism according to 
#splits, which is not good enough for a cross product. The cost of xprod is 
mostly determined by #records rather than #splits.
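
To make the point concrete, here is a small worked calculation. The record counts and parallelism values are made-up illustrative numbers, and pairsPerTask is a hypothetical helper, not a Hive or Tez API:

```java
public class CrossProductCostSketch {
    /** Record pairs each task must produce if the cross product is divided evenly. */
    static long pairsPerTask(long leftRecords, long rightRecords, int parallelism) {
        long totalPairs = Math.multiplyExact(leftRecords, rightRecords); // |A| x |B|
        return (totalPairs + parallelism - 1) / parallelism;             // ceiling division
    }

    public static void main(String[] args) {
        long left = 1_000_000L; // assume a small file that fits in 2 splits
        long right = 1_000L;
        // Parallelism taken from the split count: each task emits 500,000,000 pairs.
        System.out.println(pairsPerTask(left, right, 2));
        // Parallelism chosen from the record counts: each task emits 10,000,000 pairs.
        System.out.println(pairsPerTask(left, right, 100));
    }
}
```

The input is small enough to yield few splits, yet the output grows with the product of the record counts, so split-based parallelism can leave each task with far too much work.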

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.1.patch, HIVE-14731.10.patch, 
> HIVE-14731.11.patch, HIVE-14731.12.patch, HIVE-14731.13.patch, 
> HIVE-14731.14.patch, HIVE-14731.15.patch, HIVE-14731.16.patch, 
> HIVE-14731.17.patch, HIVE-14731.18.patch, HIVE-14731.19.patch, 
> HIVE-14731.2.patch, HIVE-14731.20.patch, HIVE-14731.21.patch, 
> HIVE-14731.22.patch, HIVE-14731.23.patch, HIVE-14731.3.patch, 
> HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, 
> HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch, 
> HIVE-14731.addendum.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Commented] (HIVE-15016) Run tests with Hadoop 3.0.0-beta1

2017-10-27 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223116#comment-16223116
 ] 

Zhiyuan Yang commented on HIVE-15016:
-

[~aihuaxu] Can you point me to the full log?

> Run tests with Hadoop 3.0.0-beta1
> -
>
> Key: HIVE-15016
> URL: https://issues.apache.org/jira/browse/HIVE-15016
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Sergio Peña
>Assignee: Aihua Xu
> Attachments: HIVE-15016.2.patch, HIVE-15016.3.patch, 
> HIVE-15016.4.patch, HIVE-15016.5.patch, HIVE-15016.6.patch, 
> HIVE-15016.7.patch, HIVE-15016.8.patch, HIVE-15016.patch, 
> Hadoop3Upstream.patch
>
>
> Hadoop 3.0.0-alpha1 was released back on Sep/16 to allow other components to 
> run tests against this new version before GA.
> We should start running tests with Hive to validate compatibility against 
> Hadoop 3.0.
> NOTE: The patch used to test must not be committed to Hive until Hadoop 3.0 
> GA is released.





[jira] [Commented] (HIVE-15016) Run tests with Hadoop 3.0.0-beta1

2017-10-30 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225505#comment-16225505
 ] 

Zhiyuan Yang commented on HIVE-15016:
-

[~aihuaxu] I reproduced the error locally, and that negative number comes from 
Hive, specifically from the bucketTaskMap.get(bucketId) argument in this call:

 InputSplit[] groupedSplits =
     tezGrouper.getGroupedSplits(conf, rawSplits, bucketTaskMap.get(bucketId),
         HiveInputFormat.class.getName(), new ColumnarSplitSizeEstimator(),
         splitLocationProvider);



> Run tests with Hadoop 3.0.0-beta1
> -
>
> Key: HIVE-15016
> URL: https://issues.apache.org/jira/browse/HIVE-15016
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Sergio Peña
>Assignee: Aihua Xu
> Attachments: HIVE-15016.2.patch, HIVE-15016.3.patch, 
> HIVE-15016.4.patch, HIVE-15016.5.patch, HIVE-15016.6.patch, 
> HIVE-15016.7.patch, HIVE-15016.8.patch, HIVE-15016.patch, 
> Hadoop3Upstream.patch
>
>
> Hadoop 3.0.0-alpha1 was released back on Sep/16 to allow other components to 
> run tests against this new version before GA.
> We should start running tests with Hive to validate compatibility against 
> Hadoop 3.0.
> NOTE: The patch used to test must not be committed to Hive until Hadoop 3.0 
> GA is released.





[jira] [Updated] (HIVE-18064) Hive on Tez parallel order by

2017-11-14 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-18064:

Issue Type: New Feature  (was: Bug)

> Hive on Tez parallel order by
> -
>
> Key: HIVE-18064
> URL: https://issues.apache.org/jira/browse/HIVE-18064
> Project: Hive
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> We've built parallel sorting in TEZ-3837. It samples the output as it is 
> generated and works out a range partitioner for the shuffle edge. Each reducer 
> outputs a sorted span. This is mainly for external consumption, since the 
> output files need to be read in a certain order.
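
The idea of a sample-based range partitioner can be sketched as follows. This is an illustrative toy over int keys, not the TEZ-3837 implementation, and the class and method names are assumptions:

```java
import java.util.Arrays;

public class RangePartitionerSketch {
    // Sorted split points taken from the sample; there are (partitions - 1) of them.
    private final int[] splitPoints;

    /** Assumes the sample holds at least as many keys as there are partitions. */
    RangePartitionerSketch(int[] sample, int partitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        splitPoints = new int[partitions - 1];
        for (int i = 1; i < partitions; i++) {
            // Pick evenly spaced quantiles of the sample as partition boundaries.
            splitPoints[i - 1] = sorted[i * sorted.length / partitions];
        }
    }

    /** Keys below the first split point go to partition 0, and so on upward. */
    int partitionFor(int key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found: the key starts the next range. Not found: decode the insertion point.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        RangePartitionerSketch p =
            new RangePartitionerSketch(new int[] {5, 1, 9, 3, 7, 2, 8, 4, 6, 10}, 2);
        System.out.println(p.partitionFor(3)); // prints 0
        System.out.println(p.partitionFor(9)); // prints 1
    }
}
```

Because every key in partition i is smaller than every key in partition i + 1, each reducer can sort its own range independently, and reading the output files in partition order yields a globally sorted result.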





[jira] [Assigned] (HIVE-18064) Hive on Tez parallel order by

2017-11-14 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-18064:
---


> Hive on Tez parallel order by
> -
>
> Key: HIVE-18064
> URL: https://issues.apache.org/jira/browse/HIVE-18064
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> We've built parallel sorting in TEZ-3837. It samples the output as it is 
> generated and works out a range partitioner for the shuffle edge. Each reducer 
> outputs a sorted span. This is mainly for external consumption, since the 
> output files need to be read in a certain order.





[jira] [Updated] (HIVE-18064) Hive on Tez parallel order by

2017-11-14 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-18064:

Status: Open  (was: Patch Available)

> Hive on Tez parallel order by
> -
>
> Key: HIVE-18064
> URL: https://issues.apache.org/jira/browse/HIVE-18064
> Project: Hive
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-18064.1.patch
>
>
> We've built parallel sorting in TEZ-3837. It samples the output as it is 
> generated and works out a range partitioner for the shuffle edge. Each reducer 
> outputs a sorted span. This is mainly for external consumption, since the 
> output files need to be read in a certain order.





[jira] [Updated] (HIVE-18064) Hive on Tez parallel order by

2017-11-14 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-18064:

Attachment: HIVE-18064.1.patch

> Hive on Tez parallel order by
> -
>
> Key: HIVE-18064
> URL: https://issues.apache.org/jira/browse/HIVE-18064
> Project: Hive
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-18064.1.patch
>
>
> We've built parallel sorting in TEZ-3837. It samples the output as it is 
> generated and works out a range partitioner for the shuffle edge. Each reducer 
> outputs a sorted span. This is mainly for external consumption, since the 
> output files need to be read in a certain order.





[jira] [Updated] (HIVE-18064) Hive on Tez parallel order by

2017-11-14 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-18064:

Status: Patch Available  (was: Open)

> Hive on Tez parallel order by
> -
>
> Key: HIVE-18064
> URL: https://issues.apache.org/jira/browse/HIVE-18064
> Project: Hive
>  Issue Type: New Feature
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-18064.1.patch
>
>
> We've built parallel sorting in TEZ-3837. It samples the output as it is 
> generated and works out a range partitioner for the shuffle edge. Each reducer 
> outputs a sorted span. This is mainly for external consumption, since the 
> output files need to be read in a certain order.





[jira] [Assigned] (HIVE-18099) Hive shouldn't pickup mapreduce conf for Tez

2017-11-17 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-18099:
---


> Hive shouldn't pickup mapreduce conf for Tez
> 
>
> Key: HIVE-18099
> URL: https://issues.apache.org/jira/browse/HIVE-18099
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> Right now Hive is reading some mapreduce conf for Tez engine. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?utf8=%E2%9C%93#L720
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?#L796
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?#L860





[jira] [Commented] (HIVE-18099) Hive shouldn't pickup mapreduce conf for Tez

2017-11-17 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257733#comment-16257733
 ] 

Zhiyuan Yang commented on HIVE-18099:
-

This is most likely very old code. I'm guessing it was there for a smooth 
transition from MR to Tez, but it shouldn't be there any more. I'm trying to 
remove it to see whether anything breaks.

> Hive shouldn't pickup mapreduce conf for Tez
> 
>
> Key: HIVE-18099
> URL: https://issues.apache.org/jira/browse/HIVE-18099
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> Right now Hive is reading some mapreduce conf for Tez engine. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?utf8=%E2%9C%93#L720
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?#L796
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?#L860





[jira] [Updated] (HIVE-18099) Hive shouldn't pickup mapreduce conf for Tez

2017-11-17 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-18099:

Attachment: HIVE-18099.1.patch

> Hive shouldn't pickup mapreduce conf for Tez
> 
>
> Key: HIVE-18099
> URL: https://issues.apache.org/jira/browse/HIVE-18099
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-18099.1.patch
>
>
> Right now Hive is reading some mapreduce conf for Tez engine. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?utf8=%E2%9C%93#L720
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?#L796
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?#L860





[jira] [Updated] (HIVE-18099) Hive shouldn't pickup mapreduce conf for Tez

2017-11-17 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-18099:

Status: Patch Available  (was: Open)

> Hive shouldn't pickup mapreduce conf for Tez
> 
>
> Key: HIVE-18099
> URL: https://issues.apache.org/jira/browse/HIVE-18099
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-18099.1.patch
>
>
> Right now Hive is reading some mapreduce conf for Tez engine. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?utf8=%E2%9C%93#L720
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?#L796
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?#L860





[jira] [Assigned] (HIVE-16596) CrossProductCheck failed to detect cross product between two unions

2017-05-05 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-16596:
---


> CrossProductCheck failed to detect cross product between two unions
> ---
>
> Key: HIVE-16596
> URL: https://issues.apache.org/jira/browse/HIVE-16596
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> To reproduce:
> {code}
> create table f (a int, b string);
> set hive.auto.convert.join=false;
> explain select * from (select * from f union all select * from f) a join 
> (select * from f union all select * from f) b;
> {code}
> No cross product warning is given.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-05-08 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.15.patch

Uploaded a new patch that uses the new unpartitioned cross product edge 
implemented in TEZ-3708. 

Key features:
1. allow arbitrary parallelism by partitioning source output
2. estimate the workload based on the number of records
3. group by number of cross product operations to evenly distribute the workload

CC [~hagleitn]
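
The three features listed above can be illustrated with a small sketch. It is not the TEZ-3708 algorithm; `CrossProductGroupingSketch` and `groupByOperations` are hypothetical names, and the greedy grouping is an assumption chosen for brevity. Each source's output is taken to be split into chunks with known record counts, a chunk pair is assumed to cost `left * right` operations, and pairs are grouped until a task reaches its share of the total.

```java
import java.util.ArrayList;
import java.util.List;

class CrossProductGroupingSketch {
    // Returns, for each task, the number of record pairs it would process.
    // left[i] and right[j] are record counts of the chunks of each source.
    public static List<Long> groupByOperations(long[] left, long[] right, int targetTasks) {
        long total = 0;
        for (long l : left) {
            for (long r : right) {
                total += l * r;
            }
        }
        // Share of the total work each task should receive (rounded up).
        long perTask = Math.max(1, (total + targetTasks - 1) / targetTasks);

        List<Long> tasks = new ArrayList<>();
        long current = 0;
        for (long l : left) {
            for (long r : right) {
                current += l * r;
                if (current >= perTask) {
                    tasks.add(current); // close this task and start a new one
                    current = 0;
                }
            }
        }
        if (current > 0) {
            tasks.add(current); // leftover work forms the last task
        }
        return tasks;
    }

    public static void main(String[] args) {
        long[] left = {100, 100};
        long[] right = {10, 10, 10, 10};
        System.out.println(groupByOperations(left, right, 4));
    }
}
```

With evenly sized chunks the work divides evenly across tasks; with skewed chunks the greedy pass can overshoot, which is one reason finer partitioning of the source output helps.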

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.15.patch, HIVE-14731.1.patch, HIVE-14731.2.patch, 
> HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, 
> HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-05-08 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Status: Open  (was: Patch Available)

Cancelling patch since TEZ-3708 isn't committed and Jenkins won't be able to 
compile.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.15.patch, HIVE-14731.1.patch, HIVE-14731.2.patch, 
> HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, 
> HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-05-09 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.16.patch

Uploaded a new patch that won't use map join in case of cross product, to get 
better parallelism. Please review, [~hagleitn].

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.1.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-05-11 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Status: Patch Available  (was: Open)

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, 
> HIVE-14731.1.patch, HIVE-14731.2.patch, HIVE-14731.3.patch, 
> HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, 
> HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-05-11 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.17.patch

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, 
> HIVE-14731.1.patch, HIVE-14731.2.patch, HIVE-14731.3.patch, 
> HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, 
> HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-05-11 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007602#comment-16007602
 ] 

Zhiyuan Yang commented on HIVE-14731:
-

Thanks [~hagleitn] for review! New patch was uploaded to address your comment.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, 
> HIVE-14731.1.patch, HIVE-14731.2.patch, HIVE-14731.3.patch, 
> HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, 
> HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-05-12 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.18.patch

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, 
> HIVE-14731.18.patch, HIVE-14731.1.patch, HIVE-14731.2.patch, 
> HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, 
> HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.





[jira] [Assigned] (HIVE-16690) Configure Tez cartesian product edge based on LLAP cluster size

2017-05-16 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-16690:
---


> Configure Tez cartesian product edge based on LLAP cluster size
> ---
>
> Key: HIVE-16690
> URL: https://issues.apache.org/jira/browse/HIVE-16690
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> In HIVE-14731 we are using default value for target parallelism of fair 
> cartesian product edge. Ideally this should be set according to cluster size. 
> In case of LLAP it's pretty easy to get cluster size, i.e., number of 
> executors.
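
A minimal sketch of the idea in the description above: use the LLAP executor count as the target parallelism when it is known, and fall back to a default otherwise. The class name, the `DEFAULT_PARALLELISM` value, and the use of `OptionalInt` to model "cluster size not available" are all assumptions for illustration, not the HIVE-16690 patch.

```java
import java.util.OptionalInt;

class CartesianParallelismSketch {
    // Hypothetical fallback used when the cluster size is unknown.
    static final int DEFAULT_PARALLELISM = 10;

    // Returns the executor count if it is known and positive, else the default.
    public static int targetParallelism(OptionalInt llapExecutors) {
        if (llapExecutors.isPresent() && llapExecutors.getAsInt() > 0) {
            return llapExecutors.getAsInt();
        }
        return DEFAULT_PARALLELISM;
    }

    public static void main(String[] args) {
        System.out.println(targetParallelism(OptionalInt.of(48)));
        System.out.println(targetParallelism(OptionalInt.empty()));
    }
}
```

Modelling the missing value explicitly keeps the caller from dereferencing cluster information that has not been initialized.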





[jira] [Updated] (HIVE-16690) Configure Tez cartesian product edge based on LLAP cluster size

2017-05-16 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16690:

Attachment: HIVE-16690.1.patch

Please help review, [~hagleitn], [~sseth]

> Configure Tez cartesian product edge based on LLAP cluster size
> ---
>
> Key: HIVE-16690
> URL: https://issues.apache.org/jira/browse/HIVE-16690
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-16690.1.patch
>
>
> In HIVE-14731 we are using default value for target parallelism of fair 
> cartesian product edge. Ideally this should be set according to cluster size. 
> In case of LLAP it's pretty easy to get cluster size, i.e., number of 
> executors.





[jira] [Commented] (HIVE-16690) Configure Tez cartesian product edge based on LLAP cluster size

2017-05-16 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013378#comment-16013378
 ] 

Zhiyuan Yang commented on HIVE-16690:
-

Thanks [~sershe] for review! It works in a cluster. Later I'll submit an 
additional patch combining HIVE-14731 and this one for Jenkins.

> Configure Tez cartesian product edge based on LLAP cluster size
> ---
>
> Key: HIVE-16690
> URL: https://issues.apache.org/jira/browse/HIVE-16690
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-16690.1.patch
>
>
> In HIVE-14731 we are using default value for target parallelism of fair 
> cartesian product edge. Ideally this should be set according to cluster size. 
> In case of LLAP it's pretty easy to get cluster size, i.e., number of 
> executors.





[jira] [Assigned] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable

2017-05-18 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-16710:
---


> Make MAX_MS_TYPENAME_LENGTH configurable
> 
>
> Key: HIVE-16710
> URL: https://issues.apache.org/jira/browse/HIVE-16710
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 2.2.0
>
>
> HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 
> (HIVE-12274), users have no way to work around this check if they do get very 
> long type name. We should make max type name length configurable before 2.3.
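
To make the request above concrete, here is a rough sketch of a length check whose limit comes from configuration instead of a hard-coded constant. The property key `example.max.typename.length`, the default of 2000, and the class name are hypothetical and are not taken from the HIVE-16710 patch; `java.util.Properties` stands in for the real configuration object.

```java
import java.util.Properties;

class TypeNameLengthSketch {
    // Hypothetical configuration key and illustrative default limit.
    static final String MAX_LENGTH_KEY = "example.max.typename.length";
    static final int DEFAULT_MAX_LENGTH = 2000;

    // Read the limit from configuration, falling back to the default.
    public static int maxLength(Properties conf) {
        return Integer.parseInt(
            conf.getProperty(MAX_LENGTH_KEY, String.valueOf(DEFAULT_MAX_LENGTH)));
    }

    // A type name passes the check if it does not exceed the configured limit.
    public static boolean isValid(String typeName, Properties conf) {
        return typeName.length() <= maxLength(conf);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(isValid("struct<a:int,b:string>", conf));
        conf.setProperty(MAX_LENGTH_KEY, "5");
        System.out.println(isValid("struct<a:int,b:string>", conf));
    }
}
```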





[jira] [Updated] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable

2017-05-18 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16710:

Status: Patch Available  (was: Open)

> Make MAX_MS_TYPENAME_LENGTH configurable
> 
>
> Key: HIVE-16710
> URL: https://issues.apache.org/jira/browse/HIVE-16710
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 2.2.0
>
> Attachments: HIVE-16710.1.patch
>
>
> HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 
> (HIVE-12274), users have no way to work around this check if they do get very 
> long type name. We should make max type name length configurable before 2.3.





[jira] [Updated] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable

2017-05-18 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16710:

Attachment: HIVE-16710.1.patch

Please help review. [~sershe] 

> Make MAX_MS_TYPENAME_LENGTH configurable
> 
>
> Key: HIVE-16710
> URL: https://issues.apache.org/jira/browse/HIVE-16710
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 2.2.0
>
> Attachments: HIVE-16710.1.patch
>
>
> HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 
> (HIVE-12274), users have no way to work around this check if they do get very 
> long type name. We should make max type name length configurable before 2.3.





[jira] [Updated] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable

2017-05-18 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16710:

Attachment: HIVE-16710.2.patch

Thanks [~sershe] for review! I've uploaded a new patch to address your comment.

> Make MAX_MS_TYPENAME_LENGTH configurable
> 
>
> Key: HIVE-16710
> URL: https://issues.apache.org/jira/browse/HIVE-16710
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 2.2.0
>
> Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch
>
>
> HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 
> (HIVE-12274), users have no way to work around this check if they do get very 
> long type name. We should make max type name length configurable before 2.3.





[jira] [Commented] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable

2017-05-19 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017794#comment-16017794
 ] 

Zhiyuan Yang commented on HIVE-16710:
-

Do you know how we can trigger a branch-2.2 build? [~sershe]

> Make MAX_MS_TYPENAME_LENGTH configurable
> 
>
> Key: HIVE-16710
> URL: https://issues.apache.org/jira/browse/HIVE-16710
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 2.2.0
>
> Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch
>
>
> HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 
> (HIVE-12274), users have no way to work around this check if they do get very 
> long type name. We should make max type name length configurable before 2.3.





[jira] [Commented] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable

2017-05-19 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017807#comment-16017807
 ] 

Zhiyuan Yang commented on HIVE-16710:
-

Thanks [~sershe]! Agree that test is not useful. 

> Make MAX_MS_TYPENAME_LENGTH configurable
> 
>
> Key: HIVE-16710
> URL: https://issues.apache.org/jira/browse/HIVE-16710
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 2.2.0
>
> Attachments: HIVE-16710.1.patch, HIVE-16710.2.patch
>
>
> HIVE-11985 introduced type name length check in 2.0.0. Before 2.3 
> (HIVE-12274), users have no way to work around this check if they do get very 
> long type name. We should make max type name length configurable before 2.3.





[jira] [Assigned] (HIVE-17047) Allow table property to be populated to jobConf to make FixedLengthInputFormat work

2017-07-05 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-17047:
---


> Allow table property to be populated to jobConf to make 
> FixedLengthInputFormat work
> ---
>
> Key: HIVE-17047
> URL: https://issues.apache.org/jira/browse/HIVE-17047
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 1.2.1
>
>
> To make FixedLengthInputFormat work in Hive, we need a table-specific value for 
> the configuration "fixedlengthinputformat.record.length". Right now the best 
> place would be a table property. Unfortunately, table properties are not always 
> populated to InputFormat configurations because of this in HiveInputFormat:
> {code}
> PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
> if ((part != null) && (part.getTableDesc() != null)) {
> {code}
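
The behavior being asked for can be sketched as a plain copy of table properties into the job configuration. This is a simplified illustration under stated assumptions, not the Hive code path: a `Map<String, String>` stands in for the job configuration, `TablePropertySketch` and `copyTableProperties` are hypothetical names, and the record length of 128 is an arbitrary example value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class TablePropertySketch {
    // Copy every table property into the job configuration stand-in, so that
    // an input format can read table-specific settings from it.
    public static void copyTableProperties(Properties tableProps, Map<String, String> jobConf) {
        if (tableProps == null) {
            return; // nothing to copy when the table descriptor has no properties
        }
        for (String name : tableProps.stringPropertyNames()) {
            jobConf.put(name, tableProps.getProperty(name));
        }
    }

    public static void main(String[] args) {
        Properties tableProps = new Properties();
        tableProps.setProperty("fixedlengthinputformat.record.length", "128");

        Map<String, String> jobConf = new HashMap<>();
        copyTableProperties(tableProps, jobConf);
        System.out.println(jobConf.get("fixedlengthinputformat.record.length"));
    }
}
```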





[jira] [Commented] (HIVE-17047) Allow table property to be populated to jobConf to make FixedLengthInputFormat work

2017-07-05 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075556#comment-16075556
 ] 

Zhiyuan Yang commented on HIVE-17047:
-

It turns out HIVE-15147 accidentally fixed this. HIVE-15147 was for Hive 
2.2.0 LLAP but not for earlier versions. Uploading a partial patch from 
HIVE-15147 for earlier versions.

> Allow table property to be populated to jobConf to make 
> FixedLengthInputFormat work
> ---
>
> Key: HIVE-17047
> URL: https://issues.apache.org/jira/browse/HIVE-17047
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 1.2.1
>
>
> To make FixedLengthInputFormat work in Hive, we need a table-specific value for 
> the configuration "fixedlengthinputformat.record.length". Right now the best 
> place would be a table property. Unfortunately, table properties are not always 
> populated to InputFormat configurations because of this in HiveInputFormat:
> {code}
> PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
> if ((part != null) && (part.getTableDesc() != null)) {
> {code}





[jira] [Updated] (HIVE-17047) Allow table property to be populated to jobConf to make FixedLengthInputFormat work

2017-07-05 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17047:

Target Version/s: 1.2.1

> Allow table property to be populated to jobConf to make 
> FixedLengthInputFormat work
> ---
>
> Key: HIVE-17047
> URL: https://issues.apache.org/jira/browse/HIVE-17047
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17047.1.patch
>
>
> To make FixedLengthInputFormat work in Hive, we need a table-specific value for 
> the configuration "fixedlengthinputformat.record.length". Right now the best 
> place would be a table property. Unfortunately, table properties are not always 
> populated to InputFormat configurations because of this in HiveInputFormat:
> {code}
> PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
> if ((part != null) && (part.getTableDesc() != null)) {
> {code}





[jira] [Updated] (HIVE-17047) Allow table property to be populated to jobConf to make FixedLengthInputFormat work

2017-07-05 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17047:

Fix Version/s: (was: 1.2.1)

> Allow table property to be populated to jobConf to make 
> FixedLengthInputFormat work
> ---
>
> Key: HIVE-17047
> URL: https://issues.apache.org/jira/browse/HIVE-17047
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17047.1.patch
>
>
> To make FixedLengthInputFormat work in Hive, we need a table-specific value for 
> the configuration "fixedlengthinputformat.record.length". Right now the best 
> place would be a table property. Unfortunately, table properties are not always 
> populated to InputFormat configurations because of this in HiveInputFormat:
> {code}
> PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
> if ((part != null) && (part.getTableDesc() != null)) {
> {code}





[jira] [Updated] (HIVE-17047) Allow table property to be populated to jobConf to make FixedLengthInputFormat work

2017-07-05 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17047:

Attachment: HIVE-17047.1.patch

> Allow table property to be populated to jobConf to make 
> FixedLengthInputFormat work
> ---
>
> Key: HIVE-17047
> URL: https://issues.apache.org/jira/browse/HIVE-17047
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17047.1.patch
>
>
> To make FixedLengthInputFormat work in Hive, we need a table-specific value for 
> the configuration "fixedlengthinputformat.record.length". Right now the best 
> place would be a table property. Unfortunately, table properties are not always 
> populated to InputFormat configurations because of this in HiveInputFormat:
> {code}
> PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
> if ((part != null) && (part.getTableDesc() != null)) {
> {code}





[jira] [Updated] (HIVE-17047) Allow table property to be populated to jobConf to make FixedLengthInputFormat work

2017-07-05 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17047:

Status: Patch Available  (was: Open)

> Allow table property to be populated to jobConf to make 
> FixedLengthInputFormat work
> ---
>
> Key: HIVE-17047
> URL: https://issues.apache.org/jira/browse/HIVE-17047
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17047.1.patch
>
>
> To make FixedLengthInputFormat work in Hive, we need a table-specific value for 
> the configuration "fixedlengthinputformat.record.length". Right now the best 
> place would be a table property. Unfortunately, table properties are not always 
> populated to InputFormat configurations because of this in HiveInputFormat:
> {code}
> PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
> if ((part != null) && (part.getTableDesc() != null)) {
> {code}





[jira] [Updated] (HIVE-16690) Configure Tez cartesian product edge based on LLAP cluster size

2017-07-10 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16690:

Attachment: HIVE-16690.addendum.patch

Uploaded an addendum patch which avoids accessing uninitialized LLAP cluster 
info (which causes an NPE).

> Configure Tez cartesian product edge based on LLAP cluster size
> ---
>
> Key: HIVE-16690
> URL: https://issues.apache.org/jira/browse/HIVE-16690
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-16690.1.patch, HIVE-16690.addendum.patch
>
>
> In HIVE-14731 we are using default value for target parallelism of fair 
> cartesian product edge. Ideally this should be set according to cluster size. 
> In case of LLAP it's pretty easy to get cluster size, i.e., number of 
> executors.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17228) Bump tez version to 0.9.0

2017-08-01 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-17228:
---


> Bump tez version to 0.9.0
> -
>
> Key: HIVE-17228
> URL: https://issues.apache.org/jira/browse/HIVE-17228
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>






[jira] [Updated] (HIVE-17228) Bump tez version to 0.9.0

2017-08-01 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17228:

Attachment: HIVE-17228.1.patch

> Bump tez version to 0.9.0
> -
>
> Key: HIVE-17228
> URL: https://issues.apache.org/jira/browse/HIVE-17228
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17228.1.patch
>
>






[jira] [Updated] (HIVE-17228) Bump tez version to 0.9.0

2017-08-02 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17228:

Status: Patch Available  (was: Open)

> Bump tez version to 0.9.0
> -
>
> Key: HIVE-17228
> URL: https://issues.apache.org/jira/browse/HIVE-17228
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17228.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17228) Bump tez version to 0.9.0

2017-08-02 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17228:

Attachment: HIVE-17228.1.patch

> Bump tez version to 0.9.0
> -
>
> Key: HIVE-17228
> URL: https://issues.apache.org/jira/browse/HIVE-17228
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17228.1.patch, HIVE-17228.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17228) Bump tez version to 0.9.0

2017-08-02 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111428#comment-16111428
 ] 

Zhiyuan Yang commented on HIVE-17228:
-

Test failures are unrelated. Please help review, [~hagleitn], [~sseth]

> Bump tez version to 0.9.0
> -
>
> Key: HIVE-17228
> URL: https://issues.apache.org/jira/browse/HIVE-17228
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-17228.1.patch, HIVE-17228.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17228) Bump tez version to 0.9.0

2017-08-08 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118818#comment-16118818
 ] 

Zhiyuan Yang commented on HIVE-17228:
-

Thanks [~hagleitn]!

> Bump tez version to 0.9.0
> -
>
> Key: HIVE-17228
> URL: https://issues.apache.org/jira/browse/HIVE-17228
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 3.0.0
>
> Attachments: HIVE-17228.1.patch, HIVE-17228.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-08-11 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124139#comment-16124139
 ] 

Zhiyuan Yang commented on HIVE-14731:
-

[~hagleitn] The non-deterministic behavior should come from having multiple 
cross product reducers.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, 
> HIVE-14731.18.patch, HIVE-14731.19.patch, HIVE-14731.1.patch, 
> HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, 
> HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, 
> HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.
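
As a rough illustration of why a cross product can be spread over more than one reducer, the sketch below splits both inputs across a grid of workers so that every pair is produced by exactly one cell. This is only an assumed, simplified model: the class `CrossProductGridSketch`, its methods, and the round-robin assignment are hypothetical and are not taken from the Tez cartesian product edge or from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrossProductGridSketch {
    /** Pairs produced by the (hypothetical) reducer at grid cell (i, j). */
    public static List<String> reducerOutput(List<String> left, List<String> right,
                                             int rows, int cols, int i, int j) {
        List<String> out = new ArrayList<>();
        for (int a = i; a < left.size(); a += rows) {       // left rows owned by grid row i
            for (int b = j; b < right.size(); b += cols) {  // right rows owned by grid column j
                out.add(left.get(a) + "," + right.get(b));
            }
        }
        return out;
    }

    /** Runs every grid cell in turn and concatenates the results. */
    public static List<String> crossProduct(List<String> left, List<String> right,
                                            int rows, int cols) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                all.addAll(reducerOutput(left, right, rows, cols, i, j));
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> pairs = crossProduct(
                Arrays.asList("a", "b", "c"), Arrays.asList("x", "y"), 2, 2);
        System.out.println(pairs.size() + " pairs from 4 grid cells");
    }
}
```

Each cell works independently of the others, which is what lets the work be split across several reducers in this model.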



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2017-08-21 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14731:

Attachment: HIVE-14731.20.patch

Attached rebased patch.

> Use Tez cartesian product edge in Hive (unpartitioned case only)
> 
>
> Key: HIVE-14731
> URL: https://issues.apache.org/jira/browse/HIVE-14731
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, 
> HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, 
> HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, 
> HIVE-14731.18.patch, HIVE-14731.19.patch, HIVE-14731.1.patch, 
> HIVE-14731.20.patch, HIVE-14731.2.patch, HIVE-14731.3.patch, 
> HIVE-14731.4.patch, HIVE-14731.5.patch, HIVE-14731.6.patch, 
> HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch
>
>
> Given cartesian product edge is available in Tez now (see TEZ-3230), let's 
> integrate it into Hive on Tez. This allows us to have more than one reducer 
> in cross product queries.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17393) AMReporter need hearbeat every external 'AM'

2017-08-25 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang reassigned HIVE-17393:
---


> AMReporter need hearbeat every external 'AM'
> 
>
> Key: HIVE-17393
> URL: https://issues.apache.org/jira/browse/HIVE-17393
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
>
> AMReporter only remember first AM that submit the query and heartbeat to it. 
> In case of external client, there might be multiple 'AM's and every of them 
> need node heartbeat.
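
The behavior asked for in the description above could look roughly like the following sketch, which keeps one entry per AM instead of remembering only the first. All names here (`MultiAmHeartbeatSketch`, `registerAm`, `heartbeatAll`, the "host:port" ids) are hypothetical and chosen for illustration; this is not the AMReporter code or the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MultiAmHeartbeatSketch {
    // One entry per known AM, keyed by a hypothetical "host:port" id;
    // the value counts heartbeats sent to that AM.
    private final Map<String, Integer> heartbeatsSent = new ConcurrentHashMap<>();

    /** Registers an AM; registering the same AM twice keeps a single entry. */
    public void registerAm(String amId) {
        heartbeatsSent.putIfAbsent(amId, 0);
    }

    /** One heartbeat round: every registered AM is notified once. */
    public List<String> heartbeatAll() {
        List<String> notified = new ArrayList<>();
        for (String amId : heartbeatsSent.keySet()) {
            heartbeatsSent.merge(amId, 1, Integer::sum); // stand-in for the real RPC
            notified.add(amId);
        }
        Collections.sort(notified); // stable order for display only
        return notified;
    }

    public int heartbeatsFor(String amId) {
        return heartbeatsSent.getOrDefault(amId, 0);
    }

    public static void main(String[] args) {
        MultiAmHeartbeatSketch reporter = new MultiAmHeartbeatSketch();
        reporter.registerAm("am-host-1:1234");
        reporter.registerAm("am-host-2:1234");
        System.out.println(reporter.heartbeatAll());
    }
}
```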



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17393) AMReporter need hearbeat every external 'AM'

2017-08-25 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17393:

Fix Version/s: 3.0.0

> AMReporter need hearbeat every external 'AM'
> 
>
> Key: HIVE-17393
> URL: https://issues.apache.org/jira/browse/HIVE-17393
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 3.0.0
>
>
> AMReporter only remember first AM that submit the query and heartbeat to it. 
> In case of external client, there might be multiple 'AM's and every of them 
> need node heartbeat.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17393) AMReporter need hearbeat every external 'AM'

2017-08-25 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17393:

Attachment: HIVE-17393.1.patch

> AMReporter need hearbeat every external 'AM'
> 
>
> Key: HIVE-17393
> URL: https://issues.apache.org/jira/browse/HIVE-17393
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 3.0.0
>
> Attachments: HIVE-17393.1.patch
>
>
> AMReporter only remember first AM that submit the query and heartbeat to it. 
> In case of external client, there might be multiple 'AM's and every of them 
> need node heartbeat.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17393) AMReporter need hearbeat every external 'AM'

2017-08-25 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17393:

Status: Patch Available  (was: Open)

> AMReporter need hearbeat every external 'AM'
> 
>
> Key: HIVE-17393
> URL: https://issues.apache.org/jira/browse/HIVE-17393
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 3.0.0
>
> Attachments: HIVE-17393.1.patch
>
>
> AMReporter only remember first AM that submit the query and heartbeat to it. 
> In case of external client, there might be multiple 'AM's and every of them 
> need node heartbeat.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17393) AMReporter need hearbeat every external 'AM'

2017-08-25 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17393:

Attachment: HIVE-17393.2.patch

> AMReporter need hearbeat every external 'AM'
> 
>
> Key: HIVE-17393
> URL: https://issues.apache.org/jira/browse/HIVE-17393
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 3.0.0
>
> Attachments: HIVE-17393.1.patch, HIVE-17393.2.patch
>
>
> AMReporter only remember first AM that submit the query and heartbeat to it. 
> In case of external client, there might be multiple 'AM's and every of them 
> need node heartbeat.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17393) AMReporter need hearbeat every external 'AM'

2017-08-28 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144466#comment-16144466
 ] 

Zhiyuan Yang commented on HIVE-17393:
-

Thanks [~sershe] for the review! We need AMNodeInfo not to be static so that we 
can override its method for unit tests.

> AMReporter need hearbeat every external 'AM'
> 
>
> Key: HIVE-17393
> URL: https://issues.apache.org/jira/browse/HIVE-17393
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 3.0.0
>
> Attachments: HIVE-17393.1.patch, HIVE-17393.2.patch
>
>
> AMReporter only remember first AM that submit the query and heartbeat to it. 
> In case of external client, there might be multiple 'AM's and every of them 
> need node heartbeat.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17393) AMReporter need hearbeat every external 'AM'

2017-08-28 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-17393:

Attachment: HIVE-17393.3.patch

Attached new patch to address comments.

> AMReporter need hearbeat every external 'AM'
> 
>
> Key: HIVE-17393
> URL: https://issues.apache.org/jira/browse/HIVE-17393
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 3.0.0
>
> Attachments: HIVE-17393.1.patch, HIVE-17393.2.patch, 
> HIVE-17393.3.patch
>
>
> AMReporter only remember first AM that submit the query and heartbeat to it. 
> In case of external client, there might be multiple 'AM's and every of them 
> need node heartbeat.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-08-31 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149766#comment-16149766
 ] 

Zhiyuan Yang commented on HIVE-17297:
-

Sorry for the delay. Patch mostly looks good.
Comments: 
{code}
static final ThreadLocal instance = new ThreadLocal<>();
{code}
1. The TLS trick depends on the assumption that the scheduler and the task 
communicator are initialized in the same thread. This is not guaranteed, 
although it is unlikely to break. Tez support is needed to remove this trick.

bq. The state could be inconsistent, making a deadlock possible in extreme 
cases if not handled. This will be detected by heartbeat.
2. How will heartbeat solve the problem? Periodically syncing #ducks?

3. Performance-wise, there will be one RPC call for each update. Whether this 
is a problem depends on how frequent the updates are. We could consider 
updating multiple tasks with a single RPC call in the future.
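
To make the concern in item 1 concrete, here is a minimal sketch of how a ThreadLocal hand-off behaves when the writer and reader run on different threads. The class `ThreadLocalHandoffSketch`, its methods, and the `String` payload are hypothetical stand-ins, not the patch's scheduler or communicator code.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalHandoffSketch {
    // Hypothetical stand-in for the hand-off slot quoted above.
    static final ThreadLocal<String> instance = new ThreadLocal<>();

    /** Sets and reads the slot on the same thread: the value is visible. */
    public static String readFromSameThread(String value) {
        instance.set(value);
        return instance.get();
    }

    /** Sets the slot on the calling thread, then reads it from a new thread. */
    public static String readFromOtherThread(String value) {
        instance.set(value);
        AtomicReference<String> seen = new AtomicReference<>("unset");
        // The new thread has its own, empty copy of the slot.
        Thread other = new Thread(() -> seen.set(String.valueOf(instance.get())));
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(readFromSameThread("scheduler"));  // prints "scheduler"
        System.out.println(readFromOtherThread("scheduler")); // prints "null"
    }
}
```

If the two components were ever initialized on different threads, the reader would see the empty slot, as in the second call.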

Nits
{code}
// See the comment in handleSinglePriorityLevelForUpdate
{code}
1. Wrong function name in comments

{code}
private int handleUpdateForSinglePriorityLevel(int updateCount, int count,
{code}
2. updateCount and count can be combined into one argument

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.01.nogen.patch, HIVE-17297.01.patch, 
> HIVE-17297.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-08-31 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149767#comment-16149767
 ] 

Zhiyuan Yang commented on HIVE-17297:
-

CC [~hagleitn]

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.01.nogen.patch, HIVE-17297.01.patch, 
> HIVE-17297.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17297) allow AM to use LLAP guaranteed tasks

2017-09-01 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150969#comment-16150969
 ] 

Zhiyuan Yang commented on HIVE-17297:
-

+1 (non-binding)

> allow AM to use LLAP guaranteed tasks
> -
>
> Key: HIVE-17297
> URL: https://issues.apache.org/jira/browse/HIVE-17297
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17297.01.nogen.patch, HIVE-17297.01.patch, 
> HIVE-17297.02.patch, HIVE-17297.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17393) AMReporter need hearbeat every external 'AM'

2017-09-05 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154243#comment-16154243
 ] 

Zhiyuan Yang commented on HIVE-17393:
-

Thanks [~sershe] for the review!

> AMReporter need hearbeat every external 'AM'
> 
>
> Key: HIVE-17393
> URL: https://issues.apache.org/jira/browse/HIVE-17393
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Fix For: 3.0.0
>
> Attachments: HIVE-17393.1.patch, HIVE-17393.2.patch, 
> HIVE-17393.3.patch
>
>
> AMReporter only remember first AM that submit the query and heartbeat to it. 
> In case of external client, there might be multiple 'AM's and every of them 
> need node heartbeat.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17386) support LLAP workload management in HS2 (low level only)

2017-09-07 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158061#comment-16158061
 ] 

Zhiyuan Yang commented on HIVE-17386:
-

Yep. Sorry for the delay. I was working on other stuff and will start reviewing 
this tomorrow.

> support LLAP workload management in HS2 (low level only)
> 
>
> Key: HIVE-17386
> URL: https://issues.apache.org/jira/browse/HIVE-17386
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, 
> HIVE-17386.01.patch, HIVE-17386.02.patch, HIVE-17386.only.patch, 
> HIVE-17386.patch
>
>
> This makes use of HIVE-17297 and creates building blocks for workload 
> management policies, etc.
> For now, there are no policies - a single yarn queue is designated for all 
> LLAP query AMs, and the capacity is distributed equally.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17386) support LLAP workload management in HS2 (low level only)

2017-09-12 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163756#comment-16163756
 ] 

Zhiyuan Yang commented on HIVE-17386:
-

Comments were posted on RB. Please take a look.

> support LLAP workload management in HS2 (low level only)
> 
>
> Key: HIVE-17386
> URL: https://issues.apache.org/jira/browse/HIVE-17386
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, 
> HIVE-17386.01.patch, HIVE-17386.02.patch, HIVE-17386.03.patch, 
> HIVE-17386.only.patch, HIVE-17386.patch
>
>
> This makes use of HIVE-17297 and creates building blocks for workload 
> management policies, etc.
> For now, there are no policies - a single yarn queue is designated for all 
> LLAP query AMs, and the capacity is distributed equally.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17386) support LLAP workload management in HS2 (low level only)

2017-09-13 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165535#comment-16165535
 ] 

Zhiyuan Yang commented on HIVE-17386:
-

Sure, I will review it soon.

> support LLAP workload management in HS2 (low level only)
> 
>
> Key: HIVE-17386
> URL: https://issues.apache.org/jira/browse/HIVE-17386
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, 
> HIVE-17386.01.patch, HIVE-17386.02.patch, HIVE-17386.03.patch, 
> HIVE-17386.04.patch, HIVE-17386.only.patch, HIVE-17386.patch
>
>
> This makes use of HIVE-17297 and creates building blocks for workload 
> management policies, etc.
> For now, there are no policies - a single yarn queue is designated for all 
> LLAP query AMs, and the capacity is distributed equally.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17386) support LLAP workload management in HS2 (low level only)

2017-09-25 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179966#comment-16179966
 ] 

Zhiyuan Yang commented on HIVE-17386:
-

+1 (non-binding). CC [~hagleitn]

> support LLAP workload management in HS2 (low level only)
> 
>
> Key: HIVE-17386
> URL: https://issues.apache.org/jira/browse/HIVE-17386
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17386.01.only.patch, HIVE-17386.01.patch, 
> HIVE-17386.01.patch, HIVE-17386.02.patch, HIVE-17386.03.patch, 
> HIVE-17386.04.patch, HIVE-17386.only.patch, HIVE-17386.patch
>
>
> This makes use of HIVE-17297 and creates building blocks for workload 
> management policies, etc.
> For now, there are no policies - a single yarn queue is designated for all 
> LLAP query AMs, and the capacity is distributed equally.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16082) Allow user to change number of listener thread in LlapTaskCommunicator

2017-03-02 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16082:

Attachment: HIVE-16082.2.patch

Re-submit patch for testing.

> Allow user to change number of listener thread in LlapTaskCommunicator
> --
>
> Key: HIVE-16082
> URL: https://issues.apache.org/jira/browse/HIVE-16082
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-16082.1.patch, HIVE-16082.2.patch
>
>
> Now LlapTaskCommunicator always has same number of RPC listener thread with 
> TezTaskCommunicatorImpl. There are scenarios when we want them different: for 
> example, in Llap only mode, we want less TezTaskCommunicatorImpl's listener 
> thread to reduce off-heap memory usage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16082) Allow user to change number of listener thread in LlapTaskCommunicator

2017-03-02 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16082:

Status: Open  (was: Patch Available)

> Allow user to change number of listener thread in LlapTaskCommunicator
> --
>
> Key: HIVE-16082
> URL: https://issues.apache.org/jira/browse/HIVE-16082
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-16082.1.patch, HIVE-16082.2.patch
>
>
> Now LlapTaskCommunicator always has same number of RPC listener thread with 
> TezTaskCommunicatorImpl. There are scenarios when we want them different: for 
> example, in Llap only mode, we want less TezTaskCommunicatorImpl's listener 
> thread to reduce off-heap memory usage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16082) Allow user to change number of listener thread in LlapTaskCommunicator

2017-03-02 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-16082:

Status: Patch Available  (was: Open)

> Allow user to change number of listener thread in LlapTaskCommunicator
> --
>
> Key: HIVE-16082
> URL: https://issues.apache.org/jira/browse/HIVE-16082
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>Assignee: Zhiyuan Yang
> Attachments: HIVE-16082.1.patch, HIVE-16082.2.patch
>
>
> Now LlapTaskCommunicator always has same number of RPC listener thread with 
> TezTaskCommunicatorImpl. There are scenarios when we want them different: for 
> example, in Llap only mode, we want less TezTaskCommunicatorImpl's listener 
> thread to reduce off-heap memory usage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14951) ArrayIndexOutOfBoundsException in GroupByOperator

2016-10-13 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573083#comment-15573083
 ] 

Zhiyuan Yang commented on HIVE-14951:
-

Here's the error message:
{code:java}
Error: Error while running task ( failure ) : 
attempt_1475017598908_0282_2_02_00_3:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row (tag=0) {"key":{"_col0":2},"value":null}
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=0) {"key":{"_col0":2},"value":null}
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:284)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row (tag=0) {"key":{"_col0":2},"value":null}
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row (tag=1) {"key":{"_col0":1},"value":null}
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:416)
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:379)
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.doFirstFetchIfNeeded(CommonMergeJoinOperator.java:485)
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:207)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1016)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:821)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
... 17 more
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=1) {"key":{"_col0":1},"value":null}
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:284)
at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:404)
... 26 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row (tag=1) {"key":{"_col0":1},"value":null}
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
... 27 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:708)
at 
o

[jira] [Updated] (HIVE-14951) ArrayIndexOutOfBoundsException in GroupByOperator (Hive on Tez)

2016-10-13 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14951:

Summary: ArrayIndexOutOfBoundsException in GroupByOperator (Hive on Tez)  
(was: ArrayIndexOutOfBoundsException in GroupByOperator)

> ArrayIndexOutOfBoundsException in GroupByOperator (Hive on Tez)
> ---
>
> Key: HIVE-14951
> URL: https://issues.apache.org/jira/browse/HIVE-14951
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>
> Query:
> select * from (select distinct a from f16) as f16, (select distinct a from 
> f1) as fprime where f16.a = fprime.a;
> Table: 
> create table f1 (a int, b string);
> create table f16 (a int, b string);
> Config:
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=false;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14951) ArrayIndexOutOfBoundsException in GroupByOperator

2016-10-13 Thread Zhiyuan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhiyuan Yang updated HIVE-14951:

Summary: ArrayIndexOutOfBoundsException in GroupByOperator  (was: 
ArrayIndexOutOfBoundsException in GroupByOperator (Hive on Tez))

> ArrayIndexOutOfBoundsException in GroupByOperator
> -
>
> Key: HIVE-14951
> URL: https://issues.apache.org/jira/browse/HIVE-14951
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhiyuan Yang
>
> Engine: 
> Tez
> Query:
> select * from (select distinct a from f16) as f16, (select distinct a from 
> f1) as fprime where f16.a = fprime.a;
> Table: 
> create table f1 (a int, b string);
> create table f16 (a int, b string);
> Config:
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=false;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

