[jira] [Updated] (HIVE-20990) ORC case when/if with coalesce wrong results or case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating

2024-02-05 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-20990:

Summary: ORC case when/if with coalesce wrong results or case: 
java.lang.AssertionError: Output column number expected to be 0 when 
isRepeating  (was: ORC, group by, case: java.lang.AssertionError: Output column 
number expected to be 0 when isRepeating)

> ORC case when/if with coalesce wrong results or case: 
> java.lang.AssertionError: Output column number expected to be 0 when 
> isRepeating
> --
>
> Key: HIVE-20990
> URL: https://issues.apache.org/jira/browse/HIVE-20990
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0, 3.1.3
> Environment: * Hive 3.0.0 from hdp 3.0.1.
>  * centos 7
>  * 5 datanodes, 2 masters
>  
>Reporter: Guillaume
>Priority: Major
>  Labels: duplicate
>
> Run this to replicate:
> {code:sql}
> drop table if exists ds;
> create table ds stored as orc as select
>   inline(array(
> struct('gmail.com'), -- when branch of the case statement
> struct('apache.org') -- else branch of the case statement
>   )) as (domain)
> ;
> select
>   case
> when domain='gmail.com' then 'gmail'
> else coalesce(domain, 'other')
>   end as domaingroup
> from ds
> group by 1 -- useless (datawise) for this example, but triggers the bug.
> ;
> {code}
> It fails with:
> {noformat}
> java.lang.RuntimeException: java.lang.AssertionError: Output column number 
> expected to be 0 when isRepeating
> {noformat}
>  
> Full exception is shown below.
> Of interest:
>  * if the case is removed (e.g. replaced by just the else clause), the query works,
>  * if only the else clause of the case matches, the query works,
>  * replacing the case with a set of nested ifs does not change anything,
>  * removing the group by makes the query work,
>  * replacing the table (ds) with a CTE makes the query work.
> Workaround, at the cost of performance: 
> {code:java}
> set hive.vectorized.execution.enabled = false; 
> {code}
>  
> Full exception:
> {noformat}
> ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1543326624484_8258_45_00, diagnostics=[Task failed, taskId=task_1543326624484_8258_45_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.RuntimeException: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
>   at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:492)
>   at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprColumnCondExpr.evaluate(IfExprColumnCondExpr.java:117)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
>   ...
> {noformat}

[jira] [Updated] (HIVE-20990) ORC, group by, case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating

2024-02-05 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-20990:

Affects Version/s: 3.1.3


[jira] [Commented] (HIVE-20990) ORC, group by, case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating

2024-02-05 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814529#comment-17814529
 ] 

Yi Zhang commented on HIVE-20990:
-

This is fixed by HIVE-26408.

The bug can give wrong results or throw an exception, so be cautious when 
working around it (rather than patching in the fix) via 
hive.vectorized.if.expr.mode=good.
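A minimal workaround sketch based on the settings named in this thread 
(assumption: the if-expr mode switch steers vectorization away from the 
IfExprCondExpr* code path seen in the stack trace; disabling vectorization is 
the heavier fallback):

{code:sql}
-- avoid the conditional-expression IF vectorization path
set hive.vectorized.if.expr.mode=good;
-- or, at a larger performance cost, disable vectorized execution entirely
set hive.vectorized.execution.enabled=false;
{code}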


[jira] [Updated] (HIVE-20990) ORC, group by, case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating

2024-02-05 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-20990:

Labels: duplicate  (was: )


[jira] [Updated] (HIVE-26408) Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output

2024-02-05 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26408:

Labels: hive-3.2.0-candidate pull-request-available  (was: 
pull-request-available)

> Vectorization: Fix deallocation of scratch columns, don't reuse a child 
> ConstantVectorExpression as an output
> -
>
> Key: HIVE-26408
> URL: https://issues.apache.org/jira/browse/HIVE-26408
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: hive-3.2.0-candidate, pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is similar to HIVE-15588. With a customer query, I reproduced a 
> vectorized expression tree like the one below (I'll attach a simple repro 
> query when possible):
> {code}
> selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 
> 61:string)(children: StringColumnInList(col 13, values TermDeposit, 
> RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns 
> [61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( 
> _col1 AS DATE)), 'MM-dd-'))(children: VectorUDFUnixTimeStampDate(col 
> 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 
> 61:string, ConstantVectorExpression(val  ) -> 62:string) -> 63:string, 
> ConstantVectorExpression(val ) -> 61:string) -> 62:string
> {code}
> The relevant query fragment was:
> {code}
>   CASE WHEN DLY_BAL.PDELP_VALUE in (
> 'TermDeposit', 'RecurringDeposit',
> 'CertificateOfDeposit'
>   ) THEN NVL(
> (
>   from_unixtime(
> unix_timestamp(
>   cast(DLY_BAL.APATD_MTRTY_DATE as date)
> ),
> 'MM-dd-'
>   )
> ),
> ' '
>   ) ELSE '' END AS MAT_DTE
> {code}
> The problem, step by step:
> 1. IfExprCondExprColumn has 62:string as its [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64], which is a reused scratch column (see step 5)
> 2. at evaluation time, [isRepeating is reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68]
> 3. evaluating IfExprCondExprColumn requires the conditional evaluation of its children, so we go to [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95]
> 4. one of the children is ConstantVectorExpression(val  ) -> 62:string, which belongs to the second branch of VectorCoalesce, i.e. to the '' empty string in NVL's second argument
> 5. in step 4, the 62:string column is set to an isRepeating column (and it is released by [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459]), so it is marked as a reusable scratch column
> 6. after the conditional evaluation in step 3, the final output of IfExprCondExprColumn is set [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99], but there we get an exception [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
> {code}
> 2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
>   at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
>   at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
>   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
>   at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
>   ...
> {code}
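> As a hedged, self-contained illustration of the invariant behind this assertion (a toy model, not Hive's actual storage-api code): a vector marked isRepeating logically stores a single value in slot 0, so a write to any other slot means a scratch column left in repeating state was wrongly reused as an output:
> {code:java}
> // Toy model of the isRepeating contract (assumption: simplified from
> // org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector semantics).
> class MiniColumnVector {
>   boolean isRepeating;                 // true => slot 0 stands for every row
>   final String[] vector = new String[1024];
>
>   void setElement(int outputIdx, String value) {
>     if (isRepeating && outputIdx != 0) {
>       // Writing row-by-row into a vector that still claims to repeat a
>       // single value is exactly the reuse bug described in steps 1-6 above.
>       throw new AssertionError(
>           "Output column number expected to be 0 when isRepeating");
>     }
>     vector[outputIdx] = value;
>   }
> }
> {code}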

[jira] [Updated] (HIVE-27951) hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path

2023-12-18 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-27951:

Attachment: (was: HIVE-27951.patch)

> hcatalog dynamic partitioning fails with partition already exist error when 
> exist parent partitions path
> 
>
> Key: HIVE-27951
> URL: https://issues.apache.org/jira/browse/HIVE-27951
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 4.0.0-beta-1
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Critical
>  Labels: pull-request-available
>
> If a table already has a partition (part1=x1, part2=y1), inserting into a 
> new partition (part1=x1, part2=y2) makes hcatalog's 
> FileOutputCommitterContainer throw a "path already exists" error.
>  
> To reproduce (Hive first, then Pig):
> create table source(id int, part1 string, part2 string);
> create table target(id int) partitioned by (part1 string, part2 string);
> insert into table source values (1, "x1", "y1"), (2, "x1", "y2");
>  
> pig -useHCatalog
> A = load 'source' using org.apache.hive.hcatalog.pig.HCatLoader();
> B = filter A by (part2 == 'y1');
> -- the following succeeds
> store B into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
> -- the following fails with a duplicate publish error
> C = filter A by (part2 == 'y2');
> store C into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
>  
> ```
> Partition already present with given partition key values : Data already exists in /user/hive/warehouse/target_data/part1=x1, duplicate publish not possible.
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:243)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in /user/hive/warehouse/target_data/part1=x1, duplicate publish not possible.
> at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:564)
> at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:949)
> at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:273)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:241)
> ```





[jira] [Updated] (HIVE-27951) hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path

2023-12-11 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-27951:

Attachment: HIVE-27951.patch



[jira] [Created] (HIVE-27951) hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path

2023-12-11 Thread Yi Zhang (Jira)
Yi Zhang created HIVE-27951:
---

 Summary: hcatalog dynamic partitioning fails with partition 
already exist error when exist parent partitions path
 Key: HIVE-27951
 URL: https://issues.apache.org/jira/browse/HIVE-27951
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 4.0.0-beta-1
Reporter: Yi Zhang




[jira] [Assigned] (HIVE-27951) hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path

2023-12-11 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-27951:
---

Assignee: Yi Zhang



[jira] [Updated] (HIVE-27600) Reduce filesystem calls in OrcFileMergeOperator

2023-08-11 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-27600:

Description: avoid unnecessarily creating ORC readers in OrcFileMergeOperator

> Reduce filesystem calls in OrcFileMergeOperator
> ---
>
> Key: HIVE-27600
> URL: https://issues.apache.org/jira/browse/HIVE-27600
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>
> avoid unnecessarily creating ORC readers in OrcFileMergeOperator





[jira] [Assigned] (HIVE-27600) Reduce filesystem calls in OrcFileMergeOperator

2023-08-11 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-27600:
---

Assignee: Yi Zhang



[jira] [Updated] (HIVE-27600) Reduce filesystem calls in OrcFileMergeOperator

2023-08-11 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-27600:

Affects Version/s: 4.0.0-alpha-2



[jira] [Updated] (HIVE-27600) Reduce filesystem calls in OrcFileMergeOperator

2023-08-11 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-27600:

Summary: Reduce filesystem calls in OrcFileMergeOperator  (was: Reduce 
filesystem calls in OrcFileMerge)



[jira] [Created] (HIVE-27600) Reduce filesystem calls in OrcFileMerge

2023-08-11 Thread Yi Zhang (Jira)
Yi Zhang created HIVE-27600:
---

 Summary: Reduce filesystem calls in OrcFileMerge
 Key: HIVE-27600
 URL: https://issues.apache.org/jira/browse/HIVE-27600
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Yi Zhang








[jira] [Updated] (HIVE-27218) Hive-3 switch hive.materializedview.rewriting default to false

2023-04-04 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-27218:

Summary: Hive-3 switch hive.materializedview.rewriting default to false  
(was: Hive-3 set hive.materializedview.rewriting default to false)

> Hive-3 switch hive.materializedview.rewriting default to false
> --
>
> Key: HIVE-27218
> URL: https://issues.apache.org/jira/browse/HIVE-27218
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.0, 3.1.2, 3.1.3
>Reporter: Yi Zhang
>Priority: Major
>
> HIVE-19973 switched the hive.materializedview.rewriting default from false 
> to true. However, users with a large number of databases (5k) observed high 
> latency at query compilation: each materialized-view check is a call to a 
> remote metastore DB, and those calls add up to minutes per query.
> As Hive 4's improvements in HIVE-21631 and HIVE-21344 are unlikely to be 
> backported, the suggestion is to switch this default back to false in Hive 3.
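> As a hedged illustration, the proposed Hive 3 default amounts to what 
> affected users already set manually today:
> {code:sql}
> set hive.materializedview.rewriting=false;
> {code}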





[jira] [Assigned] (HIVE-27200) Backport HIVE-24928 to branch-3

2023-03-31 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-27200:
---

Assignee: Yi Zhang

> Backport HIVE-24928 to branch-3
> ---
>
> Key: HIVE-27200
> URL: https://issues.apache.org/jira/browse/HIVE-27200
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This backports HIVE-24928 so that, for HiveStorageHandler tables, 'ANALYZE 
> TABLE ... COMPUTE STATISTICS' can use the storage handler to provide basic 
> stats via BasicStatsNoJobTask.
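> For illustration, the statement in question (the table name below is a 
> placeholder, not from this ticket):
> {code:sql}
> ANALYZE TABLE my_storagehandler_table COMPUTE STATISTICS;
> {code}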





[jira] [Updated] (HIVE-27143) Optimize HCatStorer move task

2023-03-30 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-27143:

Summary: Optimize HCatStorer move task  (was: Improve HCatStorer move task)

> Optimize HCatStorer move task
> -
>
> Key: HIVE-27143
> URL: https://issues.apache.org/jira/browse/HIVE-27143
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 3.1.3
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
>
> moveTask in hcatalog is inefficient: it makes two passes (a dry run, then 
> the actual execution), and it runs sequentially. This can be improved.





[jira] [Updated] (HIVE-27200) Backport HIVE-24928 to branch-3

2023-03-30 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-27200:

Summary: Backport HIVE-24928 to branch-3  (was: Backport HIVE-24928 In case 
of non-native tables use basic statistics from HiveStorageHandler)



[jira] [Updated] (HIVE-27149) StorageHandler PPD query planning statistics not adjusted for pushedPredicate

2023-03-16 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-27149:

Summary: StorageHandler PPD query planning statistics not adjusted for 
pushedPredicate  (was: StorageHandler PPD query planning statistics not 
adjusted for residualPredicate)

> StorageHandler PPD query planning statistics not adjusted for pushedPredicate
> -
>
> Key: HIVE-27149
> URL: https://issues.apache.org/jira/browse/HIVE-27149
> Project: Hive
>  Issue Type: Bug
>  Components: StorageHandler
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Priority: Minor
>
> In StorageHandler PPD, filter predicates can be pushed down to storage and 
> trimmed to a residualPredicate subset; however, the query-planning 
> statistics based on filters consider only the 'final' residual predicates, 
> when in fact the pushedPredicates should also be considered. This affects 
> reducer parallelism (more reducers than needed).





[jira] [Assigned] (HIVE-27143) Improve HCatStorer move task

2023-03-15 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-27143:
---

Assignee: Yi Zhang



[jira] [Commented] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)

2023-03-07 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697692#comment-17697692
 ] 

Yi Zhang commented on HIVE-27115:
-

I see HIVE-25673 fixed a similar issue. [~pvary] [~Marton Bod] is this the same 
issue as HIVE-25673?

> HiveInputFormat column project push down wrong fields (MR)
> --
>
> Key: HIVE-27115
> URL: https://issues.apache.org/jira/browse/HIVE-27115
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3, 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For a query such as
> select * from (
> select r_name from r
> union all
> select t_name from t
> ) unioned
>  
> in MR execution, when column projection is pushed down for the splits, 
> t_name gets pushed down to table r.
>  





[jira] [Commented] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)

2023-03-07 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697577#comment-17697577
 ] 

Yi Zhang commented on HIVE-27115:
-

[~Marton Bod] I wonder whether hive-iceberg encountered this issue in MR mode, 
and how it is handled, since HiveIcebergInputFormat uses the jobConf's 
ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR for getSplits.
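A hedged sketch of the projection handoff in question 
(READ_COLUMN_NAMES_CONF_STR is the ColumnProjectionUtils constant named above; 
the class, method, and column values are illustrative only, not the actual fix):

{code:java}
import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;
import org.apache.hadoop.mapred.JobConf;

class ProjectionSketch {
  // Splits for a table should carry that table's own projected columns,
  // e.g. "r_name" for table r in the union-all query above. The reported
  // bug: when HiveInputFormat combines multiple TableScanOperators, splits
  // for table r can end up carrying "t_name" instead.
  static JobConf projectionFor(String columnNames) {
    JobConf jobConf = new JobConf();
    jobConf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, columnNames);
    return jobConf;
  }
}
{code}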

(v8.20.10#820010)


[jira] [Commented] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)

2023-03-06 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697152#comment-17697152
 ] 

Yi Zhang commented on HIVE-27115:
-

[~rajesh.balamohan] Is there a scenario in Tez mode where multiple 
TableScanOperators are passed to HiveInputFormat? I assume that in Tez each TS 
has its own input initialized, so it doesn't run into this issue. I understand 
that MR is deprecated; however, the logic of HiveInputFormat itself has this 
bug when it combines multiple TSs.

Some storage-handler use cases that were added in MR mode have issues when 
run in Tez mode.

(v8.20.10#820010)


[jira] [Commented] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)

2023-03-03 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696327#comment-17696327
 ] 

Yi Zhang commented on HIVE-27115:
-

[~rajesh.balamohan] can you review the pr? thank you!



[jira] [Assigned] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)

2023-03-01 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-27115:
---

Assignee: Yi Zhang



[jira] [Updated] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)

2023-02-28 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-27115:

Description: 
For a query such as 
select * from (
select r_name from r
union all
select t_name from t
) unioned
in MR execution, when column projection is pushed down for the splits, t_name 
gets pushed down to table r.

  was:
For a query such as 
select * from (
select r_name from r
union all
select t_name from t
) unioned
in MR execution, when column projection is pushed down for splits on table r, 
it is t_name.

 




[jira] [Assigned] (HIVE-27017) option to use createTable DDLTask in CTAS for StorageHandler

2023-02-02 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-27017:
---

Assignee: Yi Zhang

> option to use createTable DDLTask in CTAS for StorageHandler
> ---
>
> Key: HIVE-27017
> URL: https://issues.apache.org/jira/browse/HIVE-27017
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Affects Versions: 3.1.3
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
>
> This adds a directInsert option for StorageHandler and moves the createTable 
> DDLTask earlier for CTAS when the storage handler is in directInsert mode. 
> This is a partial backport of HIVE-26771.





[jira] [Assigned] (HIVE-27016) Invoke optional output committer in TezProcessor

2023-02-02 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-27016:
---

Assignee: Yi Zhang

> Invoke optional output committer in TezProcessor
> 
>
> Key: HIVE-27016
> URL: https://issues.apache.org/jira/browse/HIVE-27016
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, StorageHandler
>Affects Versions: 3.1.3
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
>
> This backports HIVE-24629 and HIVE-24867, so that StorageHandlers with 
> their own OutputCommitter can run on Tez.





[jira] [Commented] (HIVE-26815) Backport HIVE-26758 (Allow use scratchdir for staging final job)

2022-12-15 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648214#comment-17648214
 ] 

Yi Zhang commented on HIVE-26815:
-

[~sruthim-official] the setting is used together with the blob storage 
settings:

hive.use.scratchdir.for.staging=true
hive.blobstore.optimizations.enabled=true
hive.blobstore.supported.schemes=
hive.blobstore.use.blobstore.as.scratchdir=false
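Put together, a hedged example of the combination (the value of 
hive.blobstore.supported.schemes is elided in the comment above; "s3,s3a,s3n" 
below is only an assumed placeholder):

{code:sql}
set hive.use.scratchdir.for.staging=true;
set hive.blobstore.optimizations.enabled=true;
set hive.blobstore.supported.schemes=s3,s3a,s3n;  -- placeholder value
set hive.blobstore.use.blobstore.as.scratchdir=false;
{code}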

> Backport HIVE-26758 (Allow use scratchdir for staging final job)
> 
>
> Key: HIVE-26815
> URL: https://issues.apache.org/jira/browse/HIVE-26815
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-26758 adds an option that allows the final job staging directory to be 
> set under hive.exec.scratchdir. This backports it into 3.2.0.





[jira] [Assigned] (HIVE-26815) Backport HIVE-26758 (Allow use scratchdir for staging final job)

2022-12-07 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-26815:
---

Assignee: Yi Zhang



[jira] [Assigned] (HIVE-26819) Vectorization: wrong results when filter on repeating map key orc table

2022-12-07 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-26819:
---

Assignee: Yi Zhang

> Vectorization: wrong results when filter on repeating map key orc table
> ---
>
> Key: HIVE-26819
> URL: https://issues.apache.org/jira/browse/HIVE-26819
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The same issue is fixed by HIVE-26447; this is to fix it in 3.2.0. 
> Example reproducible case:
> set hive.vectorized.execution.enabled=true;
> set hive.fetch.task.conversion=none;
> create temporary table foo (id int, x map<string,int>) stored as orc;
> insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC', 
> 8)), (4, map('ABC', 9));
> select id from foo where x['ABC']=9;
> This gives only 1, when the correct result should be 1 and 4.
> For every VectorizedRowBatch, only the first row is checked.
> This seems to be a corner case where an ORC table has a repeating 
> string-type key for the map field in the MapColumnVector.
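> For clarity, expected vs. actual on the repro above:
> {code:sql}
> select id from foo where x['ABC']=9;
> -- correct result: 1, 4
> -- result with this bug: 1
> {code}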





[jira] [Updated] (HIVE-26447) Vectorization: wrong results when filter on repeating map key orc table

2022-12-07 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26447:

Affects Version/s: (was: 3.1.3)



[jira] [Updated] (HIVE-26819) Vectorization: wrong results when filter on repeating map key orc table

2022-12-07 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26819:

Description: 
The same issue was fixed by HIVE-26447; this is to fix it in 3.2.0.

Example reproducible case:

set hive.vectorized.execution.enabled=true;

set hive.fetch.task.conversion=none;

create temporary table foo (id int, x map<string,int>) stored as orc;
insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC',
8)), (4, map('ABC', 9));

select id from foo where x['ABC']=9;

This gives only 1, when the correct result should be 1, 4.

For every VectorizedRowBatch, only the first row is checked.

This seems to be a corner case of an ORC table having a repeating string-type
key for the map field in the MapColumnVector.

  was:The same issue was fixed by HIVE-26447; this is to fix it in 3.2.0.


> Vectorization: wrong results when filter on repeating map key orc table
> ---
>
> Key: HIVE-26819
> URL: https://issues.apache.org/jira/browse/HIVE-26819
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Yi Zhang
>Priority: Minor
>
> The same issue was fixed by HIVE-26447; this is to fix it in 3.2.0.
> Example reproducible case:
> set hive.vectorized.execution.enabled=true;
> set hive.fetch.task.conversion=none;
> create temporary table foo (id int, x map<string,int>) stored as orc;
> insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC',
> 8)), (4, map('ABC', 9));
> select id from foo where x['ABC']=9;
> This gives only 1, when the correct result should be 1, 4.
> For every VectorizedRowBatch, only the first row is checked.
> This seems to be a corner case of an ORC table having a repeating string-type
> key for the map field in the MapColumnVector.
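For illustration, a minimal sketch (a hypothetical helper, not Hive's actual fix) of the pitfall: isRepeating on the keys child of a MapColumnVector only means the key bytes at index 0 are shared; offsets, lengths, and values stay per-row, so every row of the batch must still be probed.

```java
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.MapColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.expressions.StringExpr;

final class MapKeyLookupSketch {
  // Return the value mapped to `key` in the given row, or -1 if absent.
  static long lookup(MapColumnVector map, int row, byte[] key) {
    BytesColumnVector keys = (BytesColumnVector) map.keys;
    LongColumnVector vals = (LongColumnVector) map.values;
    long end = map.offsets[row] + map.lengths[row];
    for (long i = map.offsets[row]; i < end; i++) {
      // Repeating keys are stored once at index 0; values never are.
      int k = keys.isRepeating ? 0 : (int) i;
      if (StringExpr.equal(key, 0, key.length,
          keys.vector[k], keys.start[k], keys.length[k])) {
        return vals.vector[(int) i];
      }
    }
    return -1;
  }
}
```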



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26815) Backport HIVE-26758 (Allow use scratchdir for staging final job)

2022-12-07 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26815:

Affects Version/s: 3.1.3
   (was: 3.2.0)

> Backport HIVE-26758 (Allow use scratchdir for staging final job)
> 
>
> Key: HIVE-26815
> URL: https://issues.apache.org/jira/browse/HIVE-26815
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-26758 added an option that allows the final job staging directory to be
> set with hive.exec.scratchdir. This is to backport it into 3.2.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26447) Vectorization: wrong results when filter on repeating map key orc table

2022-12-07 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26447:

Affects Version/s: 4.0.0-alpha-1
   (was: 4.0.0)

> Vectorization: wrong results when filter on repeating map key orc table
> ---
>
> Key: HIVE-26447
> URL: https://issues.apache.org/jira/browse/HIVE-26447
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3, 4.0.0-alpha-1
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Example reproducible case:
>  
> set hive.vectorized.execution.enabled=true;
> set hive.fetch.task.conversion=none;
> create temporary table foo (id int, x map<string,int>) stored as orc;
> insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC',
> 8)), (4, map('ABC', 9));
> select id from foo where x['ABC']=9;
> This gives only 1, when the correct result should be 1, 4.
> For every VectorizedRowBatch, only the first row is checked.
> This seems to be a corner case of an ORC table having a repeating string-type
> key for the map field in the MapColumnVector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26815) Backport HIVE-26758 (Allow use scratchdir for staging final job)

2022-12-07 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26815:

Summary: Backport HIVE-26758 (Allow use scratchdir for staging final job)  
(was: Backport HIVE-26758 (Allow use scratchdir for staging final job) to 3.2.0)

> Backport HIVE-26758 (Allow use scratchdir for staging final job)
> 
>
> Key: HIVE-26815
> URL: https://issues.apache.org/jira/browse/HIVE-26815
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.2.0
>Reporter: Yi Zhang
>Priority: Minor
>
> HIVE-26758 added an option that allows the final job staging directory to be
> set with hive.exec.scratchdir. This is to backport it into 3.2.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26813) Upgrade HikariCP from 2.6.1 to 4.0.3.

2022-12-06 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644151#comment-17644151
 ] 

Yi Zhang commented on HIVE-26813:
-

+1

> Upgrade HikariCP from 2.6.1 to 4.0.3.
> -
>
> Key: HIVE-26813
> URL: https://issues.apache.org/jira/browse/HIVE-26813
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Hive Metastore currently integrates with HikariCP 2.6.1 for database 
> connection pooling. This version was released in 2017. The most recent Java 
> 8-compatible release is 4.0.3, released earlier this year. This bug proposes 
> to upgrade so that we can include the past few years of development and bug 
> fixes in the 4.0.0 GA release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26758) Allow use scratchdir for staging final job

2022-11-30 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641680#comment-17641680
 ] 

Yi Zhang commented on HIVE-26758:
-

[~pvary] can you help review this?

> Allow use scratchdir for staging final job
> --
>
> Key: HIVE-26758
> URL: https://issues.apache.org/jira/browse/HIVE-26758
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The query results are staged in a stagingdir that is relative to the
> destination path //
> During the blobstorage optimization HIVE-17620, the final job was set to use
> the stagingdir.
> HIVE-15215 mentioned the possibility of using the scratchdir for staging when
> writing to S3, but that was long ago and saw no activity.
>  
> This is to allow the final job to use hive.exec.scratchdir like the interim
> jobs, with a configuration
> hive.use.scratchdir.for.staging
> This is useful for the cross-filesystem case: the user can use the local
> source filesystem instead of the remote filesystem for the staging.
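For illustration, a hedged sketch of exercising the option described above; the flag name is taken from the description, while the cluster paths and table names are made up:

```sql
-- Stage the final job under the scratchdir on the local source cluster
-- instead of next to the destination table on the remote filesystem.
set hive.exec.scratchdir=hdfs://source-cluster/tmp/hive;
set hive.use.scratchdir.for.staging=true;
insert overwrite table remote_db.dest_table
select * from source_db.events;
```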



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26758) Allow use scratchdir for staging final job

2022-11-18 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26758:

Description: 
The query results are staged in a stagingdir that is relative to the destination
path //

During the blobstorage optimization HIVE-17620, the final job was set to use the
stagingdir.

HIVE-15215 mentioned the possibility of using the scratchdir for staging when
writing to S3, but that was long ago and saw no activity.

 

This is to allow the final job to use hive.exec.scratchdir like the interim
jobs, with a configuration

hive.use.scratchdir.for.staging

This is useful for the cross-filesystem case: the user can use the local source
filesystem instead of the remote filesystem for the staging.

  was:
The query results are staged in a stagingdir that is relative to the destination
path //

It used to be possible to change hive.exec.stagingdir to a different location,
but that was lost during the blobstorage optimization HIVE-17620.

HIVE-15215 mentioned the possibility of using the scratchdir for staging when
writing to S3, but that was long ago and saw no activity.

 

This is to allow the final job to use hive.exec.scratchdir like the interim
jobs, with a configuration

hive.use.scratchdir.for.staging

This is useful for the cross-filesystem case: the user can use the local source
filesystem instead of the remote filesystem for the staging.


> Allow use scratchdir for staging final job
> --
>
> Key: HIVE-26758
> URL: https://issues.apache.org/jira/browse/HIVE-26758
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The query results are staged in a stagingdir that is relative to the
> destination path //
> During the blobstorage optimization HIVE-17620, the final job was set to use
> the stagingdir.
> HIVE-15215 mentioned the possibility of using the scratchdir for staging when
> writing to S3, but that was long ago and saw no activity.
>  
> This is to allow the final job to use hive.exec.scratchdir like the interim
> jobs, with a configuration
> hive.use.scratchdir.for.staging
> This is useful for the cross-filesystem case: the user can use the local
> source filesystem instead of the remote filesystem for the staging.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26758) Allow use scratchdir for staging final job

2022-11-17 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26758:

Description: 
The query results are staged in a stagingdir that is relative to the destination
path //

It used to be possible to change hive.exec.stagingdir to a different location,
but that was lost during the blobstorage optimization HIVE-17620.

HIVE-15215 mentioned the possibility of using the scratchdir for staging when
writing to S3, but that was long ago and saw no activity.

 

This is to allow the final job to use hive.exec.scratchdir like the interim
jobs, with a configuration

hive.use.scratchdir.for.staging

This is useful for the cross-filesystem case: the user can use the local source
filesystem instead of the remote filesystem for the staging.

  was:
The query results are staged in a stagingdir that is relative to the destination
path //

It used to be possible to change hive.exec.stagingdir to a different location,
but that was lost during the blobstorage optimization HIVE-17620.

This is to allow the final job to use hive.exec.scratchdir like the interim
jobs, with a configuration

hive.use.scratchdir_for_staging

This is useful for the cross-filesystem case: the user can use the local source
filesystem instead of the remote filesystem for the staging.

Main change:

For dynamic partitions that have a static partition, the path was

///

and changes to

///

or, in the case of {hive.use.scratchdir_for_staging},

//

The change is needed because Hive relies on parsing the path to discover
partitions.


> Allow use scratchdir for staging final job
> --
>
> Key: HIVE-26758
> URL: https://issues.apache.org/jira/browse/HIVE-26758
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The query results are staged in a stagingdir that is relative to the
> destination path //
> It used to be possible to change hive.exec.stagingdir to a different
> location, but that was lost during the blobstorage optimization HIVE-17620.
> HIVE-15215 mentioned the possibility of using the scratchdir for staging when
> writing to S3, but that was long ago and saw no activity.
>  
> This is to allow the final job to use hive.exec.scratchdir like the interim
> jobs, with a configuration
> hive.use.scratchdir.for.staging
> This is useful for the cross-filesystem case: the user can use the local
> source filesystem instead of the remote filesystem for the staging.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26758) Allow use scratchdir for staging final job

2022-11-17 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26758:

Summary: Allow use scratchdir for staging final job  (was: Allow use 
scratchdir for staging)

> Allow use scratchdir for staging final job
> --
>
> Key: HIVE-26758
> URL: https://issues.apache.org/jira/browse/HIVE-26758
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The query results are staged in a stagingdir that is relative to the
> destination path //
> It used to be possible to change hive.exec.stagingdir to a different
> location, but that was lost during the blobstorage optimization HIVE-17620.
> This is to allow the final job to use hive.exec.scratchdir like the interim
> jobs, with a configuration
> hive.use.scratchdir_for_staging
> This is useful for the cross-filesystem case: the user can use the local
> source filesystem instead of the remote filesystem for the staging.
> Main change:
> For dynamic partitions that have a static partition, the path was
> ///
> and changes to
> ///
> or, in the case of {hive.use.scratchdir_for_staging},
> //
> The change is needed because Hive relies on parsing the path to discover
> partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26758) Allow use scratchdir for staging

2022-11-17 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26758:

Description: 
The query results are staged in a stagingdir that is relative to the destination
path //

It used to be possible to change hive.exec.stagingdir to a different location,
but that was lost during the blobstorage optimization HIVE-17620.

This is to allow the final job to use hive.exec.scratchdir like the interim
jobs, with a configuration

hive.use.scratchdir_for_staging

This is useful for the cross-filesystem case: the user can use the local source
filesystem instead of the remote filesystem for the staging.

Main change:

For dynamic partitions that have a static partition, the path was

///

and changes to

///

or, in the case of {hive.use.scratchdir_for_staging},

//

The change is needed because Hive relies on parsing the path to discover
partitions.

  was:
The query results are staged in a stagingdir that is relative to the destination
path //

It used to be possible to change hive.exec.stagingdir to a different location,
but that was lost during the blobstorage optimization HIVE-17620.

This is to allow the final job to use hive.exec.scratchdir like the interim
jobs, with a configuration

hive.use.scratchdir_for_staging

This is useful for the cross-filesystem case: the user can use the local source
filesystem instead of the remote filesystem for the staging.


> Allow use scratchdir for staging
> 
>
> Key: HIVE-26758
> URL: https://issues.apache.org/jira/browse/HIVE-26758
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>
> The query results are staged in a stagingdir that is relative to the
> destination path //
> It used to be possible to change hive.exec.stagingdir to a different
> location, but that was lost during the blobstorage optimization HIVE-17620.
> This is to allow the final job to use hive.exec.scratchdir like the interim
> jobs, with a configuration
> hive.use.scratchdir_for_staging
> This is useful for the cross-filesystem case: the user can use the local
> source filesystem instead of the remote filesystem for the staging.
> Main change:
> For dynamic partitions that have a static partition, the path was
> ///
> and changes to
> ///
> or, in the case of {hive.use.scratchdir_for_staging},
> //
> The change is needed because Hive relies on parsing the path to discover
> partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26758) Allow use scratchdir for staging

2022-11-17 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-26758:
---

Assignee: Yi Zhang

> Allow use scratchdir for staging
> 
>
> Key: HIVE-26758
> URL: https://issues.apache.org/jira/browse/HIVE-26758
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>
> The query results are staged in a stagingdir that is relative to the
> destination path //
> It used to be possible to change hive.exec.stagingdir to a different
> location, but that was lost during the blobstorage optimization HIVE-17620.
> This is to allow the final job to use hive.exec.scratchdir like the interim
> jobs, with a configuration
> hive.use.scratchdir_for_staging
> This is useful for the cross-filesystem case: the user can use the local
> source filesystem instead of the remote filesystem for the staging.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26611) add HiveServer2 History Server?

2022-10-13 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617256#comment-17617256
 ] 

Yi Zhang commented on HIVE-26611:
-

[~ngangam] thank you for the guidance! I posted a 1-pager doc; please review.

> add HiveServer2 History Server?
> ---
>
> Key: HIVE-26611
> URL: https://issues.apache.org/jira/browse/HIVE-26611
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Yi Zhang
>Priority: Major
> Attachments: HiveServer2 History Server.pdf
>
>
> The HiveServer2 Web UI provides the query profile and optional operation log;
> however, these are gone when the HS2 server exits.
> Was there discussion of adding an HS2 history server before?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26611) add HiveServer2 History Server?

2022-10-13 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26611:

Attachment: HiveServer2 History Server.pdf

> add HiveServer2 History Server?
> ---
>
> Key: HIVE-26611
> URL: https://issues.apache.org/jira/browse/HIVE-26611
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Yi Zhang
>Priority: Major
> Attachments: HiveServer2 History Server.pdf
>
>
> The HiveServer2 Web UI provides the query profile and optional operation log;
> however, these are gone when the HS2 server exits.
> Was there discussion of adding an HS2 history server before?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26611) add HiveServer2 History Server?

2022-10-10 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615350#comment-17615350
 ] 

Yi Zhang commented on HIVE-26611:
-

Thank you [~pvary] for the input!  The HS2 Web UI does have its limitations in
a production env, so a stand-alone HS2 history server may work better there if
no third-party tools are available. Beyond the fact that 3rd-party tools exist,
I was wondering whether this was not developed because it would not bring much
value?  [~ngangam], if this can bring value, I can look into it, but I am not
familiar with the Hive community's current directions; I appreciate your input!

> add HiveServer2 History Server?
> ---
>
> Key: HIVE-26611
> URL: https://issues.apache.org/jira/browse/HIVE-26611
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Yi Zhang
>Priority: Major
>
> The HiveServer2 Web UI provides the query profile and optional operation log;
> however, these are gone when the HS2 server exits.
> Was there discussion of adding an HS2 history server before?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26611) add HiveServer2 History Server?

2022-10-07 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614176#comment-17614176
 ] 

Yi Zhang commented on HIVE-26611:
-

[~pvary], I wonder if you have any insight on this? Thank you!

> add HiveServer2 History Server?
> ---
>
> Key: HIVE-26611
> URL: https://issues.apache.org/jira/browse/HIVE-26611
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Yi Zhang
>Priority: Major
>
> The HiveServer2 Web UI provides the query profile and optional operation log;
> however, these are gone when the HS2 server exits.
> Was there discussion of adding an HS2 history server before?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26564) Separate query live operation log and historical operation log

2022-09-28 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26564:

Description: 
HIVE-24802 added OperationLogManager to support historical operation logs. 

OperationLogManager.createOperationLog creates the operation log inside the
historical operation log dir if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true.
This is confusing, since at the session level, SessionManager and HiveSession
are using the original operation log session directory.

The proposed change is to separate the live query's operation log from the
historical operation log. Upon operation close, OperationLogManager.closeOperation
is called to move the operation log from the session directory to the historical
log dir. OperationLogManager is then only responsible for cleaning up historical
operation logs.

This change also makes it easier to manage historical logs: for example, a user
may want to persist them, and it becomes easier to differentiate live from
historical operation logs.

 

Before this change, if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true, the
operation log layout is as follows; operation_logs_historic contains the
operation logs of both live and historic queries:

```
/tmp/hive/
├── operation_logs
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   ├── hive_query_id_1
        │   ├── hive_query_id_2
        │   └── hive_query_id_3
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            ├── hive_query_id_7
            └── hive_query_id_8
```

After this change, the live queries' operation logs are under  and
the historical ones under 

/tmp/hive
├── operation_logs
│   ├── session_id_1
│   │   ├── hive_query_id_2
│   │   └── hive_query_id_3
│   └── session_id_4
│       └── hive_query_id_8
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   └── hive_query_id_1
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            └── hive_query_id_7

 

 

 

  was:
HIVE-24802 added OperationLogManager to support historical operation logs. 

OperationLogManager.createOperationLog creates the operation log inside the
historical operation log dir if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true.
This is confusing, since at the session level, SessionManager and HiveSession
are using the original operation log session directory.

The proposed change is to separate the live query's operation log from the
historical operation log. Upon operation close, OperationLogManager.closeOperation
is called to move the operation log from the session directory to the historical
log dir. OperationLogManager is then only responsible for cleaning up historical
operation logs.

This change also makes it easier to manage historical logs: for example, a user
may want to persist them, and it becomes easier to differentiate live from
historical operation logs.

 

Before this change, if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true, the
operation log layout is as follows; operation_logs_historic contains the
operation logs of both live and historic queries:

/tmp/hive/
├── operation_logs
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   ├── hive_query_id_1
        │   ├── hive_query_id_2
        │   └── hive_query_id_3
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            ├── hive_query_id_7
            └── hive_query_id_8

 

After this change, the live queries' operation logs are under  and
the historical ones under 

/tmp/hive
├── operation_logs
│   ├── session_id_1
│   │   ├── hive_query_id_2
│   │   └── hive_query_id_3
│   └── session_id_4
│       └── hive_query_id_8
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   └── hive_query_id_1
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            └── hive_query_id_7

 

 

 


> Separate query live operation log and historical operation log
> --
>
> Key: HIVE-26564
> URL: https://issues.apache.org/jira/browse/HIVE-26564
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>

[jira] [Commented] (HIVE-26564) Separate query live operation log and historical operation log

2022-09-28 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610790#comment-17610790
 ] 

Yi Zhang commented on HIVE-26564:
-

[~zabetak]  updated the description with example layouts before and after the 
change.

> Separate query live operation log and historical operation log
> --
>
> Key: HIVE-26564
> URL: https://issues.apache.org/jira/browse/HIVE-26564
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> HIVE-24802 added OperationLogManager to support historical operation logs. 
> OperationLogManager.createOperationLog creates the operation log inside the
> historical operation log dir if
> HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true. This is confusing, since at
> the session level, SessionManager and HiveSession are using the original
> operation log session directory.
> The proposed change is to separate the live query's operation log from the
> historical operation log. Upon operation close,
> OperationLogManager.closeOperation is called to move the operation log from
> the session directory to the historical log dir. OperationLogManager is then
> only responsible for cleaning up historical operation logs.
> This change also makes it easier to manage historical logs: for example, a
> user may want to persist them, and it becomes easier to differentiate live
> from historical operation logs.
>  
> Before this change, if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true, the
> operation log layout is as follows; operation_logs_historic contains the
> operation logs of both live and historic queries:
> /tmp/hive/
> ├── operation_logs
> └── operation_logs_historic
>     └── hs2hostname_startupTimestamp
>         ├── session_id_1
>         │   ├── hive_query_id_1
>         │   ├── hive_query_id_2
>         │   └── hive_query_id_3
>         ├── session_id_2
>         │   ├── hive_query_id_4
>         │   └── hive_query_id_5
>         ├── session_id_3
>         │   └── hive_query_id_6
>         └── session_id_4
>             ├── hive_query_id_7
>             └── hive_query_id_8
>  
> After this change, the live queries' operation logs are under  and
> the historical ones under 
> /tmp/hive
> ├── operation_logs
> │   ├── session_id_1
> │   │   ├── hive_query_id_2
> │   │   └── hive_query_id_3
> │   └── session_id_4
> │       └── hive_query_id_8
> └── operation_logs_historic
>     └── hs2hostname_startupTimestamp
>         ├── session_id_1
>         │   └── hive_query_id_1
>         ├── session_id_2
>         │   ├── hive_query_id_4
>         │   └── hive_query_id_5
>         ├── session_id_3
>         │   └── hive_query_id_6
>         └── session_id_4
>             └── hive_query_id_7
>  
>  
>  
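For illustration, a hedged sketch in plain java.nio (not Hive's actual OperationLogManager code) of the close-time move the proposal describes; the directory names mirror the layouts above, and the serverDir argument stands in for hs2hostname_startupTimestamp:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class OperationLogArchiver {
    // On operation close, move the finished query's log out of the live
    // session directory into the historic tree, keeping the two disjoint.
    static void closeOperation(String serverDir, String sessionId, String queryId)
            throws IOException {
        Path live = Paths.get("/tmp/hive/operation_logs", sessionId, queryId);
        Path historic = Paths.get("/tmp/hive/operation_logs_historic",
                serverDir, sessionId, queryId);
        Files.createDirectories(historic.getParent());
        Files.move(live, historic, StandardCopyOption.REPLACE_EXISTING);
    }
}
```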



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26564) Separate query live operation log and historical operation log

2022-09-28 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26564:

Description: 
HIVE-24802 added OperationLogManager to support historical operation logs. 

OperationLogManager.createOperationLog creates the operation log inside the
historical operation log dir if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true.
This is confusing, since at the session level, SessionManager and HiveSession
are using the original operation log session directory.

The proposed change is to separate the live query's operation log from the
historical operation log. Upon operation close, OperationLogManager.closeOperation
is called to move the operation log from the session directory to the historical
log dir. OperationLogManager is then only responsible for cleaning up historical
operation logs.

This change also makes it easier to manage historical logs: for example, a user
may want to persist them, and it becomes easier to differentiate live from
historical operation logs.

 

Before this change, if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true, the
operation log layout is as follows; operation_logs_historic contains the
operation logs of both live and historic queries:

/tmp/hive/
├── operation_logs
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   ├── hive_query_id_1
        │   ├── hive_query_id_2
        │   └── hive_query_id_3
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            ├── hive_query_id_7
            └── hive_query_id_8

 

After this change, the live queries' operation logs are under  and
the historical ones under 

/tmp/hive
├── operation_logs
│   ├── session_id_1
│   │   ├── hive_query_id_2
│   │   └── hive_query_id_3
│   └── session_id_4
│       └── hive_query_id_8
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   └── hive_query_id_1
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            └── hive_query_id_7

 

 

 

  was:
HIVE-24802 added OperationLogManager to support historical operation logs. 

OperationLogManager.createOperationLog creates the operation log inside the
historical operation log dir if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true.
This is confusing, since at the session level, SessionManager and HiveSession
are using the original operation log session directory.

The proposed change is to separate the live query's operation log from the
historical operation log. Upon operation close, OperationLogManager.closeOperation
is called to move the operation log from the session directory to the historical
log dir. OperationLogManager is then only responsible for cleaning up historical
operation logs.

This change also makes it easier to manage historical logs: for example, a user
may want to persist them, and it becomes easier to differentiate live from
historical operation logs.


> Separate query live operation log and historical operation log
> --
>
> Key: HIVE-26564
> URL: https://issues.apache.org/jira/browse/HIVE-26564
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> HIVE-24802 added OperationLogManager to support historical operation logs. 
> OperationLogManager.createOperationLog creates the operation log inside the
> historical operation log dir if
> HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true. This is confusing, since at
> the session level, SessionManager and HiveSession are using the original
> operation log session directory.
> The proposed change is to separate the live query's operation log from the
> historical operation log. Upon operation close,
> OperationLogManager.closeOperation is called to move the operation log from
> the session directory to the historical log dir. OperationLogManager is then
> only responsible for cleaning up historical operation logs.
> This change also makes it easier to manage historical logs: for example, a
> user may want to persist them, and it becomes easier to differentiate live
> from historical operation logs.
>  
> Before this change, if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true, the
> operation log layout is as follows; operation_logs_historic contains the
> operation logs of both live and historic queries:
> /tmp/hive/
> ├── operation_logs
> └── operation_logs_historic
>     └── hs2hostname_startupTimestamp
>         ├── session_id_1
>        

[jira] [Assigned] (HIVE-26564) Separate query live operation log and historical operation log

2022-09-26 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-26564:
---


> Separate query live operation log and historical operation log
> --
>
> Key: HIVE-26564
> URL: https://issues.apache.org/jira/browse/HIVE-26564
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-2
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>
> HIVE-24802 added OperationLogManager to support historical operation logs. 
> OperationLogManager.createOperationLog creates the operation log inside the
> historical operation log dir if
> HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true. This is confusing, since at
> the session level, SessionManager and HiveSession are using the original
> operation log session directory.
> The proposed change is to separate the live query's operation log from the
> historical operation log. Upon operation close,
> OperationLogManager.closeOperation is called to move the operation log from
> the session directory to the historical log dir. OperationLogManager is then
> only responsible for cleaning up historical operation logs.
> This change also makes it easier to manage historical logs: for example, a
> user may want to persist them, and it becomes easier to differentiate live
> from historical operation logs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26478) Explicitly set Content-Type in QueryProfileServlet

2022-08-17 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-26478:
---


> Explicitly set Content-Type in QueryProfileServlet
> --
>
> Key: HIVE-26478
> URL: https://issues.apache.org/jira/browse/HIVE-26478
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.1.3, 4.0.0
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Minor
>
> QueryProfileServlet does not set the Content-Type. Though a browser may
> detect it correctly, for applications that check the Content-Type it would
> be helpful to set it explicitly.
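For illustration, a minimal sketch (a hypothetical servlet, not the actual QueryProfileServlet code) of setting the header explicitly:

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class QueryProfileServletSketch extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // Set the type explicitly instead of relying on client-side sniffing.
    resp.setContentType("text/html;charset=UTF-8");
    resp.getWriter().println("<html><body>query profile ...</body></html>");
  }
}
```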



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26447) Vectorization: wrong results when filter on repeating map key orc table

2022-08-02 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-26447:

Summary: Vectorization: wrong results when filter on repeating map key orc 
table  (was: Vectorization: wrong results when filter on repeating map key)

> Vectorization: wrong results when filter on repeating map key orc table
> ---
>
> Key: HIVE-26447
> URL: https://issues.apache.org/jira/browse/HIVE-26447
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3, 4.0.0
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
>
> Example reproducible case:
>  
> set hive.vectorized.execution.enabled=true;
> set hive.fetch.task.conversion=none;
> create temporary table foo (id int, x map<string,int>) stored as orc;
> insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC',
> 8)), (4, map('ABC', 9));
> select id from foo where x['ABC']=9;
> This gives only 1, when the correct result should be 1, 4.
> For every VectorizedRowBatch, only the first row is checked.
> This seems to be a corner case of an ORC table having a repeating string-type
> key for the map field in the MapColumnVector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26447) Vectorization: wrong results when filter on repeating map key

2022-08-02 Thread Yi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang reassigned HIVE-26447:
---


> Vectorization: wrong results when filter on repeating map key
> -
>
> Key: HIVE-26447
> URL: https://issues.apache.org/jira/browse/HIVE-26447
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3, 4.0.0
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
>
> Example reproducible case:
>  
> set hive.vectorized.execution.enabled=true;
> set hive.fetch.task.conversion=none;
> create temporary table foo (id int, x map<string,int>) stored as orc;
> insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC',
> 8)), (4, map('ABC', 9));
> select id from foo where x['ABC']=9;
> This gives only 1, when the correct result should be 1, 4.
> For every VectorizedRowBatch, only the first row is checked.
> This seems to be a corner case of an ORC table having a repeating string-type
> key for the map field in the MapColumnVector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-16331) create orc table fails in auto merge job when hive.exec.scratchdir set to viewfs path

2017-03-29 Thread Yi Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang resolved HIVE-16331.
-
Resolution: Fixed

Looks like a duplicate of HIVE-12522.

> create orc table fails in auto merge job when hive.exec.scratchdir set to 
> viewfs path 
> --
>
> Key: HIVE-16331
> URL: https://issues.apache.org/jira/browse/HIVE-16331
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Yi Zhang
>
> If hive.exec.scratchdir is set to a viewfs path but fs.defaultFS is not in
> viewfs, then when creating an ORC table in Hive/Tez and an auto merge job is
> kicked off, the auto merge job fails with the following error:
> ```
> 2017-03-29 23:10:57,892 INFO [main]: org.apache.hadoop.hive.ql.Driver: 
> Launching Job 3 out of 3
> 2017-03-29 23:10:57,894 INFO [main]: org.apache.hadoop.hive.ql.Driver: 
> Starting task [Stage-4:MAPRED] in serial mode
> 2017-03-29 23:10:57,894 INFO [main]: 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: The current user: 
> yizhang, session user: yizhang
> 2017-03-29 23:10:57,894 INFO [main]: 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: Current queue name 
> is hadoop-sync incoming queue name is hadoop-sync
> 2017-03-29 23:10:57,949 INFO [main]: hive.ql.Context: New scratch dir is 
> viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1
> 2017-03-29 23:10:57,949 DEBUG [main]: 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils: TezDir path set 
> viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir
>  for user: yizhang
> 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.hdfs.DFSClient: 
> /tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir:
>  masked=rwxr-xr-x
> 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: The ping 
> interval is 6 ms.
> 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: 
> Connecting to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020
> 2017-03-29 23:10:57,951 DEBUG [IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang: starting, having connections 3
> 2017-03-29 23:10:57,951 DEBUG [IPC Parameter Sending Thread #0]: 
> org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang sending #373
> 2017-03-29 23:10:57,954 DEBUG [IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang got value #373
> 2017-03-29 23:10:57,955 DEBUG [main]: 
> org.apache.hadoop.ipc.ProtobufRpcEngine: Call: mkdirs took 5ms
> 2017-03-29 23:10:57,955 INFO [main]: org.apache.hadoop.hive.ql.exec.Task: 
> Session is already open
> 2017-03-29 23:10:57,955 DEBUG [IPC Parameter Sending Thread #0]: 
> org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang sending #374
> 2017-03-29 23:10:57,956 DEBUG [IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang got value #374
> 2017-03-29 23:10:57,956 DEBUG [main]: 
> org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
> 2017-03-29 23:10:57,956 DEBUG [IPC Parameter Sending Thread #0]: 
> org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang sending #375
> 2017-03-29 23:10:57,961 DEBUG [IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang got value #375
> 2017-03-29 23:10:57,961 DEBUG [main]: 
> org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 5ms
> 2017-03-29 23:10:57,962 DEBUG [IPC Parameter Sending Thread #0]: 
> org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> 

[jira] [Updated] (HIVE-16331) create orc table fails in auto merge job when hive.exec.scratchdir set to viewfs path

2017-03-29 Thread Yi Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-16331:

Summary: create orc table fails in auto merge job when hive.exec.scratchdir 
set to viewfs path   (was: create orc table fails when hive.exec.scratchdir set 
to viewfs path in auto merge jobs)

> create orc table fails in auto merge job when hive.exec.scratchdir set to 
> viewfs path 
> --
>
> Key: HIVE-16331
> URL: https://issues.apache.org/jira/browse/HIVE-16331
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Yi Zhang
>
> If hive.exec.scratchdir is set to a viewfs path but fs.defaultFS is not in
> viewfs, then when creating an ORC table in Hive/Tez and an auto merge job is
> kicked off, the auto merge job fails with the following error:
> ```
> 2017-03-29 23:10:57,892 INFO [main]: org.apache.hadoop.hive.ql.Driver: 
> Launching Job 3 out of 3
> 2017-03-29 23:10:57,894 INFO [main]: org.apache.hadoop.hive.ql.Driver: 
> Starting task [Stage-4:MAPRED] in serial mode
> 2017-03-29 23:10:57,894 INFO [main]: 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: The current user: 
> yizhang, session user: yizhang
> 2017-03-29 23:10:57,894 INFO [main]: 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: Current queue name 
> is hadoop-sync incoming queue name is hadoop-sync
> 2017-03-29 23:10:57,949 INFO [main]: hive.ql.Context: New scratch dir is 
> viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1
> 2017-03-29 23:10:57,949 DEBUG [main]: 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils: TezDir path set 
> viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir
>  for user: yizhang
> 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.hdfs.DFSClient: 
> /tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir:
>  masked=rwxr-xr-x
> 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: The ping 
> interval is 6 ms.
> 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: 
> Connecting to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020
> 2017-03-29 23:10:57,951 DEBUG [IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang: starting, having connections 3
> 2017-03-29 23:10:57,951 DEBUG [IPC Parameter Sending Thread #0]: 
> org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang sending #373
> 2017-03-29 23:10:57,954 DEBUG [IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang got value #373
> 2017-03-29 23:10:57,955 DEBUG [main]: 
> org.apache.hadoop.ipc.ProtobufRpcEngine: Call: mkdirs took 5ms
> 2017-03-29 23:10:57,955 INFO [main]: org.apache.hadoop.hive.ql.exec.Task: 
> Session is already open
> 2017-03-29 23:10:57,955 DEBUG [IPC Parameter Sending Thread #0]: 
> org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang sending #374
> 2017-03-29 23:10:57,956 DEBUG [IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang got value #374
> 2017-03-29 23:10:57,956 DEBUG [main]: 
> org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
> 2017-03-29 23:10:57,956 DEBUG [IPC Parameter Sending Thread #0]: 
> org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang sending #375
> 2017-03-29 23:10:57,961 DEBUG [IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
> hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
> yizhang got value #375
> 2017-03-29 23:10:57,961 DEBUG [main]: 
> org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 5ms
> 

[jira] [Updated] (HIVE-16331) create orc table fails in auto merge job when hive.exec.scratchdir set to viewfs path

2017-03-29 Thread Yi Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-16331:

Description: 
If hive.exec.scratchdir is set to a viewfs path but fs.defaultFS is not in
viewfs, then when creating an ORC table in Hive/Tez and an auto merge job is
kicked off, the auto merge job fails with the following error:

```
2017-03-29 23:10:57,892 INFO [main]: org.apache.hadoop.hive.ql.Driver: 
Launching Job 3 out of 3
2017-03-29 23:10:57,894 INFO [main]: org.apache.hadoop.hive.ql.Driver: Starting 
task [Stage-4:MAPRED] in serial mode
2017-03-29 23:10:57,894 INFO [main]: 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: The current user: 
yizhang, session user: yizhang
2017-03-29 23:10:57,894 INFO [main]: 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: Current queue name is 
hadoop-sync incoming queue name is hadoop-sync
2017-03-29 23:10:57,949 INFO [main]: hive.ql.Context: New scratch dir is 
viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1
2017-03-29 23:10:57,949 DEBUG [main]: 
org.apache.hadoop.hive.ql.exec.tez.DagUtils: TezDir path set 
viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir
 for user: yizhang
2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.hdfs.DFSClient: 
/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir:
 masked=rwxr-xr-x
2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: The ping 
interval is 6 ms.
2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: Connecting 
to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020
2017-03-29 23:10:57,951 DEBUG [IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
yizhang: starting, having connections 3
2017-03-29 23:10:57,951 DEBUG [IPC Parameter Sending Thread #0]: 
org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang 
sending #373
2017-03-29 23:10:57,954 DEBUG [IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang 
got value #373
2017-03-29 23:10:57,955 DEBUG [main]: org.apache.hadoop.ipc.ProtobufRpcEngine: 
Call: mkdirs took 5ms
2017-03-29 23:10:57,955 INFO [main]: org.apache.hadoop.hive.ql.exec.Task: 
Session is already open
2017-03-29 23:10:57,955 DEBUG [IPC Parameter Sending Thread #0]: 
org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang 
sending #374
2017-03-29 23:10:57,956 DEBUG [IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang 
got value #374
2017-03-29 23:10:57,956 DEBUG [main]: org.apache.hadoop.ipc.ProtobufRpcEngine: 
Call: getFileInfo took 1ms
2017-03-29 23:10:57,956 DEBUG [IPC Parameter Sending Thread #0]: 
org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang 
sending #375
2017-03-29 23:10:57,961 DEBUG [IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang 
got value #375
2017-03-29 23:10:57,961 DEBUG [main]: org.apache.hadoop.ipc.ProtobufRpcEngine: 
Call: getFileInfo took 5ms
2017-03-29 23:10:57,962 DEBUG [IPC Parameter Sending Thread #0]: 
org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang 
sending #376
2017-03-29 23:10:57,962 DEBUG [IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from 
yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang 
got value #376
2017-03-29 23:10:57,962 DEBUG [main]: org.apache.hadoop.ipc.ProtobufRpcEngine: 
Call: getFileInfo took 1ms
2017-03-29 23:10:57,963 INFO [main]: 

[jira] [Updated] (HIVE-12605) Implement JDBC Connection.isValid

2015-12-07 Thread Yi Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Zhang updated HIVE-12605:

Description: 
http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int) 
implementation in Hive JDBC driver throws "SQLException("Method not 
supported")".

That is a method often used by connection pooling libraries.




  was:
http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int) 
implementation in Hive JDBC driver throws "SQLException("Method not 
supported")".

That is a method often used by connection pooling libraries.

Thanks to [~yeeza] for raising this issue.



> Implement JDBC Connection.isValid
> -
>
> Key: HIVE-12605
> URL: https://issues.apache.org/jira/browse/HIVE-12605
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Reporter: Thejas M Nair
>  Labels: newbie, trivial
>
> http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int)
>  implementation in Hive JDBC driver throws "SQLException("Method not 
> supported")".
> That is a method often used by connection pooling libraries.
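For context, a minimal sketch of how a pooling library typically uses this method; the wrapper class is hypothetical:

```java
import java.sql.Connection;
import java.sql.SQLException;

final class ConnectionValidator {
    // Pools probe a connection before handing it out; a driver that throws
    // "Method not supported" breaks this check entirely.
    static boolean healthy(Connection conn) {
        try {
            return conn.isValid(5); // timeout in seconds
        } catch (SQLException e) {
            return false;
        }
    }
}
```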



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11091) Unable to load data into hive table using Load data local inpath command from unix named pipe

2015-07-13 Thread Yi Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625415#comment-14625415
 ] 

Yi Zhang commented on HIVE-11091:
-

This is tracked in HDFS-8767.

 Unable to load data into hive table using Load data local inpath command 
 from unix named pipe
 ---

 Key: HIVE-11091
 URL: https://issues.apache.org/jira/browse/HIVE-11091
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 0.14.0
 Environment: Unix,MacOS
Reporter: Manoranjan Sahoo
Priority: Blocker

 Unable to load data into a hive table from a unix named pipe in Hive 0.14.0.
 Please find below the execution details in the env (Hadoop 2.6.0 + Hive 0.14.0):
 
 $ mkfifo /tmp/test.txt
 $ hive
 hive> create table test(id bigint, name string);
 OK
 Time taken: 1.018 seconds
 hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
 Loading data to table default.test
 Failed with exception addFiles: filesystem error in check phase
 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.MoveTask
 But in Hadoop 1.3 and Hive 0.11.0 it works fine:
 hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
 Copying data from file:/tmp/test.txt
 Copying file: file:/tmp/test.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10142) Calculating formula based on difference between each row's value and current row's in Windowing function

2015-06-18 Thread Yi Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592377#comment-14592377
 ] 

Yi Zhang commented on HIVE-10142:
-

This request is more along the lines of the following decay variable definition:

Exponential rate of change can be modeled algebraically by the following 
formula:

N(t)=N(0)e^(−λt)

where N is the quantity, N(0) is the initial quantity, λ is the decay constant,
and t is time.

The window function would then summarize the values of all records in the
window relative to the current record.
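For illustration, a hedged HiveQL sketch of the requested computation written as a self-join, since the aggregate must reference the current row's timestamp; the events table and its grp, ts (epoch seconds), and val columns are made up:

```sql
-- Decay-weighted sum relative to each current row a:
-- sum over earlier rows b of b.val * exp(-lambda * (a.ts - b.ts)), lambda = 0.1.
select a.id, a.ts,
       sum(b.val * exp(-0.1 * (a.ts - b.ts))) as decayed_sum
from events a join events b on a.grp = b.grp
where b.ts <= a.ts
group by a.id, a.ts;
```

The requested feature would let a window function express the same thing directly over an ordered window instead of a self-join.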

 Calculating formula based on difference between each row's value and current 
 row's in Windowing function
 

 Key: HIVE-10142
 URL: https://issues.apache.org/jira/browse/HIVE-10142
 Project: Hive
  Issue Type: New Feature
  Components: PTF-Windowing
Affects Versions: 1.0.0
Reporter: Yi Zhang
Assignee: Aihua Xu

 For analytics with windowing functions, the calculation formula sometimes 
 needs to be evaluated over each row's value against the current row's value. 
 Decay is a good example, such as a sum of values with a decay function based 
 on the difference of timestamps between each row and the current row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)