[jira] [Updated] (HIVE-20990) ORC case when/if with coalesce wrong results or case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
[ https://issues.apache.org/jira/browse/HIVE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Zhang updated HIVE-20990:
Summary: ORC case when/if with coalesce wrong results or case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
(was: ORC, group by, case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating)

Key: HIVE-20990
URL: https://issues.apache.org/jira/browse/HIVE-20990
Project: Hive
Issue Type: Bug
Components: Vectorization
Affects Versions: 3.0.0, 3.1.3
Environment:
* Hive 3.0.0 from HDP 3.0.1
* CentOS 7
* 5 datanodes, 2 masters
Reporter: Guillaume
Priority: Major
Labels: duplicate

Run this to replicate:
{code:sql}
drop table if exists ds;
create table ds stored as orc as select
  inline(array(
    struct('gmail.com'),  -- when branch of the case statement
    struct('apache.org')  -- else branch of the case statement
  )) as (domain)
;
select
  case
    when domain = 'gmail.com' then 'gmail'
    else coalesce(domain, 'other')
  end as domaingroup
from ds
group by 1  -- useless (data-wise) for this example, but triggers the bug.
;
{code}

The query fails with:
{noformat}
java.lang.RuntimeException: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
{noformat}

The full exception is shown below. Of interest:
* if the case is removed (e.g. replaced by only the else clause), the query works;
* if only the else clause of the case matches, the query works;
* replacing the case with nested ifs does not change anything;
* removing the group by makes the query work;
* replacing the table (ds) with a CTE makes the query work.
Workaround, at the cost of performance:
{code:java}
set hive.vectorized.execution.enabled = false;
{code}

Full exception:
{noformat}
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1543326624484_8258_45_00, diagnostics=[Task failed, taskId=task_1543326624484_8258_45_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.RuntimeException: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
    at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:492)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprColumnCondExpr.evaluate(IfExprColumnCondExpr.java:117)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
{noformat}
[jira] [Updated] (HIVE-20990) ORC, group by, case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
[ https://issues.apache.org/jira/browse/HIVE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Zhang updated HIVE-20990:
Affects Version/s: 3.1.3
[jira] [Commented] (HIVE-20990) ORC, group by, case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
[ https://issues.apache.org/jira/browse/HIVE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814529#comment-17814529 ]

Yi Zhang commented on HIVE-20990:
This is fixed by HIVE-26408. The bug can return wrong results or throw an exception, so be cautious when working around it (if not applying the fix) with hive.vectorized.if.expr.mode=good.
[jira] [Updated] (HIVE-20990) ORC, group by, case: java.lang.AssertionError: Output column number expected to be 0 when isRepeating
[ https://issues.apache.org/jira/browse/HIVE-20990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Zhang updated HIVE-20990:
Labels: duplicate (was: )
[jira] [Updated] (HIVE-26408) Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output
[ https://issues.apache.org/jira/browse/HIVE-26408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Zhang updated HIVE-26408:
Labels: hive-3.2.0-candidate pull-request-available (was: pull-request-available)

Key: HIVE-26408
URL: https://issues.apache.org/jira/browse/HIVE-26408
Project: Hive
Issue Type: Bug
Reporter: László Bodor
Assignee: László Bodor
Priority: Major
Labels: hive-3.2.0-candidate, pull-request-available
Fix For: 4.0.0-alpha-2
Time Spent: 20m
Remaining Estimate: 0h

This is similar to HIVE-15588. With a customer query, I reproduced a vectorized expression tree like the one below (I'll attach a simple repro query when possible):
{code}
selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 61:string)(children: StringColumnInList(col 13, values TermDeposit, RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns [61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( _col1 AS DATE)), 'MM-dd-'))(children: VectorUDFUnixTimeStampDate(col 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 61:string, ConstantVectorExpression(val ) -> 62:string) -> 63:string, ConstantVectorExpression(val ) -> 61:string) -> 62:string
{code}
The relevant query part was:
{code}
CASE WHEN DLY_BAL.PDELP_VALUE in (
  'TermDeposit', 'RecurringDeposit', 'CertificateOfDeposit'
) THEN NVL(
  (
    from_unixtime(
      unix_timestamp(cast(DLY_BAL.APATD_MTRTY_DATE as date)),
      'MM-dd-'
    )
  ),
  ' '
) ELSE '' END AS MAT_DTE
{code}
The problem, step by step:
1. IfExprCondExprColumn has 62:string as its [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64], which is a reused scratch column (see step 5).
2. At evaluation time, [isRepeating is reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68].
3. To evaluate IfExprCondExprColumn, the conditional evaluation of its children is required, so we go to [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95].
4. One of the children is ConstantVectorExpression(val ) -> 62:string, which belongs to the second branch of VectorCoalesce, i.e. the '' empty string in NVL's second argument.
5. In step 4, the 62:string column is set to an isRepeating column (and it is released by [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459]), so it is marked as a reusable scratch column.
6. After the conditional evaluation in step 3, the final output of IfExprCondExprColumn is set [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99], but we get an exception [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
{code}
2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource:
java.lang.AssertionError: Output column number expected to be 0 when isRepeating
    at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
{code}
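The invariant behind the assertion can be illustrated with a toy model. This is a simplified sketch, not the real Hive classes: it only mimics the contract that BytesColumnVector.setElement enforces, namely that a vector marked isRepeating holds a single value in slot 0 and must never be written at any other index.

```java
// Toy model (NOT the actual Hive classes) of the isRepeating contract:
// when a column vector is marked isRepeating, only element 0 is
// meaningful, so writing any other index is a programming error.
class ToyBytesColumnVector {
    boolean isRepeating;
    final String[] vector;

    ToyBytesColumnVector(int size) {
        vector = new String[size];
    }

    // Mirrors the check in BytesColumnVector.setElement: writing past
    // slot 0 while isRepeating is set throws the assertion from the bug.
    void setElement(int outputElementNum, String value) {
        if (isRepeating && outputElementNum != 0) {
            throw new AssertionError(
                "Output column number expected to be 0 when isRepeating");
        }
        vector[outputElementNum] = value;
    }
}

public class IsRepeatingDemo {
    public static void main(String[] args) {
        ToyBytesColumnVector scratch = new ToyBytesColumnVector(4);

        // A child ConstantVectorExpression fills the scratch column as
        // repeating: one value stands for the whole batch.
        scratch.isRepeating = true;
        scratch.setElement(0, "");

        // If the parent IfExpr* expression then reuses the SAME column
        // as its per-row output (the scratch-column reuse described in
        // the steps above), the first write past slot 0 fails.
        try {
            scratch.setElement(1, "gmail");
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The fix in HIVE-26408 avoids handing a child ConstantVectorExpression's repeating column back out as the parent's output column, which is what puts the model above into the failing state.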
[jira] [Updated] (HIVE-27951) hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path
[ https://issues.apache.org/jira/browse/HIVE-27951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Zhang updated HIVE-27951:
Attachment: (was: HIVE-27951.patch)

Key: HIVE-27951
URL: https://issues.apache.org/jira/browse/HIVE-27951
Project: Hive
Issue Type: Bug
Components: HCatalog
Affects Versions: 4.0.0-beta-1
Reporter: Yi Zhang
Assignee: Yi Zhang
Priority: Critical
Labels: pull-request-available

If a table has multiple partition levels and an existing partition (part1=x1, part2=y1), then inserting into a new partition (part1=x1, part2=y2) makes HCatalog's FileOutputCommitterContainer throw a "path already exists" error.

To reproduce:
{code:sql}
create table source(id int, part1 string, part2 string);
create table target(id int) partitioned by (part1 string, part2 string);
insert into table source values (1, "x1", "y1"), (2, "x1", "y2");
{code}

{code}
pig -useHCatalog
A = load 'source' using org.apache.hive.hcatalog.pig.HCatLoader();
B = filter A by (part2 == 'y1');
-- the following succeeds
store B into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
-- the following fails with a duplicate-publish error
C = filter A by (part2 == 'y2');
store C into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
{code}

{noformat}
Partition already present with given partition key values : Data already exists in /user/hive/warehouse/target_data/part1=x1, duplicate publish not possible.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:243)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in /user/hive/warehouse/target_data/part1=x1, duplicate publish not possible.
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:564)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:949)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:273)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:241)
{noformat}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
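The failure mode reported here can be sketched with a toy model. This is a hypothetical illustration, not the actual FileOutputCommitterContainer logic; the paths and method names are invented for the example. It shows why a check against a partial partition path misfires with multi-level partitions: the parent directory part1=x1 already exists from the earlier part2=y1 load, so treating any existing prefix as a duplicate rejects a genuinely new partition.

```java
// Hypothetical sketch of duplicate-publish detection with multi-level
// partitions. Paths and method names are illustrative only.
import java.util.*;

public class DuplicateCheckDemo {
    static Set<String> existingPaths = new HashSet<>(
        Arrays.asList("/warehouse/target/part1=x1",
                      "/warehouse/target/part1=x1/part2=y1"));

    // Buggy behavior: flags a duplicate as soon as ANY prefix of the new
    // partition path already exists on the filesystem.
    static boolean duplicateByPrefix(List<String> partSpec) {
        String path = "/warehouse/target";
        for (String kv : partSpec) {
            path = path + "/" + kv;
            if (existingPaths.contains(path)) {
                return true; // parent part1=x1 exists -> false positive
            }
        }
        return false;
    }

    // Intended behavior: only the FULL partition path counts as a duplicate.
    static boolean duplicateByFullPath(List<String> partSpec) {
        return existingPaths.contains(
            "/warehouse/target/" + String.join("/", partSpec));
    }

    public static void main(String[] args) {
        List<String> newPart = Arrays.asList("part1=x1", "part2=y2");
        System.out.println(duplicateByPrefix(newPart));   // true  (the bug)
        System.out.println(duplicateByFullPath(newPart)); // false (correct)
    }
}
```

In the repro above, (part1=x1, part2=y2) is new, but its parent directory part1=x1 already holds part2=y1, which is enough to trigger the "duplicate publish not possible" error.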
[jira] [Updated] (HIVE-27951) hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path
[ https://issues.apache.org/jira/browse/HIVE-27951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Zhang updated HIVE-27951:
Attachment: HIVE-27951.patch
[jira] [Created] (HIVE-27951) hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path
Yi Zhang created HIVE-27951:
Summary: hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path
Key: HIVE-27951
URL: https://issues.apache.org/jira/browse/HIVE-27951
Project: Hive
Issue Type: Bug
Components: HCatalog
Affects Versions: 4.0.0-beta-1
Reporter: Yi Zhang
[jira] [Assigned] (HIVE-27951) hcatalog dynamic partitioning fails with partition already exist error when exist parent partitions path
[ https://issues.apache.org/jira/browse/HIVE-27951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-27951: --- Assignee: Yi Zhang
> hcatalog dynamic partitioning fails with partition already exist error when
> exist parent partitions path
>
> Key: HIVE-27951
> URL: https://issues.apache.org/jira/browse/HIVE-27951
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 4.0.0-beta-1
> Reporter: Yi Zhang
> Assignee: Yi Zhang
> Priority: Critical
>
> If a table has multi-level partitions (e.g. an existing partition part1=x1, part2=y1),
> inserting into a new partition (part1=x1, part2=y2) makes HCatalog's
> FileOutputCommitterContainer throw a "path already exists" error.
>
> To reproduce:
> {code:sql}
> create table source(id int, part1 string, part2 string);
> create table target(id int) partitioned by (part1 string, part2 string);
> insert into table source values (1, "x1", "y1"), (2, "x1", "y2");
> {code}
> Then, with pig -useHcatalog:
> {code}
> A = load 'source' using org.apache.hive.hcatalog.pig.HCatLoader();
> B = filter A by (part2 == 'y1');
> -- the following succeeds
> store B into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
> -- the following fails with a duplicate publish error
> C = filter A by (part2 == 'y2');
> store C into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
> {code}
> {noformat}
> Partition already present with given partition key values : Data already
> exists in /user/hive/warehouse/target_data/part1=x1, duplicate publish not
> possible.
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:243)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
> Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition
> already present with given partition key values : Data already exists in
> /user/hive/warehouse/target_data/part1=x1, duplicate publish not possible.
> at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:564)
> at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:949)
> at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:273)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:241)
> {noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
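The stack trace points at FileOutputCommitterContainer complaining about the parent path part1=x1 rather than the full partition path. A minimal sketch of the failure mode (illustrative Python, not the actual Hive code; `duplicate_publish_check` is a hypothetical helper): a duplicate-publish check must test the full partition path, because once any sibling partition exists, a parent-level existence test fires for every new partition under it.

```python
import tempfile
from pathlib import Path

def duplicate_publish_check(table_root: Path, partition_spec: dict) -> None:
    """Hypothetical sketch: test the FULL partition path. Testing a parent
    level (part1=x1) falsely reports a duplicate as soon as any sibling
    partition (part2=y1) exists under the same parent."""
    full_path = table_root.joinpath(*(f"{k}={v}" for k, v in partition_spec.items()))
    if full_path.exists():
        raise RuntimeError(
            f"Data already exists in {full_path}, duplicate publish not possible.")

# Simulate the repro: part1=x1/part2=y1 was already published.
table_root = Path(tempfile.mkdtemp()) / "target"
(table_root / "part1=x1" / "part2=y1").mkdir(parents=True)

# Publishing part1=x1/part2=y2 passes when the full path is checked,
# even though the parent directory part1=x1 already exists.
duplicate_publish_check(table_root, {"part1": "x1", "part2": "y2"})
```

A parent-level check, by contrast, would reject this insert outright, which matches the reported error message naming only `part1=x1`.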
[jira] [Updated] (HIVE-27600) Reduce filesystem calls in OrcFileMergeOperator
[ https://issues.apache.org/jira/browse/HIVE-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-27600: Description: avoid unnecessarily creating ORC readers in the OrcFileMergeOperator > Reduce filesystem calls in OrcFileMergeOperator > --- > > Key: HIVE-27600 > URL: https://issues.apache.org/jira/browse/HIVE-27600 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > > avoid unnecessarily creating ORC readers in the OrcFileMergeOperator -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27600) Reduce filesystem calls in OrcFileMergeOperator
[ https://issues.apache.org/jira/browse/HIVE-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-27600: --- Assignee: Yi Zhang > Reduce filesystem calls in OrcFileMergeOperator > --- > > Key: HIVE-27600 > URL: https://issues.apache.org/jira/browse/HIVE-27600 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27600) Reduce filesystem calls in OrcFileMergeOperator
[ https://issues.apache.org/jira/browse/HIVE-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-27600: Affects Version/s: 4.0.0-alpha-2 > Reduce filesystem calls in OrcFileMergeOperator > --- > > Key: HIVE-27600 > URL: https://issues.apache.org/jira/browse/HIVE-27600 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27600) Reduce filesystem calls in OrcFileMergeOperator
[ https://issues.apache.org/jira/browse/HIVE-27600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-27600: Summary: Reduce filesystem calls in OrcFileMergeOperator (was: Reduce filesystem calls in OrcFileMerge) > Reduce filesystem calls in OrcFileMergeOperator > --- > > Key: HIVE-27600 > URL: https://issues.apache.org/jira/browse/HIVE-27600 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Yi Zhang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27600) Reduce filesystem calls in OrcFileMerge
Yi Zhang created HIVE-27600: --- Summary: Reduce filesystem calls in OrcFileMerge Key: HIVE-27600 URL: https://issues.apache.org/jira/browse/HIVE-27600 Project: Hive Issue Type: Improvement Components: Hive Reporter: Yi Zhang -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27218) Hive-3 switch hive.materializedview.rewriting default to false
[ https://issues.apache.org/jira/browse/HIVE-27218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-27218: Summary: Hive-3 switch hive.materializedview.rewriting default to false (was: Hive-3 set hive.materializedview.rewriting default to false) > Hive-3 switch hive.materializedview.rewriting default to false > -- > > Key: HIVE-27218 > URL: https://issues.apache.org/jira/browse/HIVE-27218 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.0, 3.1.2, 3.1.3 >Reporter: Yi Zhang >Priority: Major > > https://issues.apache.org/jira/browse/HIVE-19973 switched the > hive.materializedview.rewriting default from false to true. However, users > with a large number of databases (5k) observed high query-compilation > latency: each compilation makes a call per database to the remote metastore > DB, and those calls add up to minutes. > As the hive-4 improvements in HIVE-21631 and HIVE-21344 are unlikely to be > backported, we suggest turning this off by default in hive-3. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
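The "adds up to minutes" claim above is simple arithmetic: one remote metastore round trip per database, times thousands of databases. A back-of-the-envelope sketch (the 5k database count comes from the report; the 30 ms per-call latency is an assumed figure for illustration):

```python
databases = 5000          # database count from the report
rpc_latency_s = 0.030     # assumed 30 ms per remote metastore DB call

# Total extra compile time if every database is consulted once per query.
total_s = databases * rpc_latency_s
print(f"{total_s:.0f} s ~= {total_s / 60:.1f} min of extra compile time")
```

Even at these modest assumed latencies the per-query cost lands in the minutes range, which is why disabling the lookup by default in hive-3 is proposed rather than tuning it.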
[jira] [Assigned] (HIVE-27200) Backport HIVE-24928 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-27200: --- Assignee: Yi Zhang > Backport HIVE-24928 to branch-3 > --- > > Key: HIVE-27200 > URL: https://issues.apache.org/jira/browse/HIVE-27200 > Project: Hive > Issue Type: Improvement > Components: StorageHandler >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This is to backport HIVE-24928 so that for HiveStorageHandler table 'ANALYZE > TABLE ... COMPUTE STATISTICS' can use storagehandler to provide basic stats > with BasicStatsNoJobTask -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27143) Optimize HCatStorer move task
[ https://issues.apache.org/jira/browse/HIVE-27143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-27143: Summary: Optimize HCatStorer move task (was: Improve HCatStorer move task) > Optimize HCatStorer move task > - > > Key: HIVE-27143 > URL: https://issues.apache.org/jira/browse/HIVE-27143 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 3.1.3 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > > moveTask in hcatalog is inefficient: it makes two passes over the output (a > dry run, then the execution), and each pass runs sequentially. This can be > improved. -- This message was sent by Atlassian Jira (v8.20.10#820010)
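The kind of improvement suggested can be sketched as follows (hypothetical Python, not the HCatalog implementation): fold the per-file validation into the move itself and run the moves concurrently, instead of a sequential dry-run pass followed by a sequential execution pass.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def move_all(sources, dest_dir: Path, workers: int = 8) -> None:
    """Validate and move each file in one concurrent pass."""
    dest_dir.mkdir(parents=True, exist_ok=True)

    def move_one(src: Path) -> None:
        target = dest_dir / src.name
        if target.exists():                 # validation folded into the move
            raise FileExistsError(str(target))
        shutil.move(str(src), str(target))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for future in [pool.submit(move_one, s) for s in sources]:
            future.result()                 # re-raise any worker failure

# Demo on a scratch directory.
base = Path(tempfile.mkdtemp())
src_dir, dst_dir = base / "src", base / "dst"
src_dir.mkdir()
files = [src_dir / f"part-{i:05d}" for i in range(4)]
for f in files:
    f.touch()
move_all(files, dst_dir)
```

One trade-off worth noting: the original dry run guaranteed that nothing moves if any file would fail, while a single concurrent pass gives up that all-or-nothing property in exchange for one pass and parallel I/O, so a real implementation would need a cleanup or rename-back story on failure.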
[jira] [Updated] (HIVE-27200) Backport HIVE-24928 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-27200: Summary: Backport HIVE-24928 to branch-3 (was: Backport HIVE-24928 In case of non-native tables use basic statistics from HiveStorageHandler) > Backport HIVE-24928 to branch-3 > --- > > Key: HIVE-27200 > URL: https://issues.apache.org/jira/browse/HIVE-27200 > Project: Hive > Issue Type: Improvement > Components: StorageHandler >Reporter: Yi Zhang >Priority: Critical > > This is to backport HIVE-24928 so that for HiveStorageHandler table 'ANALYZE > TABLE ... COMPUTE STATISTICS' can use storagehandler to provide basic stats > with BasicStatsNoJobTask -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27149) StorageHandler PPD query planning statistics not adjusted for pushedPredicate
[ https://issues.apache.org/jira/browse/HIVE-27149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-27149: Summary: StorageHandler PPD query planning statistics not adjusted for pushedPredicate (was: StorageHandler PPD query planning statistics not adjusted for residualPredicate) > StorageHandler PPD query planning statistics not adjusted for pushedPredicate > - > > Key: HIVE-27149 > URL: https://issues.apache.org/jira/browse/HIVE-27149 > Project: Hive > Issue Type: Bug > Components: StorageHandler >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Priority: Minor > > In StorageHandler PPD, filter predicates can be pushed down to storage and > trimmed to a subset residualPredicate. However, the query-planning statistics > based on filters consider only the 'final' residual predicate, when in fact > the pushed predicates should also be considered. This affects reducer > parallelism (more reducers are scheduled than needed). -- This message was sent by Atlassian Jira (v8.20.10#820010)
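The planner arithmetic at issue can be sketched like this (hypothetical function, not Hive's actual Statistics code): estimated output rows should shrink by the selectivity of both the predicate pushed to the storage handler and the residual predicate; counting only the residual one leaves the row estimate, and therefore the reducer count, too high.

```python
def estimated_rows(table_rows: int, pushed_selectivity: float,
                   residual_selectivity: float) -> int:
    """Both predicate halves filter rows, so both selectivities apply."""
    return max(1, round(table_rows * pushed_selectivity * residual_selectivity))

# 10M-row table; pushed predicate keeps 1% of rows, residual keeps 50%.
buggy = estimated_rows(10_000_000, 1.0, 0.5)    # pushed predicate ignored
fixed = estimated_rows(10_000_000, 0.01, 0.5)   # pushed predicate applied
```

Under these illustrative numbers the ignored pushed predicate inflates the estimate 100x, and reducer parallelism (derived from estimated data size) inflates with it.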
[jira] [Assigned] (HIVE-27143) Improve HCatStorer move task
[ https://issues.apache.org/jira/browse/HIVE-27143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-27143: --- Assignee: Yi Zhang > Improve HCatStorer move task > > > Key: HIVE-27143 > URL: https://issues.apache.org/jira/browse/HIVE-27143 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 3.1.3 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > > moveTask in hcatalog is inefficient: it makes two passes over the output (a > dry run, then the execution), and each pass runs sequentially. This can be > improved. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)
[ https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697692#comment-17697692 ] Yi Zhang commented on HIVE-27115: - I see HIVE-25673 fixed a similar issue. [~pvary] [~Marton Bod] is this the same issue as HIVE-25673? > HiveInputFormat column project push down wrong fields (MR) > -- > > Key: HIVE-27115 > URL: https://issues.apache.org/jira/browse/HIVE-27115 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3, 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For query such as > select * from ( > select r_name from r > union all > select t_name from t > ) unioned > > in MR execution, when column project push down for splits, t_name gets pushed > down to table r. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)
[ https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697577#comment-17697577 ] Yi Zhang commented on HIVE-27115: - [~Marton Bod] wonder if hive-iceberg encountered this issue in MR mode and how it is handled? since HiveIceBergInputFormat uses jobConf ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR for getSplits > HiveInputFormat column project push down wrong fields (MR) > -- > > Key: HIVE-27115 > URL: https://issues.apache.org/jira/browse/HIVE-27115 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3, 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For query such as > select * from ( > select r_name from r > union all > select t_name from t > ) unioned > > in MR execution, when column project push down for splits, t_name gets pushed > down to table r. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)
[ https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697152#comment-17697152 ] Yi Zhang commented on HIVE-27115: - [~rajesh.balamohan] Is there a scenario in Tez mode where multiple TableScanOperators are passed to HiveInputFormat? I assume in Tez each TS has its own input initialized, so it doesn't run into this issue. I understand that MR is deprecated; however, the logic of HiveInputFormat itself has this bug when it combines multiple TSs. Some storage-handler use cases that were added in MR mode have issues when run in Tez mode. > HiveInputFormat column project push down wrong fields (MR) > -- > > Key: HIVE-27115 > URL: https://issues.apache.org/jira/browse/HIVE-27115 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3, 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For query such as > select * from ( > select r_name from r > union all > select t_name from t > ) unioned > > in MR execution, when column project push down for splits, t_name gets pushed > down to table r. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)
[ https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696327#comment-17696327 ] Yi Zhang commented on HIVE-27115: - [~rajesh.balamohan] can you review the pr? thank you! > HiveInputFormat column project push down wrong fields (MR) > -- > > Key: HIVE-27115 > URL: https://issues.apache.org/jira/browse/HIVE-27115 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3, 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > For query such as > select * from ( > select r_name from r > union all > select t_name from t > ) unioned > > in MR execution, when column project push down for splits, t_name gets pushed > down to table r. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)
[ https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-27115: --- Assignee: Yi Zhang > HiveInputFormat column project push down wrong fields (MR) > -- > > Key: HIVE-27115 > URL: https://issues.apache.org/jira/browse/HIVE-27115 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3, 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > > For query such as > select * from ( > select r_name from r > union all > select t_name from t > ) unioned > > in MR execution, when column project push down for splits, t_name gets pushed > down to table r. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27115) HiveInputFormat column project push down wrong fields (MR)
[ https://issues.apache.org/jira/browse/HIVE-27115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-27115: Description: For query such as select * from ( select r_name from r union all select t_name from t ) unioned in MR execution, when column project push down for splits, t_name gets pushed down to table r. was: For query such as select * from ( select r_name from r union all select t_name from t ) unioned in MR execution, when column project push down for splits on table r, it is t_name. > HiveInputFormat column project push down wrong fields (MR) > -- > > Key: HIVE-27115 > URL: https://issues.apache.org/jira/browse/HIVE-27115 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3, 4.0.0-alpha-2 >Reporter: Yi Zhang >Priority: Major > > For query such as > select * from ( > select r_name from r > union all > select t_name from t > ) unioned > > in MR execution, when column project push down for splits, t_name gets pushed > down to table r. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
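The bug pattern described above can be sketched in a few lines (hypothetical structures, not Hive's real classes): if the projected-column list lives in one shared job-configuration entry, the last TableScan to write it wins, and splits for the other table are told to read the wrong field; keying the projection by table gives each scan its own list.

```python
# Buggy shape: one shared conf entry holds the projected columns.
shared_conf = {}

def push_projection_shared(cols):
    # Clobbers whatever the previous table's scan wrote.
    shared_conf["read.columns"] = cols

push_projection_shared(["r_name"])   # TableScan over r
push_projection_shared(["t_name"])   # TableScan over t
# Splits for table r would now be told to read t_name.

# Fixed shape: projection keyed by the table (or its input path),
# so each TableScan keeps its own column list.
per_table = {}

def push_projection(table, cols):
    per_table[table] = cols

push_projection("r", ["r_name"])
push_projection("t", ["t_name"])
```

The union-all repro in the issue exercises exactly this: two scans with different single-column projections sharing one configuration.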
[jira] [Assigned] (HIVE-27017) option to use createTable DDLTask in CTAS for StorageHandler
[ https://issues.apache.org/jira/browse/HIVE-27017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-27017: --- Assignee: Yi Zhang > option to use createTable DDLTask in CTAS for StorageHandler > --- > > Key: HIVE-27017 > URL: https://issues.apache.org/jira/browse/HIVE-27017 > Project: Hive > Issue Type: Improvement > Components: StorageHandler >Affects Versions: 3.1.3 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > > This is to add a directInsert option for StorageHandler and advance the > createTable DDLTask for CTAS when it is in storagehandler directInsert mode. > This is a partial backport of HIVE-26771 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27016) Invoke optional output committer in TezProcessor
[ https://issues.apache.org/jira/browse/HIVE-27016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-27016: --- Assignee: Yi Zhang > Invoke optional output committer in TezProcessor > > > Key: HIVE-27016 > URL: https://issues.apache.org/jira/browse/HIVE-27016 > Project: Hive > Issue Type: Improvement > Components: Query Processor, StorageHandler >Affects Versions: 3.1.3 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > > This backports HIVE-24629 and HIVE-24867, so StorageHandlers with their own > OutputCommitter can run in Tez. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26815) Backport HIVE-26758 (Allow use scratchdir for staging final job)
[ https://issues.apache.org/jira/browse/HIVE-26815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648214#comment-17648214 ] Yi Zhang commented on HIVE-26815: - [~sruthim-official] the setting is used together with these blob-storage settings:
hive.use.scratchdir.for.staging=true
hive.blobstore.optimizations.enabled=true
hive.blobstore.supported.schemes=
hive.blobstore.use.blobstore.as.scratchdir=false
> Backport HIVE-26758 (Allow use scratchdir for staging final job) > > > Key: HIVE-26815 > URL: https://issues.apache.org/jira/browse/HIVE-26815 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.3 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Fix For: 3.2.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > HIVE-26758 adds an option to stage the final job under hive.exec.scratchdir. > This is to backport it into 3.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26815) Backport HIVE-26758 (Allow use scratchdir for staging final job)
[ https://issues.apache.org/jira/browse/HIVE-26815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-26815: --- Assignee: Yi Zhang > Backport HIVE-26758 (Allow use scratchdir for staging final job) > > > Key: HIVE-26815 > URL: https://issues.apache.org/jira/browse/HIVE-26815 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.3 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-26758 add an option to allow choose set final job staging with > hive.exec.scratchdir. This is to backport this into 3.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26819) Vectorization: wrong results when filter on repeating map key orc table
[ https://issues.apache.org/jira/browse/HIVE-26819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-26819: --- Assignee: Yi Zhang > Vectorization: wrong results when filter on repeating map key orc table > --- > > Key: HIVE-26819 > URL: https://issues.apache.org/jira/browse/HIVE-26819 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The same issue is fixed by HIVE-26447; this is to fix it in 3.2.0. > Example reproducible case: > set hive.vectorized.execution.enabled=true; > set hive.fetch.task.conversion=none; > create temporary table foo (id int, x map<string,int>) stored as orc; > insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC', > 8)), (4, map('ABC', 9)); > select id from foo where x['ABC']=9; > this gives only 1, when the correct result is 1, 4 > For every VectorizedRowBatch, only the first row is checked. > This seems to be a corner case of an ORC table having a repeating string-type > key for the map field in the MapColumnVector. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26447) Vectorization: wrong results when filter on repeating map key orc table
[ https://issues.apache.org/jira/browse/HIVE-26447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26447: Affects Version/s: (was: 3.1.3) > Vectorization: wrong results when filter on repeating map key orc table > --- > > Key: HIVE-26447 > URL: https://issues.apache.org/jira/browse/HIVE-26447 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0-alpha-1 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Example reproducible case: > > set hive.vectorized.execution.enabled=true; > set hive.fetch.task.conversion=none; > create temporary table foo (id int, x map<string,int>) stored as orc; > insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC', > 8)), (4, map('ABC', 9)); > select id from foo where x['ABC']=9; > this only gives 1, when correct result should be 1,4 > For every VectorizedRowBatch, only the first row is checked. > This seems to be a corner case of ORC table have repeating string type key > for map field in the MapColumnVector. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26819) Vectorization: wrong results when filter on repeating map key orc table
[ https://issues.apache.org/jira/browse/HIVE-26819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26819: Description: The same issue is fixed by HIVE-26447; this is to fix it in 3.2.0. Example reproducible case: set hive.vectorized.execution.enabled=true; set hive.fetch.task.conversion=none; create temporary table foo (id int, x map<string,int>) stored as orc; insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC', 8)), (4, map('ABC', 9)); select id from foo where x['ABC']=9; this gives only 1, when the correct result is 1, 4 For every VectorizedRowBatch, only the first row is checked. This seems to be a corner case of an ORC table having a repeating string-type key for the map field in the MapColumnVector. was:Same issue is fixed by HIVE-26447, this is to fix in 3.2.0. > Vectorization: wrong results when filter on repeating map key orc table > --- > > Key: HIVE-26819 > URL: https://issues.apache.org/jira/browse/HIVE-26819 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3 >Reporter: Yi Zhang >Priority: Minor > > The same issue is fixed by HIVE-26447; this is to fix it in 3.2.0. > Example reproducible case: > set hive.vectorized.execution.enabled=true; > set hive.fetch.task.conversion=none; > create temporary table foo (id int, x map<string,int>) stored as orc; > insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC', > 8)), (4, map('ABC', 9)); > select id from foo where x['ABC']=9; > this gives only 1, when the correct result is 1, 4 > For every VectorizedRowBatch, only the first row is checked. > This seems to be a corner case of an ORC table having a repeating string-type > key for the map field in the MapColumnVector. -- This message was sent by Atlassian Jira (v8.20.10#820010)
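The fix described can be sketched like this (illustrative Python, not the vectorized Java code): when the map-key column is flagged isRepeating, every row shares the same key, but the values still differ per row, so the filter must evaluate each row rather than evaluating row 0 and reusing its answer for the whole batch.

```python
def filter_ids_repeating_key(values, wanted):
    """Key vector is isRepeating: every row has key 'ABC'. The buggy code
    checked only row 0 of the batch; the fix tests the per-row VALUE for
    each row. Returns 1-based ids to mirror the repro table."""
    return [row + 1 for row, v in enumerate(values) if v == wanted]

# Repro batch: map('ABC',9), map('ABC',7), map('ABC',8), map('ABC',9)
ids = filter_ids_repeating_key([9, 7, 8, 9], 9)
print(ids)  # the repro expects rows 1 and 4, not just 1
```

This mirrors the repro's expected output (1 and 4) versus the buggy output (1 only).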
[jira] [Updated] (HIVE-26815) Backport HIVE-26758 (Allow use scratchdir for staging final job)
[ https://issues.apache.org/jira/browse/HIVE-26815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26815: Affects Version/s: 3.1.3 (was: 3.2.0) > Backport HIVE-26758 (Allow use scratchdir for staging final job) > > > Key: HIVE-26815 > URL: https://issues.apache.org/jira/browse/HIVE-26815 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.1.3 >Reporter: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-26758 add an option to allow choose set final job staging with > hive.exec.scratchdir. This is to backport this into 3.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26447) Vectorization: wrong results when filter on repeating map key orc table
[ https://issues.apache.org/jira/browse/HIVE-26447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26447: Affects Version/s: 4.0.0-alpha-1 (was: 4.0.0) > Vectorization: wrong results when filter on repeating map key orc table > --- > > Key: HIVE-26447 > URL: https://issues.apache.org/jira/browse/HIVE-26447 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3, 4.0.0-alpha-1 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Example reproducible case: > > set hive.vectorized.execution.enabled=true; > set hive.fetch.task.conversion=none; > create temporary table foo (id int, x map<string,int>) stored as orc; > insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC', > 8)), (4, map('ABC', 9)); > select id from foo where x['ABC']=9; > this only gives 1, when correct result should be 1,4 > For every VectorizedRowBatch, only the first row is checked. > This seems to be a corner case of ORC table have repeating string type key > for map field in the MapColumnVector. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26815) Backport HIVE-26758 (Allow use scratchdir for staging final job)
[ https://issues.apache.org/jira/browse/HIVE-26815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26815: Summary: Backport HIVE-26758 (Allow use scratchdir for staging final job) (was: Backport HIVE-26758 (Allow use scratchdir for staging final job) to 3.2.0) > Backport HIVE-26758 (Allow use scratchdir for staging final job) > > > Key: HIVE-26815 > URL: https://issues.apache.org/jira/browse/HIVE-26815 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.2.0 >Reporter: Yi Zhang >Priority: Minor > > HIVE-26758 add an option to allow choose set final job staging with > hive.exec.scratchdir. This is to backport this into 3.2.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26813) Upgrade HikariCP from 2.6.1 to 4.0.3.
[ https://issues.apache.org/jira/browse/HIVE-26813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644151#comment-17644151 ] Yi Zhang commented on HIVE-26813: - +1 > Upgrade HikariCP from 2.6.1 to 4.0.3. > - > > Key: HIVE-26813 > URL: https://issues.apache.org/jira/browse/HIVE-26813 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Chris Nauroth >Assignee: Chris Nauroth >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The Hive Metastore currently integrates with HikariCP 2.6.1 for database > connection pooling. This version was released in 2017. The most recent Java > 8-compatible release is 4.0.3, released earlier this year. This bug proposes > to upgrade so that we can include the past few years of development and bug > fixes in the 4.0.0 GA release. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641680#comment-17641680 ] Yi Zhang commented on HIVE-26758: - [~pvary] can you help review this? > Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > The query results are staged in stagingdir that is relative to the > destination path // > during blobstorage optimzation HIVE-17620 final job is set to use stagingdir. > HIVE-15215 mentioned the possibility of using scratch for staging when write > to S3 but it was long time ago and no activity. > > This is to allow final job to use hive.exec.scratchdir as the interim jobs, > with a configuration > hive.use.scratchdir.for.staging > This is useful for cross Filesystem, user can use local source filesystem > instead of remote filesystem for the staging. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26758: Description: The query results are staged in a stagingdir that is relative to the destination path // During the blobstorage optimization HIVE-17620 the final job was set to use the stagingdir. HIVE-15215 mentioned the possibility of using the scratchdir for staging when writing to S3, but that was long ago and saw no activity. This is to allow the final job to use hive.exec.scratchdir, as the interim jobs do, via a configuration hive.use.scratchdir.for.staging This is useful for cross-filesystem writes: the user can stage on the local source filesystem instead of the remote filesystem. was: The query results are staged in a stagingdir that is relative to the destination path // It used to be possible to change hive.exec.stagingdir to a different location, but that was lost during the blobstorage optimization HIVE-17620. HIVE-15215 mentioned the possibility of using the scratchdir for staging when writing to S3, but that was long ago and saw no activity. This is to allow the final job to use hive.exec.scratchdir, as the interim jobs do, via a configuration hive.use.scratchdir.for.staging This is useful for cross-filesystem writes: the user can stage on the local source filesystem instead of the remote filesystem. > Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The query results are staged in a stagingdir that is relative to the > destination path // > During the blobstorage optimization HIVE-17620 the final job was set to use the stagingdir. > HIVE-15215 mentioned the possibility of using the scratchdir for staging when writing > to S3, but that was long ago and saw no activity.
> > This is to allow the final job to use hive.exec.scratchdir, as the interim jobs do, > via a configuration > hive.use.scratchdir.for.staging > This is useful for cross-filesystem writes: the user can stage on the local > source filesystem instead of the remote filesystem. -- This message was sent by Atlassian Jira (v8.20.10#820010)
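The option's intended behavior can be sketched as follows (hypothetical helper and staging-dir name; the exact path layout is elided in the description above): with hive.use.scratchdir.for.staging=true the final job stages under hive.exec.scratchdir, which can sit on the local source filesystem, instead of staging relative to the (possibly remote) destination path.

```python
def pick_staging_dir(conf, dest):
    """Sketch: stage under the scratch dir when the new flag is set;
    otherwise keep the default of staging relative to the destination.
    '.hive-staging' is an illustrative name, not the exact directory."""
    if conf.get("hive.use.scratchdir.for.staging", "false") == "true":
        return conf.get("hive.exec.scratchdir", "/tmp/hive") + "/.hive-staging"
    return dest + "/.hive-staging"

# Default: staging lands next to the (remote) destination.
print(pick_staging_dir({}, "s3a://bucket/table"))

# With the flag: staging lands on the configured scratch filesystem.
print(pick_staging_dir(
    {"hive.use.scratchdir.for.staging": "true",
     "hive.exec.scratchdir": "hdfs:///tmp/hive"},
    "s3a://bucket/table"))
```

This is why the feature matters for cross-filesystem writes: only the final move touches the destination filesystem.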
[jira] [Updated] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26758: Description: The query results are staged in stagingdir that is relative to the destination path // It used to be able to change hive.exec.stagingdir for a different location, but that is lost during blobstorage optimzation HIVE-17620. HIVE-15215 mentioned the possibility of using scratch for staging when write to S3 but it was long time ago and no activity. This is to allow final job to use hive.exec.scratchdir as the interim jobs, with a configuration hive.use.scratchdir.for.staging This is useful for cross Filesystem, user can use local source filesystem instead of remote filesystem for the staging. was: The query results are staged in stagingdir that is relative to the destination path // It used to be able to change hive.exec.stagingdir for a different location, but that is lost during blobstorage optimzation HIVE-17620. This is to allow final job to use hive.exec.scratchdir as the interim jobs, with a configuration hive.use.scratchdir_for_staging This is useful for cross Filesystem, user can use local source filesystem instead of remote filesystem for the staging. main change: for dynamic partitions that has static partition it was /// changes to /// or in case of \{hive.use.scratchdir_for_staging} // the change is due to that hive relies on parsing the path to discover partitions. 
> Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The query results are staged in a stagingdir that is relative to the > destination path // > It used to be possible to change hive.exec.stagingdir to a different location, > but that was lost during the blobstorage optimization HIVE-17620. > HIVE-15215 mentioned the possibility of using the scratch dir for staging when writing > to S3, but that was long ago with no activity. > > This is to allow the final job to use hive.exec.scratchdir, as the interim jobs do, > with a configuration > hive.use.scratchdir.for.staging > This is useful cross-filesystem: the user can use the local source filesystem > instead of the remote filesystem for staging.
[jira] [Updated] (HIVE-26758) Allow use scratchdir for staging final job
[ https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26758: Summary: Allow use scratchdir for staging final job (was: Allow use scratchdir for staging) > Allow use scratchdir for staging final job > -- > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The query results are staged in a stagingdir that is relative to the > destination path // > It used to be possible to change hive.exec.stagingdir to a different location, > but that was lost during the blobstorage optimization HIVE-17620. > This is to allow the final job to use hive.exec.scratchdir, as the interim jobs do, > with a configuration > hive.use.scratchdir_for_staging > This is useful cross-filesystem: the user can use the local source filesystem > instead of the remote filesystem for staging. > main change: > for dynamic partitions that have a static partition it was > /// > changes to > /// > or in case of \{hive.use.scratchdir_for_staging} > // > the change is needed because hive relies on parsing the path to discover > partitions.
[jira] [Updated] (HIVE-26758) Allow use scratchdir for staging
[ https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26758: Description: The query results are staged in a stagingdir that is relative to the destination path // It used to be possible to change hive.exec.stagingdir to a different location, but that was lost during the blobstorage optimization HIVE-17620. This is to allow the final job to use hive.exec.scratchdir, as the interim jobs do, with a configuration hive.use.scratchdir_for_staging This is useful cross-filesystem: the user can use the local source filesystem instead of the remote filesystem for staging. main change: for dynamic partitions that have a static partition it was /// changes to /// or in case of \{hive.use.scratchdir_for_staging} // the change is needed because hive relies on parsing the path to discover partitions. was: The query results are staged in a stagingdir that is relative to the destination path // It used to be possible to change hive.exec.stagingdir to a different location, but that was lost during the blobstorage optimization HIVE-17620. This is to allow the final job to use hive.exec.scratchdir, as the interim jobs do, with a configuration hive.use.scratchdir_for_staging This is useful cross-filesystem: the user can use the local source filesystem instead of the remote filesystem for staging. > Allow use scratchdir for staging > > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > > The query results are staged in a stagingdir that is relative to the > destination path // > It used to be possible to change hive.exec.stagingdir to a different location, > but that was lost during the blobstorage optimization HIVE-17620. 
> This is to allow the final job to use hive.exec.scratchdir, as the interim jobs do, > with a configuration > hive.use.scratchdir_for_staging > This is useful cross-filesystem: the user can use the local source filesystem > instead of the remote filesystem for staging. > main change: > for dynamic partitions that have a static partition it was > /// > changes to > /// > or in case of \{hive.use.scratchdir_for_staging} > // > the change is needed because hive relies on parsing the path to discover > partitions.
[jira] [Assigned] (HIVE-26758) Allow use scratchdir for staging
[ https://issues.apache.org/jira/browse/HIVE-26758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-26758: --- Assignee: Yi Zhang > Allow use scratchdir for staging > > > Key: HIVE-26758 > URL: https://issues.apache.org/jira/browse/HIVE-26758 > Project: Hive > Issue Type: New Feature > Components: Query Planning >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > > The query results are staged in a stagingdir that is relative to the > destination path // > It used to be possible to change hive.exec.stagingdir to a different location, > but that was lost during the blobstorage optimization HIVE-17620. > This is to allow the final job to use hive.exec.scratchdir, as the interim jobs do, > with a configuration > hive.use.scratchdir_for_staging > This is useful cross-filesystem: the user can use the local source filesystem > instead of the remote filesystem for staging.
[jira] [Commented] (HIVE-26611) add HiveServer2 History Server?
[ https://issues.apache.org/jira/browse/HIVE-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617256#comment-17617256 ] Yi Zhang commented on HIVE-26611: - [~ngangam] thank you for the guidance! I posted a 1-pager doc; please review. > add HiveServer2 History Server? > --- > > Key: HIVE-26611 > URL: https://issues.apache.org/jira/browse/HIVE-26611 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Yi Zhang >Priority: Major > Attachments: HiveServer2 History Server.pdf > > > HiveServer2 Web UI provides the query profile and optional operation log, however > these are gone when the hs2 server exits. > Was there discussion of adding a hs2 history server before?
[jira] [Updated] (HIVE-26611) add HiveServer2 History Server?
[ https://issues.apache.org/jira/browse/HIVE-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26611: Attachment: HiveServer2 History Server.pdf > add HiveServer2 History Server? > --- > > Key: HIVE-26611 > URL: https://issues.apache.org/jira/browse/HIVE-26611 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Yi Zhang >Priority: Major > Attachments: HiveServer2 History Server.pdf > > > HiveServer2 Web UI provides the query profile and optional operation log, however > these are gone when the hs2 server exits. > Was there discussion of adding a hs2 history server before?
[jira] [Commented] (HIVE-26611) add HiveServer2 History Server?
[ https://issues.apache.org/jira/browse/HIVE-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615350#comment-17615350 ] Yi Zhang commented on HIVE-26611: - Thank you [~pvary] for the input! The HS2 Web UI does have its limitations in a production env, thus a stand-alone hs2 history server may work better for production, if there are no third-party tools available. Besides the availability of 3rd-party tools, I was wondering if this was not developed because it would not bring much value? [~ngangam], I am thinking that if this can bring value, I can look into this, but I am not familiar with the hive community's current directions; I appreciate your input! > add HiveServer2 History Server? > --- > > Key: HIVE-26611 > URL: https://issues.apache.org/jira/browse/HIVE-26611 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Yi Zhang >Priority: Major > > HiveServer2 Web UI provides the query profile and optional operation log, however > these are gone when the hs2 server exits. > Was there discussion of adding a hs2 history server before?
[jira] [Commented] (HIVE-26611) add HiveServer2 History Server?
[ https://issues.apache.org/jira/browse/HIVE-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614176#comment-17614176 ] Yi Zhang commented on HIVE-26611: - [~pvary] I wonder if you have any insight on this? Thank you! > add HiveServer2 History Server? > --- > > Key: HIVE-26611 > URL: https://issues.apache.org/jira/browse/HIVE-26611 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Yi Zhang >Priority: Major > > HiveServer2 Web UI provides the query profile and optional operation log, however > these are gone when the hs2 server exits. > Was there discussion of adding a hs2 history server before?
[jira] [Updated] (HIVE-26564) Separate query live operation log and historical operation log
[ https://issues.apache.org/jira/browse/HIVE-26564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26564: Description: HIVE-24802 added OperationLogManager to support historical operation logs. OperationLogManager.createOperationLog creates the operation log inside the historical operation log dir if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true. This is confusing, since at the session level, SessionManager and HiveSession are using the original operation log session directory. The proposed change is to separate the live query's operation log and the historical operation log. Upon operation close, OperationLogManager.closeOperation is called to move the operation log from the session directory to the historical log dir. OperationLogManager is only responsible for cleaning up historical operation logs. This change also makes it easier to manage historical logs; for example, a user may want to persist historical logs, and it is easier to differentiate live and historical operation logs. before this change, if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true, the operation log layout is as follows. 
in operation_logs_historic has both live queries and historic queries' operation logs
```
/tmp/hive/
├── operation_logs
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   ├── hive_query_id_1
        │   ├── hive_query_id_2
        │   └── hive_query_id_3
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            ├── hive_query_id_7
            └── hive_query_id_8
```
after this change, the live queries operation logs are under and historical ones under
```
/tmp/hive
├── operation_logs
│   ├── session_id_1
│   │   ├── hive_query_id_2
│   │   └── hive_query_id_3
│   └── session_id_4
│       └── hive_query_id_8
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   └── hive_query_id_1
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            └── hive_query_id_7
```
was: HIVE-24802 added OperationLogManager to support historical operation logs. OperationLogManager.createOperationLog creates the operation log inside the historical operation log dir if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true. This is confusing, since at the session level, SessionManager and HiveSession are using the original operation log session directory. The proposed change is to separate the live query's operation log and the historical operation log. Upon operation close, OperationLogManager.closeOperation is called to move the operation log from the session directory to the historical log dir. OperationLogManager is only responsible for cleaning up historical operation logs. This change also makes it easier to manage historical logs; for example, a user may want to persist historical logs, and it is easier to differentiate live and historical operation logs. before this change, if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true, the operation log layout is as follows. 
in operation_logs_historic has both live queries and historic queries' operation logs
/tmp/hive/
├── operation_logs
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   ├── hive_query_id_1
        │   ├── hive_query_id_2
        │   └── hive_query_id_3
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            ├── hive_query_id_7
            └── hive_query_id_8
after this change, the live queries operation logs are under and historical ones under
/tmp/hive
├── operation_logs
│   ├── session_id_1
│   │   ├── hive_query_id_2
│   │   └── hive_query_id_3
│   └── session_id_4
│       └── hive_query_id_8
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   └── hive_query_id_1
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            └── hive_query_id_7
> Separate query live operation log and historical operation log > -- > > Key: HIVE-26564 > URL: https://issues.apache.org/jira/browse/HIVE-26564 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available >
[jira] [Commented] (HIVE-26564) Separate query live operation log and historical operation log
[ https://issues.apache.org/jira/browse/HIVE-26564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610790#comment-17610790 ] Yi Zhang commented on HIVE-26564: - [~zabetak] updated the description with example layouts before and after the change. > Separate query live operation log and historical operation log > -- > > Key: HIVE-26564 > URL: https://issues.apache.org/jira/browse/HIVE-26564 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > HIVE-24802 added OperationLogManager to support historical operation logs. > OperationLogManager.createOperationLog creates the operation log inside the > historical operation log dir if > HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true. This is confusing, since at the > session level, SessionManager and HiveSession are using the original operation > log session directory. > The proposed change is to separate the live query's operation log and the historical > operation log. Upon operation close, OperationLogManager.closeOperation is > called to move the operation log from the session directory to the historical log > dir. OperationLogManager is only responsible for cleaning up historical operation > logs. > This change also makes it easier to manage historical logs; for example, a user > may want to persist historical logs, and it is easier to differentiate live and > historical operation logs. > > before this change, if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true, the > operation log layout is as follows. 
in operation_logs_historic has both live queries and historic queries' operation logs
> /tmp/hive/
> ├── operation_logs
> └── operation_logs_historic
>     └── hs2hostname_startupTimestamp
>         ├── session_id_1
>         │   ├── hive_query_id_1
>         │   ├── hive_query_id_2
>         │   └── hive_query_id_3
>         ├── session_id_2
>         │   ├── hive_query_id_4
>         │   └── hive_query_id_5
>         ├── session_id_3
>         │   └── hive_query_id_6
>         └── session_id_4
>             ├── hive_query_id_7
>             └── hive_query_id_8
>
> after this change, the live queries operation logs are under and historical ones under
> /tmp/hive
> ├── operation_logs
> │   ├── session_id_1
> │   │   ├── hive_query_id_2
> │   │   └── hive_query_id_3
> │   └── session_id_4
> │       └── hive_query_id_8
> └── operation_logs_historic
>     └── hs2hostname_startupTimestamp
>         ├── session_id_1
>         │   └── hive_query_id_1
>         ├── session_id_2
>         │   ├── hive_query_id_4
>         │   └── hive_query_id_5
>         ├── session_id_3
>         │   └── hive_query_id_6
>         └── session_id_4
>             └── hive_query_id_7
[jira] [Updated] (HIVE-26564) Separate query live operation log and historical operation log
[ https://issues.apache.org/jira/browse/HIVE-26564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26564: Description: HIVE-24802 added OperationLogManager to support historical operation logs. OperationLogManager.createOperationLog creates the operation log inside the historical operation log dir if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true. This is confusing, since at the session level, SessionManager and HiveSession are using the original operation log session directory. The proposed change is to separate the live query's operation log and the historical operation log. Upon operation close, OperationLogManager.closeOperation is called to move the operation log from the session directory to the historical log dir. OperationLogManager is only responsible for cleaning up historical operation logs. This change also makes it easier to manage historical logs; for example, a user may want to persist historical logs, and it is easier to differentiate live and historical operation logs. before this change, if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true, the operation log layout is as follows. 
in operation_logs_historic has both live queries and historic queries' operation logs
/tmp/hive/
├── operation_logs
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   ├── hive_query_id_1
        │   ├── hive_query_id_2
        │   └── hive_query_id_3
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            ├── hive_query_id_7
            └── hive_query_id_8
after this change, the live queries operation logs are under and historical ones under
/tmp/hive
├── operation_logs
│   ├── session_id_1
│   │   ├── hive_query_id_2
│   │   └── hive_query_id_3
│   └── session_id_4
│       └── hive_query_id_8
└── operation_logs_historic
    └── hs2hostname_startupTimestamp
        ├── session_id_1
        │   └── hive_query_id_1
        ├── session_id_2
        │   ├── hive_query_id_4
        │   └── hive_query_id_5
        ├── session_id_3
        │   └── hive_query_id_6
        └── session_id_4
            └── hive_query_id_7
was: HIVE-24802 added OperationLogManager to support historical operation logs. OperationLogManager.createOperationLog creates the operation log inside the historical operation log dir if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true. This is confusing, since at the session level, SessionManager and HiveSession are using the original operation log session directory. The proposed change is to separate the live query's operation log and the historical operation log. Upon operation close, OperationLogManager.closeOperation is called to move the operation log from the session directory to the historical log dir. OperationLogManager is only responsible for cleaning up historical operation logs. This change also makes it easier to manage historical logs; for example, a user may want to persist historical logs, and it is easier to differentiate live and historical operation logs. 
> Separate query live operation log and historical operation log > -- > > Key: HIVE-26564 > URL: https://issues.apache.org/jira/browse/HIVE-26564 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > HIVE-24802 added OperationLogManager to support historical operation logs. > OperationLogManager.createOperationLog creates the operation log inside the > historical operation log dir if > HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true. This is confusing, since at the > session level, SessionManager and HiveSession are using the original operation > log session directory. > The proposed change is to separate the live query's operation log and the historical > operation log. Upon operation close, OperationLogManager.closeOperation is > called to move the operation log from the session directory to the historical log > dir. OperationLogManager is only responsible for cleaning up historical operation > logs. > This change also makes it easier to manage historical logs; for example, a user > may want to persist historical logs, and it is easier to differentiate live and > historical operation logs. > > before this change, if HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true, the > operation log layout is as follows. in operation_logs_historic has both live queries and historic queries' operation logs
> /tmp/hive/
> ├── operation_logs
> └── operation_logs_historic
>     └── hs2hostname_startupTimestamp
>         ├── session_id_1
>
[jira] [Assigned] (HIVE-26564) Separate query live operation log and historical operation log
[ https://issues.apache.org/jira/browse/HIVE-26564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-26564: --- > Separate query live operation log and historical operation log > -- > > Key: HIVE-26564 > URL: https://issues.apache.org/jira/browse/HIVE-26564 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 4.0.0-alpha-2 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > > HIVE-24802 added OperationLogManager to support historical operation logs. > OperationLogManager.createOperationLog creates the operation log inside the > historical operation log dir if > HIVE_SERVER2_HISTORIC_OPERATION_LOG_ENABLED=true. This is confusing, since at the > session level, SessionManager and HiveSession are using the original operation > log session directory. > The proposed change is to separate the live query's operation log and the historical > operation log. Upon operation close, OperationLogManager.closeOperation is > called to move the operation log from the session directory to the historical log > dir. OperationLogManager is only responsible for cleaning up historical operation > logs. > This change also makes it easier to manage historical logs; for example, a user > may want to persist historical logs, and it is easier to differentiate live and > historical operation logs.
[jira] [Assigned] (HIVE-26478) Explicitly set Content-Type in QueryProfileServlet
[ https://issues.apache.org/jira/browse/HIVE-26478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-26478: --- > Explicitly set Content-Type in QueryProfileServlet > -- > > Key: HIVE-26478 > URL: https://issues.apache.org/jira/browse/HIVE-26478 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 3.1.3, 4.0.0 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Minor > > QueryProfileServlet does not set Content-Type; though browsers may detect it > correctly, for applications that check Content-Type it would be > helpful to set it explicitly
[jira] [Updated] (HIVE-26447) Vectorization: wrong results when filter on repeating map key orc table
[ https://issues.apache.org/jira/browse/HIVE-26447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-26447: Summary: Vectorization: wrong results when filter on repeating map key orc table (was: Vectorization: wrong results when filter on repeating map key) > Vectorization: wrong results when filter on repeating map key orc table > --- > > Key: HIVE-26447 > URL: https://issues.apache.org/jira/browse/HIVE-26447 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3, 4.0.0 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > > Example reproducible case: > > set hive.vectorized.execution.enabled=true; > set hive.fetch.task.conversion=none; > create temporary table foo (id int, x map) stored as orc; > insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC', > 8)), (4, map('ABC', 9)); > select id from foo where x['ABC']=9; > this returns only 1, when the correct result should be 1 and 4 > For every VectorizedRowBatch, only the first row is checked. > This seems to be a corner case of an ORC table having a repeating string-type key > for a map field in the MapColumnVector.
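A quick way to cross-check the reproducer above is to rerun the same select in row (non-vectorized) mode; this is only a diagnostic sketch based on the report that the wrong results are specific to the vectorized path, not a fix from the issue:

```sql
-- Same table and data as the reproducer above.
SET hive.vectorized.execution.enabled=false;  -- turn the vectorized path off
SET hive.fetch.task.conversion=none;

-- Per the report, the correct result is 1 and 4; the vectorized run returned
-- only 1 because just the first row of each VectorizedRowBatch was checked.
SELECT id FROM foo WHERE x['ABC'] = 9;
```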
[jira] [Assigned] (HIVE-26447) Vectorization: wrong results when filter on repeating map key
[ https://issues.apache.org/jira/browse/HIVE-26447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang reassigned HIVE-26447: --- > Vectorization: wrong results when filter on repeating map key > - > > Key: HIVE-26447 > URL: https://issues.apache.org/jira/browse/HIVE-26447 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.3, 4.0.0 >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Major > > Example reproducible case: > > set hive.vectorized.execution.enabled=true; > set hive.fetch.task.conversion=none; > create temporary table foo (id int, x map) stored as orc; > insert into foo values(1, map('ABC', 9)), (2, map('ABC', 7)), (3, map('ABC', > 8)), (4, map('ABC', 9)); > select id from foo where x['ABC']=9; > this returns only 1, when the correct result should be 1 and 4 > For every VectorizedRowBatch, only the first row is checked. > This seems to be a corner case of an ORC table having a repeating string-type key > for a map field in the MapColumnVector.
[jira] [Resolved] (HIVE-16331) create orc table fails in auto merge job when hive.exec.scratchdir set to viewfs path
[ https://issues.apache.org/jira/browse/HIVE-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang resolved HIVE-16331. - Resolution: Fixed looks like a duplicate of HIVE-12522 > create orc table fails in auto merge job when hive.exec.scratchdir set to > viewfs path > -- > > Key: HIVE-16331 > URL: https://issues.apache.org/jira/browse/HIVE-16331 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yi Zhang > > if hive.exec.scratchdir is set to a viewfs path, but fs.defaultFS is not in viewfs, > when creating an ORC table in hive/tez, if an auto merge job is kicked off, the auto > merge job fails with the following error: > ``` > 2017-03-29 23:10:57,892 INFO [main]: org.apache.hadoop.hive.ql.Driver: > Launching Job 3 out of 3 > 2017-03-29 23:10:57,894 INFO [main]: org.apache.hadoop.hive.ql.Driver: > Starting task [Stage-4:MAPRED] in serial mode > 2017-03-29 23:10:57,894 INFO [main]: > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: The current user: > yizhang, session user: yizhang > 2017-03-29 23:10:57,894 INFO [main]: > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: Current queue name > is hadoop-sync incoming queue name is hadoop-sync > 2017-03-29 23:10:57,949 INFO [main]: hive.ql.Context: New scratch dir is > viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1 > 2017-03-29 23:10:57,949 DEBUG [main]: > org.apache.hadoop.hive.ql.exec.tez.DagUtils: TezDir path set > viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir > for user: yizhang > 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.hdfs.DFSClient: > /tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir: > masked=rwxr-xr-x > 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: The 
ping > interval is 6 ms. > 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: > Connecting to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 > 2017-03-29 23:10:57,951 DEBUG [IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang: starting, having connections 3 > 2017-03-29 23:10:57,951 DEBUG [IPC Parameter Sending Thread #0]: > org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang sending #373 > 2017-03-29 23:10:57,954 DEBUG [IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang got value #373 > 2017-03-29 23:10:57,955 DEBUG [main]: > org.apache.hadoop.ipc.ProtobufRpcEngine: Call: mkdirs took 5ms > 2017-03-29 23:10:57,955 INFO [main]: org.apache.hadoop.hive.ql.exec.Task: > Session is already open > 2017-03-29 23:10:57,955 DEBUG [IPC Parameter Sending Thread #0]: > org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang sending #374 > 2017-03-29 23:10:57,956 DEBUG [IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang got value #374 > 2017-03-29 23:10:57,956 DEBUG [main]: > org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms > 2017-03-29 23:10:57,956 DEBUG [IPC Parameter Sending Thread 
#0]: > org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang sending #375 > 2017-03-29 23:10:57,961 DEBUG [IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang got value #375 > 2017-03-29 23:10:57,961 DEBUG [main]: > org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 5ms > 2017-03-29 23:10:57,962 DEBUG [IPC Parameter Sending Thread #0]: > org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to >
[jira] [Updated] (HIVE-16331) create orc table fails in auto merge job when hive.exec.scratchdir set to viewfs path
[ https://issues.apache.org/jira/browse/HIVE-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-16331: Summary: create orc table fails in auto merge job when hive.exec.scratchdir set to viewfs path (was: create orc table fails when hive.exec.scratchdir set to viewfs path in auto merge jobs) > create orc table fails in auto merge job when hive.exec.scratchdir set to > viewfs path > -- > > Key: HIVE-16331 > URL: https://issues.apache.org/jira/browse/HIVE-16331 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yi Zhang > > if hive.exec.scratchdir is set to a viewfs path, but fs.defaultFS is not in viewfs, > when creating an ORC table in hive/tez, if an auto merge job is kicked off, the auto > merge job fails with the following error: > ``` > 2017-03-29 23:10:57,892 INFO [main]: org.apache.hadoop.hive.ql.Driver: > Launching Job 3 out of 3 > 2017-03-29 23:10:57,894 INFO [main]: org.apache.hadoop.hive.ql.Driver: > Starting task [Stage-4:MAPRED] in serial mode > 2017-03-29 23:10:57,894 INFO [main]: > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: The current user: > yizhang, session user: yizhang > 2017-03-29 23:10:57,894 INFO [main]: > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: Current queue name > is hadoop-sync incoming queue name is hadoop-sync > 2017-03-29 23:10:57,949 INFO [main]: hive.ql.Context: New scratch dir is > viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1 > 2017-03-29 23:10:57,949 DEBUG [main]: > org.apache.hadoop.hive.ql.exec.tez.DagUtils: TezDir path set > viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir > for user: yizhang > 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.hdfs.DFSClient: > 
/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir: > masked=rwxr-xr-x > 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: The ping > interval is 6 ms. > 2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: > Connecting to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 > 2017-03-29 23:10:57,951 DEBUG [IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang: starting, having connections 3 > 2017-03-29 23:10:57,951 DEBUG [IPC Parameter Sending Thread #0]: > org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang sending #373 > 2017-03-29 23:10:57,954 DEBUG [IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang got value #373 > 2017-03-29 23:10:57,955 DEBUG [main]: > org.apache.hadoop.ipc.ProtobufRpcEngine: Call: mkdirs took 5ms > 2017-03-29 23:10:57,955 INFO [main]: org.apache.hadoop.hive.ql.exec.Task: > Session is already open > 2017-03-29 23:10:57,955 DEBUG [IPC Parameter Sending Thread #0]: > org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang sending #374 > 2017-03-29 23:10:57,956 DEBUG [IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > 
hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang got value #374 > 2017-03-29 23:10:57,956 DEBUG [main]: > org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms > 2017-03-29 23:10:57,956 DEBUG [IPC Parameter Sending Thread #0]: > org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang sending #375 > 2017-03-29 23:10:57,961 DEBUG [IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to > hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from > yizhang got value #375 > 2017-03-29 23:10:57,961 DEBUG [main]: > org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 5ms >
[jira] [Updated] (HIVE-16331) create orc table fails in auto merge job when hive.exec.scratchdir set to viewfs path
[ https://issues.apache.org/jira/browse/HIVE-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-16331: Description:

If hive.exec.scratchdir is set to a viewfs path but fs.defaultFS is not a viewfs path, then when an ORC table is created in Hive on Tez and an auto merge job is kicked off, the merge job fails with the following error:

```
2017-03-29 23:10:57,892 INFO [main]: org.apache.hadoop.hive.ql.Driver: Launching Job 3 out of 3
2017-03-29 23:10:57,894 INFO [main]: org.apache.hadoop.hive.ql.Driver: Starting task [Stage-4:MAPRED] in serial mode
2017-03-29 23:10:57,894 INFO [main]: org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: The current user: yizhang, session user: yizhang
2017-03-29 23:10:57,894 INFO [main]: org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: Current queue name is hadoop-sync incoming queue name is hadoop-sync
2017-03-29 23:10:57,949 INFO [main]: hive.ql.Context: New scratch dir is viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1
2017-03-29 23:10:57,949 DEBUG [main]: org.apache.hadoop.hive.ql.exec.tez.DagUtils: TezDir path set viewfs://ns-default/tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir for user: yizhang
2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.hdfs.DFSClient: /tmp/hive_scratchdir/yizhang/5da3d082-33b3-4194-97e2-005549d1b3c4/hive_2017-03-29_23-09-55_489_4030642791346631679-1/yizhang/_tez_scratch_dir: masked=rwxr-xr-x
2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: The ping interval is 6 ms.
2017-03-29 23:10:57,950 DEBUG [main]: org.apache.hadoop.ipc.Client: Connecting to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020
2017-03-29 23:10:57,951 DEBUG [IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang: starting, having connections 3
2017-03-29 23:10:57,951 DEBUG [IPC Parameter Sending Thread #0]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang sending #373
2017-03-29 23:10:57,954 DEBUG [IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang got value #373
2017-03-29 23:10:57,955 DEBUG [main]: org.apache.hadoop.ipc.ProtobufRpcEngine: Call: mkdirs took 5ms
2017-03-29 23:10:57,955 INFO [main]: org.apache.hadoop.hive.ql.exec.Task: Session is already open
2017-03-29 23:10:57,955 DEBUG [IPC Parameter Sending Thread #0]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang sending #374
2017-03-29 23:10:57,956 DEBUG [IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang got value #374
2017-03-29 23:10:57,956 DEBUG [main]: org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
2017-03-29 23:10:57,956 DEBUG [IPC Parameter Sending Thread #0]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang sending #375
2017-03-29 23:10:57,961 DEBUG [IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang got value #375
2017-03-29 23:10:57,961 DEBUG [main]: org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 5ms
2017-03-29 23:10:57,962 DEBUG [IPC Parameter Sending Thread #0]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang sending #376
2017-03-29 23:10:57,962 DEBUG [IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang]: org.apache.hadoop.ipc.Client: IPC Client (85121323) connection to hadooplithiumnamenode01-sjc1.prod.uber.internal/10.67.143.155:8020 from yizhang got value #376
2017-03-29 23:10:57,962 DEBUG [main]: org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
2017-03-29 23:10:57,963 INFO [main]:
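The log above shows the scratch dir being created under a viewfs:// URI while the NameNode RPCs go to a plain hdfs endpoint. A minimal, Hadoop-free Python sketch of the underlying scheme mismatch (the DEFAULT_FS value and the helper name are illustrative, not Hive's actual code):

```python
from urllib.parse import urlparse

DEFAULT_FS = "hdfs://ns1"  # hypothetical fs.defaultFS value, not viewfs

def same_filesystem(path: str, default_fs: str = DEFAULT_FS) -> bool:
    """Return True if `path` resolves to the default filesystem.

    A scheme-less path inherits the default FS; a fully qualified path
    must carry the default FS scheme to be handled by it.
    """
    scheme = urlparse(path).scheme
    return scheme in ("", urlparse(default_fs).scheme)

scratch = "viewfs://ns-default/tmp/hive_scratchdir"
print(same_filesystem(scratch))   # False: a merge job assuming hdfs:// breaks here
print(same_filesystem("/tmp/x"))  # True: unqualified paths fall back to fs.defaultFS
```

The point of the sketch: any stage of the merge job that resolves scratch paths against fs.defaultFS instead of the path's own scheme will pick the wrong filesystem when the two differ.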
[jira] [Updated] (HIVE-12605) Implement JDBC Connection.isValid
[ https://issues.apache.org/jira/browse/HIVE-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhang updated HIVE-12605: Description:

http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int) implementation in Hive JDBC driver throws "SQLException("Method not supported")". That is a method often used by connection pooling libraries.

was:

http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int) implementation in Hive JDBC driver throws "SQLException("Method not supported")". That is a method often used by connection pooling libraries. Thanks to [~yeeza] for raising this issue.

> Implement JDBC Connection.isValid
> -
>
> Key: HIVE-12605
> URL: https://issues.apache.org/jira/browse/HIVE-12605
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2, JDBC
> Reporter: Thejas M Nair
> Labels: newbie, trivial
>
> http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#isValid(int)
> implementation in Hive JDBC driver throws "SQLException("Method not supported")".
> That is a method often used by connection pooling libraries.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
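Why this matters for pooling, as a hedged sketch: a typical pool probes Connection.isValid before handing a connection out, and a driver that throws instead of answering makes every pooled connection look dead. The classes below are illustrative Python stand-ins, not Hive's actual JDBC code:

```python
class MethodNotSupported(Exception):
    """Stand-in for SQLException("Method not supported")."""

class HiveConnectionStub:
    """Mimics the pre-fix driver: isValid always raises."""
    def is_valid(self, timeout_s: int) -> bool:
        raise MethodNotSupported("Method not supported")

class FixedConnectionStub:
    """Mimics a fixed driver: isValid answers with a boolean."""
    def is_valid(self, timeout_s: int) -> bool:
        return True  # a real driver would run a cheap liveness probe here

def checkout(conn) -> bool:
    """Pool-style check: reuse the connection only if it validates."""
    try:
        return conn.is_valid(5)
    except MethodNotSupported:
        return False  # the pool treats the connection as dead and evicts it

print(checkout(HiveConnectionStub()))   # False: every connection gets evicted
print(checkout(FixedConnectionStub()))  # True: connections can be reused
```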
[jira] [Commented] (HIVE-11091) Unable to load data into hive table using Load data local inpath command from unix named pipe
[ https://issues.apache.org/jira/browse/HIVE-11091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625415#comment-14625415 ] Yi Zhang commented on HIVE-11091:

This is tracked in HDFS-8767.

Unable to load data into hive table using Load data local inpath command from unix named pipe
---
Key: HIVE-11091
URL: https://issues.apache.org/jira/browse/HIVE-11091
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 0.14.0
Environment: Unix, MacOS
Reporter: Manoranjan Sahoo
Priority: Blocker

Unable to load data into a hive table from a unix named pipe in Hive 0.14.0. Execution details in the environment (Hadoop 2.6.0 + Hive 0.14.0):

$ mkfifo /tmp/test.txt
$ hive
hive> create table test(id bigint, name string);
OK
Time taken: 1.018 seconds
hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
Loading data to table default.test
Failed with exception addFiles: filesystem error in check phase
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

But in Hadoop 1.3 and Hive 0.11.0 it works fine:

hive> LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test;
Copying data from file:/tmp/test.txt
Copying file: file:/tmp/test.txt

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
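A small Python sketch of the failure mode (POSIX-only; this illustrates a regular-file pre-check in general, not Hive's actual MoveTask code): a named pipe is a readable stream but not a regular file, so a stat-based check rejects it before any bytes are read, which is consistent with the "filesystem error in check phase" above.

```python
import os
import stat
import tempfile

def check_phase_ok(path: str) -> bool:
    """Mimic a copy tool's pre-check: only regular files pass."""
    return stat.S_ISREG(os.stat(path).st_mode)

with tempfile.TemporaryDirectory() as d:
    fifo = os.path.join(d, "test.txt")
    os.mkfifo(fifo)  # POSIX only, like the report's mkfifo /tmp/test.txt
    regular = os.path.join(d, "plain.txt")
    open(regular, "w").close()
    print(check_phase_ok(regular))  # True: a regular file passes the check
    print(check_phase_ok(fifo))     # False: the FIFO is rejected before reading
```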
[jira] [Commented] (HIVE-10142) Calculating formula based on difference between each row's value and current row's in Windowing function
[ https://issues.apache.org/jira/browse/HIVE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592377#comment-14592377 ] Yi Zhang commented on HIVE-10142:

This request is more along the lines of the following decay definition. An exponential rate of change can be modeled algebraically by the formula N(t) = N(0)e^(−λt), where N is the quantity, N(0) is the initial quantity, λ is the decay constant, and t is time. The window function would then be a summary of the values of all records in the window relative to the current record.

Calculating formula based on difference between each row's value and current row's in Windowing function
---
Key: HIVE-10142
URL: https://issues.apache.org/jira/browse/HIVE-10142
Project: Hive
Issue Type: New Feature
Components: PTF-Windowing
Affects Versions: 1.0.0
Reporter: Yi Zhang
Assignee: Aihua Xu

For analytics with windowing functions, the calculation formula sometimes needs to evaluate each row's value against the current row's value. A decay value is a good example, such as a sum of values with a decay function based on the difference in timestamp between each row and the current row.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
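A minimal Python sketch of the requested semantics, under the assumption that the window covers the current row plus all preceding rows ordered by timestamp: each row j contributes value_j weighted by e^(−λ·(t_i − t_j)), i.e. N(t) = N(0)e^(−λt) with N(0) = value_j and t = the timestamp gap to the current row i.

```python
import math

def decayed_window_sum(rows, lam):
    """For each row i with (timestamp, value), return the sum over rows
    j <= i of value_j * exp(-lam * (t_i - t_j)).

    `rows` must be ordered by timestamp; `lam` is the decay constant λ.
    """
    out = []
    for i, (t_i, _) in enumerate(rows):
        s = sum(v * math.exp(-lam * (t_i - t_j)) for t_j, v in rows[: i + 1])
        out.append(s)
    return out

rows = [(0, 10.0), (1, 10.0), (2, 10.0)]  # (timestamp, value)
print(decayed_window_sum(rows, lam=0.0))  # [10.0, 20.0, 30.0]: λ=0 is a plain running sum
```

With λ > 0, older rows contribute progressively less, which is the decay-weighted windowed sum the issue asks for.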