[jira] [Assigned] (HIVE-23771) Loading data into Hive: LIMIT shows Chinese user names correctly, but WHERE returns garbled names and user-name comparisons fail

2020-06-29 Thread wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wang reassigned HIVE-23771:
---

Assignee: (was: wang)

> Loading data into Hive: LIMIT shows Chinese user names correctly, but WHERE returns garbled names and user-name comparisons fail
> ---
>
> Key: HIVE-23771
> URL: https://issues.apache.org/jira/browse/HIVE-23771
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.1
>Reporter: wang
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: image-2020-06-29-15-04-23-999.png, 
> image-2020-06-29-15-08-25-923.png, image-2020-06-29-15-10-10-310.png
>
>
> CREATE TABLE statement:
> create table smg_t_usr_inf_23 (
> Usr_ID string,
> RlgnSvcPltfrmUsr_TpCd string,
> Rlgn_InsID string,
> Usr_Nm string
> ) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
> WITH SERDEPROPERTIES ("field.delim"="|@|") stored as textfile
> Data load: LOAD DATA LOCAL INPATH '/home/ap/USR_INF 20200622_0001.dat' INTO 
> TABLE usr_inf
> select * from usr_inf limit 10; displays the Chinese names correctly: 
> !image-2020-06-29-15-04-23-999.png!
>  
> select * from usr_inf where usr_nm = '胡学玲' ; returns no rows: 
> !image-2020-06-29-15-08-25-923.png!
>  
> Other predicates such as select * from usr_inf where usr_id='***'; do return 
> rows: !image-2020-06-29-15-10-10-310.png! .
> Could someone explain why the loaded data displays as correct Chinese yet a 
> WHERE comparison on it fails? After insert into table aa select * from 
> usr_inf; the usr_nm column of the new table behaves the same way.
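> A minimal diagnostic sketch (not part of the original report; it assumes the 
> usr_inf table queried above and uses only the built-in hex() UDF): comparing 
> the bytes stored in the column with the bytes of the UTF-8 literal shows 
> whether the .dat file was written in a different encoding such as GBK, which 
> would explain why LIMIT appears to display the names while an equality 
> predicate never matches.
> {code}
> -- Bytes actually stored in the column (first few rows).
> SELECT usr_nm, hex(usr_nm) FROM usr_inf LIMIT 10;
> -- Bytes of the literal that the WHERE clause compares against.
> SELECT hex('胡学玲');
> {code}
> If the two byte sequences differ, re-encoding the data file to UTF-8 before 
> LOAD DATA is the usual remedy.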



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23771) Loading data into Hive: LIMIT shows Chinese user names correctly, but WHERE returns garbled names and user-name comparisons fail

2020-06-29 Thread wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wang reassigned HIVE-23771:
---

Assignee: wang

> Loading data into Hive: LIMIT shows Chinese user names correctly, but WHERE returns garbled names and user-name comparisons fail
> ---
>
> Key: HIVE-23771
> URL: https://issues.apache.org/jira/browse/HIVE-23771
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.1
>Reporter: wang
>Assignee: wang
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: image-2020-06-29-15-04-23-999.png, 
> image-2020-06-29-15-08-25-923.png, image-2020-06-29-15-10-10-310.png
>
>
> CREATE TABLE statement:
> create table smg_t_usr_inf_23 (
> Usr_ID string,
> RlgnSvcPltfrmUsr_TpCd string,
> Rlgn_InsID string,
> Usr_Nm string
> ) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
> WITH SERDEPROPERTIES ("field.delim"="|@|") stored as textfile
> Data load: LOAD DATA LOCAL INPATH '/home/ap/USR_INF 20200622_0001.dat' INTO 
> TABLE usr_inf
> select * from usr_inf limit 10; displays the Chinese names correctly: 
> !image-2020-06-29-15-04-23-999.png!
>  
> select * from usr_inf where usr_nm = '胡学玲' ; returns no rows: 
> !image-2020-06-29-15-08-25-923.png!
>  
> Other predicates such as select * from usr_inf where usr_id='***'; do return 
> rows: !image-2020-06-29-15-10-10-310.png! .
> Could someone explain why the loaded data displays as correct Chinese yet a 
> WHERE comparison on it fails? After insert into table aa select * from 
> usr_inf; the usr_nm column of the new table behaves the same way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-29 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147593#comment-17147593
 ] 

Denys Kuzmenko commented on HIVE-23725:
---

Pushed to master.
[~pvargacl], thank you for the patch and [~jcamachorodriguez] for the review!

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and 
> starts to read committed transactions that were not yet committed when the 
> query was compiled, it can cause partial-read problems if a committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this (see the sketch below):
> 1. Open and compile transaction 1, which merge-inserts data from a 
> partitioned source table that has a few partitions.
> 2. Open, run and commit transaction 2, which inserts data into an old and a 
> new partition of the source table.
> 3. Open, run and commit transaction 3, which inserts data into the target 
> table of the merge statement; this retriggers snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot is regenerated and reads partial data 
> from transaction 2, breaking the ACID properties.
> A different setup, with the transaction order switched:
> 1. Compile transaction 1, which inserts data into an old and a new partition 
> of the source table.
> 2. Compile transaction 2, which inserts data into the target table.
> 3. Compile transaction 3, which merge-inserts data from the source table 
> into the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains transactions 1 and 2 in its 
> snapshot, isValidTxnListState is triggered and we do a partial read of 
> transaction 1 for the same reasons.
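> A hedged HiveQL sketch of the first scenario (table names, columns and 
> partition values are invented for illustration; the point is the 
> interleaving, and the compile/run gap in session A is a race inside the 
> server rather than something the SQL itself controls):
> {code}
> -- Session A (txn 1): the merge is compiled here; its snapshot predates
> -- txn 2 and txn 3.
> MERGE INTO target AS t USING source AS s ON t.id = s.id
> WHEN MATCHED THEN UPDATE SET val = s.val
> WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val);
>
> -- Session B (txn 2): one transaction writing an old and a brand-new source
> -- partition (assumes hive.exec.dynamic.partition.mode=nonstrict).
> INSERT INTO source PARTITION (p) VALUES (1, 'x', 'old'), (2, 'y', 'new');
>
> -- Session C (txn 3): writes the merge target, which forces txn 1 to
> -- regenerate its snapshot before running.
> INSERT INTO target VALUES (3, 'z');
>
> -- When txn 1 now runs, the refreshed snapshot sees txn 2 only partially,
> -- because the new source partition did not exist at compile time.
> {code}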



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-29 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-23725:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created new partition in the source or target table.
> The solution should be not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partition.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that insert data to the target table
> 2. compile transaction 3 that merge inserts data from the source table to the 
> target table
> 3. run and commit transaction 1
> 4. run and commit transaction 2
> 5. run transaction 3, since it cointains 1 and 2 in its snaphot the 
> isValidTxnListState will be triggered and we do a partial read of the 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-06-29 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147592#comment-17147592
 ] 

Syed Shameerur Rahman commented on HIVE-22957:
--

Tests passed after rebasing onto master.
cc: [~jcamachorodriguez]

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR 
> TABLE.pdf, HIVE-22957.01.patch, HIVE-22957.02.patch, HIVE-22957.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23759) Refactor CommitTxnRequest field order

2020-06-29 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147594#comment-17147594
 ] 

Denys Kuzmenko commented on HIVE-23759:
---

Pushed to master.
[~Marton Bod], thank you for the patch! 

> Refactor CommitTxnRequest field order
> -
>
> Key: HIVE-23759
> URL: https://issues.apache.org/jira/browse/HIVE-23759
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Refactor the CommitTxnRequest field order (keyValue and replLastIdInfo). This 
> should be a safe change, as neither of these fields has been part of any 
> official Hive release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23759) Refactor CommitTxnRequest field order

2020-06-29 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-23759.
---
Resolution: Fixed

> Refactor CommitTxnRequest field order
> -
>
> Key: HIVE-23759
> URL: https://issues.apache.org/jira/browse/HIVE-23759
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Refactor the CommitTxnRequest field order (keyValue and replLastIdInfo). This 
> should be a safe change, as neither of these fields has been part of any 
> official Hive release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23593) Schemainit fails with NoSuchFieldError

2020-06-29 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147597#comment-17147597
 ] 

Ayush Saxena commented on HIVE-23593:
-

Hi [~kgyrtkirk], we hit the same issue while verifying the {{Hadoop-3.3.0}} 
release on {{ARM}}.
It happens due to HIVE-22126: {{calcite-core}} got bundled into the shaded jar 
along with {{guava}}, but only the guava classes were relocated, not the 
{{calcite-core}} classes, which were included later. As you said, removing 
{{calcite-core}} from {{lib}} works, since the only loadable copy of those 
classes is then the one inside the {{hive-exec}} jar.
But if someone (client plugins/applications) has {{calcite-core}} on its 
classpath for other reasons, or it ends up being used somewhere outside the 
shaded jar, the issue will surface again.
A probable addition is to relocate the {{calcite-core}} classes as well, as is 
already done for guava and others in Hive (we follow the same practice in 
Hadoop), so that the presence or later addition of {{calcite-core}} doesn't 
matter.
We tried relocating {{calcite-core}} in the pom.xml and that seems to work for 
us; it is just a 2-3 line change.

Let me know your thoughts on this; if you don't have any concerns, I will 
raise a follow-up JIRA for the relocation. Thanx :)

cc [~vinayakumarb] [~chinnaraol]

> Schemainit fails with NoSuchFieldError 
> ---
>
> Key: HIVE-23593
> URL: https://issues.apache.org/jira/browse/HIVE-23593
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The issue comes from a Calcite-related class; it's very interesting because 
> ql already has a shaded Calcite.
> {code}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitCall(ExprNodeConverter.java:192)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitCall(ExprNodeConverter.java:98)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRexExecutorImpl.reduce(HiveRexExecutorImpl.java:56)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.foldExpression(HiveFunctionHelper.java:544)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.createConstantObjectInspector(HiveFunctionHelper.java:452)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.createObjectInspector(HiveFunctionHelper.java:435)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23593) Schemainit fails with NoSuchFieldError

2020-06-29 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147603#comment-17147603
 ] 

Zoltan Haindrich commented on HIVE-23593:
-

[~ayushtkn] thank you for the details; I also tried to relocate calcite-core 
in ql/pom.xml, but it didn't work for me - some issues kept surfacing.
Maybe I made a typo (or missed something)?
This ticket also added checks to make sure that we at least don't package a 
release which fails out of the box with this kind of exception.
I guess you had a better approach to fixing the shading - could you open a 
jira and submit your patch?

> Schemainit fails with NoSuchFieldError 
> ---
>
> Key: HIVE-23593
> URL: https://issues.apache.org/jira/browse/HIVE-23593
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The issue comes from a Calcite-related class; it's very interesting because 
> ql already has a shaded Calcite.
> {code}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitCall(ExprNodeConverter.java:192)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitCall(ExprNodeConverter.java:98)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRexExecutorImpl.reduce(HiveRexExecutorImpl.java:56)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.foldExpression(HiveFunctionHelper.java:544)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.createConstantObjectInspector(HiveFunctionHelper.java:452)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.createObjectInspector(HiveFunctionHelper.java:435)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-11322) LLAP: Fix API usage to work with evolving Tez APIs

2020-06-29 Thread Ted Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-11322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Xu updated HIVE-11322:
--
Description: TEZ-2004 for now. There's going to be additional changes 
coming in. May re-use this jira for multiple fixes as they happen to avoid a 
stream of API fix jiras.  (was: xTEZ-2004 for now. There's going to be 
additional changes coming in. May re-use this jira for multiple fixes as they 
happen to avoid a stream of API fix jiras.)

> LLAP: Fix API usage to work with evolving Tez APIs
> --
>
> Key: HIVE-11322
> URL: https://issues.apache.org/jira/browse/HIVE-11322
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Fix For: llap
>
> Attachments: HIVE-11322.1.TEZ2004.txt
>
>
> TEZ-2004 for now. There's going to be additional changes coming in. May 
> re-use this jira for multiple fixes as they happen to avoid a stream of API 
> fix jiras.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-11322) LLAP: Fix API usage to work with evolving Tez APIs

2020-06-29 Thread Ted Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-11322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Xu updated HIVE-11322:
--
Description: xTEZ-2004 for now. There's going to be additional changes 
coming in. May re-use this jira for multiple fixes as they happen to avoid a 
stream of API fix jiras.  (was: TEZ-2004 for now. There's going to be 
additional changes coming in. May re-use this jira for multiple fixes as they 
happen to avoid a stream of API fix jiras.)

> LLAP: Fix API usage to work with evolving Tez APIs
> --
>
> Key: HIVE-11322
> URL: https://issues.apache.org/jira/browse/HIVE-11322
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>Priority: Major
> Fix For: llap
>
> Attachments: HIVE-11322.1.TEZ2004.txt
>
>
> xTEZ-2004 for now. There's going to be additional changes coming in. May 
> re-use this jira for multiple fixes as they happen to avoid a stream of API 
> fix jiras.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23772) Reallocate calcite-core to prevent NoSuchFiledError

2020-06-29 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-23772:
---


> Reallocate calcite-core to prevent NoSuchFiledError
> ---
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:542) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23738) DBLockManager::lock() : Move lock request to debug level

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23738?focusedWorklogId=452218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452218
 ]

ASF GitHub Bot logged work on HIVE-23738:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 08:24
Start Date: 29/Jun/20 08:24
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #1168:
URL: https://github.com/apache/hive/pull/1168


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452218)
Time Spent: 0.5h  (was: 20m)

> DBLockManager::lock() : Move lock request to debug level
> 
>
> Key: HIVE-23738
> URL: https://issues.apache.org/jira/browse/HIVE-23738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Barnabas Maidics
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: q78_30tb_lock_request.log
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java#L102]
>  
> For Q78 at 30TB scale, it ends up dumping a couple of MBs of INFO-level log 
> just to print the lock request. If possible, this should be moved to DEBUG 
> level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23772) Reallocate calcite-core to prevent NoSuchFiledError

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23772:
--
Labels: pull-request-available  (was: )

> Reallocate calcite-core to prevent NoSuchFiledError
> ---
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:542) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23738) DBLockManager::lock() : Move lock request to debug level

2020-06-29 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-23738.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the patch [~b.maidics]!

> DBLockManager::lock() : Move lock request to debug level
> 
>
> Key: HIVE-23738
> URL: https://issues.apache.org/jira/browse/HIVE-23738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Barnabas Maidics
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: q78_30tb_lock_request.log
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java#L102]
>  
> For Q78 at 30TB scale, it ends up dumping a couple of MBs of INFO-level log 
> just to print the lock request. If possible, this should be moved to DEBUG 
> level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23772) Reallocate calcite-core to prevent NoSuchFiledError

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?focusedWorklogId=45&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-45
 ]

ASF GitHub Bot logged work on HIVE-23772:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 08:26
Start Date: 29/Jun/20 08:26
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #1187:
URL: https://github.com/apache/hive/pull/1187


   https://issues.apache.org/jira/browse/HIVE-23772
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 45)
Remaining Estimate: 0h
Time Spent: 10m

> Reallocate calcite-core to prevent NoSuchFiledError
> ---
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:542) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23593) Schemainit fails with NoSuchFieldError

2020-06-29 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147615#comment-17147615
 ] 

Ayush Saxena commented on HIVE-23593:
-

Thanx [~kgyrtkirk] for the response. I have raised HIVE-23772 with the changes.

> Schemainit fails with NoSuchFieldError 
> ---
>
> Key: HIVE-23593
> URL: https://issues.apache.org/jira/browse/HIVE-23593
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The issue comes from a Calcite-related class; it's very interesting because 
> ql already has a shaded Calcite.
> {code}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitCall(ExprNodeConverter.java:192)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitCall(ExprNodeConverter.java:98)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.HiveRexExecutorImpl.reduce(HiveRexExecutorImpl.java:56)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.foldExpression(HiveFunctionHelper.java:544)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.createConstantObjectInspector(HiveFunctionHelper.java:452)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.createObjectInspector(HiveFunctionHelper.java:435)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23755) Fix Ranger Url extra slash

2020-06-29 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23755:
---
Status: In Progress  (was: Patch Available)

> Fix Ranger Url extra slash
> --
>
> Key: HIVE-23755
> URL: https://issues.apache.org/jira/browse/HIVE-23755
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23755.01.patch, HIVE-23755.02.patch, 
> HIVE-23755.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23755) Fix Ranger Url extra slash

2020-06-29 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23755:
---
Attachment: HIVE-23755.03.patch
Status: Patch Available  (was: In Progress)

> Fix Ranger Url extra slash
> --
>
> Key: HIVE-23755
> URL: https://issues.apache.org/jira/browse/HIVE-23755
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23755.01.patch, HIVE-23755.02.patch, 
> HIVE-23755.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23772) Relocate calcite-core to prevent NoSuchFiledError

2020-06-29 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-23772:
---
Summary: Relocate calcite-core to prevent NoSuchFiledError  (was: 
Reallocate calcite-core to prevent NoSuchFiledError)

> Relocate calcite-core to prevent NoSuchFiledError
> -
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:542) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23773) Support multi-key probe MapJoins

2020-06-29 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23773:
-


> Support multi-key probe MapJoins
> 
>
> Key: HIVE-23773
> URL: https://issues.apache.org/jira/browse/HIVE-23773
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=452256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452256
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 11:29
Start Date: 29/Jun/20 11:29
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r446898404



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -543,10 +556,24 @@ static void prewarm(RawStore rawStore) {
 tableColStats = rawStore.getTableColumnStatistics(catName, 
dbName, tblName, colNames, CacheUtils.HIVE_ENGINE);
 Deadline.stopTimer();
   }
+  Deadline.startTimer("getPrimaryKeys");
+  rawStore.getPrimaryKeys(catName, dbName, tblName);
+  Deadline.stopTimer();
+  Deadline.startTimer("getForeignKeys");
+  rawStore.getForeignKeys(catName, null, null, dbName, tblName);
+  Deadline.stopTimer();
+  Deadline.startTimer("getUniqueConstraints");
+  rawStore.getUniqueConstraints(catName, dbName, tblName);
+  Deadline.stopTimer();
+  Deadline.startTimer("getNotNullConstraints");
+  rawStore.getNotNullConstraints(catName, dbName, tblName);
+  Deadline.stopTimer();
+
   // If the table could not cached due to memory limit, stop 
prewarm
   boolean isSuccess = sharedCache
   .populateTableInCache(table, tableColStats, partitions, 
partitionColStats, aggrStatsAllPartitions,

Review comment:
   Done. Though the new class just contains the constraint objects for now, we 
can open a separate refactoring jira for partition/column stats that also 
refactors the array created to store the size/dirtyCache variables.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452256)
Time Spent: 50m  (was: 40m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive pulls all the constraints of 
> the tables involved in a query, which results in multiple DB reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc.). The effort 
> to cache them is small, as they are just another table component.
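> As context, a hedged sketch of the constraint metadata in question (the 
> tables and the constraint name are invented; Hive constraints are 
> informational, hence DISABLE NOVALIDATE). Each such constraint is what 
> get_primary_keys / get_foreign_keys would otherwise fetch from the DB on 
> every query touching these tables:
> {code}
> CREATE TABLE dept (
>   id INT,
>   PRIMARY KEY (id) DISABLE NOVALIDATE
> );
> CREATE TABLE emp (
>   id INT,
>   dept_id INT,
>   PRIMARY KEY (id) DISABLE NOVALIDATE,
>   CONSTRAINT emp_dept_fk FOREIGN KEY (dept_id)
>     REFERENCES dept (id) DISABLE NOVALIDATE
> );
> {code}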



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=452263&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452263
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 11:33
Start Date: 29/Jun/20 11:33
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r446900743



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2599,82 @@ long getPartsFound() {
 
   @Override public List<SQLPrimaryKey> getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List<SQLPrimaryKey> keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);

Review comment:
   Yes. While updating the cache, there is a possibility that the table got 
updated but the constraints didn't (they are yet to be updated). But this is 
similar to partition/columnStats caching.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452263)
Time Spent: 1h  (was: 50m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive pulls all the constraints of 
> the tables involved in a query, which results in multiple DB reads (including 
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc.). The effort 
> to cache them is small, as they are just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23772) Relocate calcite-core to prevent NoSuchFiledError

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?focusedWorklogId=452262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452262
 ]

ASF GitHub Bot logged work on HIVE-23772:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 11:33
Start Date: 29/Jun/20 11:33
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651055461


   I have updated the itests as well; they were causing the compilation to fail. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452262)
Time Spent: 20m  (was: 10m)

> Relocate calcite-core to prevent NoSuchFiledError
> -
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:542) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-19549) Enable TestAcidOnTez#testCtasTezUnion

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-19549:
--
Labels: pull-request-available  (was: )

> Enable TestAcidOnTez#testCtasTezUnion
> -
>
> Key: HIVE-19549
> URL: https://issues.apache.org/jira/browse/HIVE-19549
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-19549) Enable TestAcidOnTez#testCtasTezUnion

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19549?focusedWorklogId=452272&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452272
 ]

ASF GitHub Bot logged work on HIVE-19549:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 11:57
Start Date: 29/Jun/20 11:57
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1188:
URL: https://github.com/apache/hive/pull/1188


   Testing done:
   ```
   mvn test -Dtest=TestAcidOnTez#testCtasTezUnion -pl itests/hive-unit -Pitests
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452272)
Remaining Estimate: 0h
Time Spent: 10m

> Enable TestAcidOnTez#testCtasTezUnion
> -
>
> Key: HIVE-19549
> URL: https://issues.apache.org/jira/browse/HIVE-19549
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Krisztian Kasa
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java

2020-06-29 Thread Barnabas Maidics (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-23774:
---


> Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java
> 
>
> Key: HIVE-23774
> URL: https://issues.apache.org/jira/browse/HIVE-23774
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Trivial
>
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589]
> This log is not needed at INFO log level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java

2020-06-29 Thread Barnabas Maidics (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23774 started by Barnabas Maidics.
---
> Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java
> 
>
> Key: HIVE-23774
> URL: https://issues.apache.org/jira/browse/HIVE-23774
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Trivial
>
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589]
> This log is not needed at INFO log level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23774:
--
Labels: pull-request-available  (was: )

> Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java
> 
>
> Key: HIVE-23774
> URL: https://issues.apache.org/jira/browse/HIVE-23774
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589]
> This log is not needed at INFO log level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23774?focusedWorklogId=452281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452281
 ]

ASF GitHub Bot logged work on HIVE-23774:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 12:14
Start Date: 29/Jun/20 12:14
Worklog Time Spent: 10m 
  Work Description: bmaidics opened a new pull request #1189:
URL: https://github.com/apache/hive/pull/1189


   This log is not needed at INFO log level.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452281)
Remaining Estimate: 0h
Time Spent: 10m

> Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java
> 
>
> Key: HIVE-23774
> URL: https://issues.apache.org/jira/browse/HIVE-23774
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589]
> This log is not needed at INFO log level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?focusedWorklogId=452316&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452316
 ]

ASF GitHub Bot logged work on HIVE-23611:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 13:30
Start Date: 29/Jun/20 13:30
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1120:
URL: https://github.com/apache/hive/pull/1120#discussion_r444723985



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -517,7 +508,7 @@ public void externalTableIncrementalReplication() throws 
Throwable {
 + "'")
 .run("alter table t1 add partition(country='india')")
 .run("alter table t1 add partition(country='us')")
-.dump(primaryDbName, withClause);

Review comment:
   why is the withClause removed from here?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/BaseReplicationAcrossInstances.java
##
@@ -103,6 +114,12 @@ public static void classLevelTearDown() throws IOException 
{
 replica.close();
   }
 
+  private static void setReplicaExternalBase(FileSystem fs, Map confMap) throws IOException {
+fs.mkdirs(REPLICA_EXTERNAL_BASE);
+fullyQualifiedReplicaExternalBase =  
fs.getFileStatus(REPLICA_EXTERNAL_BASE).getPath().toString();
+confMap.put(HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname, 
fullyQualifiedReplicaExternalBase);

Review comment:
   this is also set at line 104

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/ReplicationTestUtils.java
##
@@ -502,19 +502,11 @@ public static void insertForMerge(WarehouseInstance 
primary, String primaryDbNam
 "creation", "creation", "merge_update", "merge_insert", 
"merge_insert"});
   }
 
-  public static List externalTableBasePathWithClause(String 
replExternalBase, WarehouseInstance replica)
-  throws IOException, SemanticException {
-Path externalTableLocation = new Path(replExternalBase);
-DistributedFileSystem fileSystem = replica.miniDFSCluster.getFileSystem();
-externalTableLocation = 
PathBuilder.fullyQualifiedHDFSUri(externalTableLocation, fileSystem);
-fileSystem.mkdirs(externalTableLocation);
-
-// this is required since the same filesystem is used in both source and 
target
-return Arrays.asList(
-"'" + HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname + 
"'='"
-+ externalTableLocation.toString() + "'",
-"'distcp.options.pugpb'=''"
-);
+  public static List externalTableClause(boolean enable) {

Review comment:
   should this be named includeExternalTableClause?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -66,6 +67,9 @@ private ReplExternalTables(){}
 
   public static String externalTableLocation(HiveConf hiveConf, String 
location) throws SemanticException {
 String baseDir = 
hiveConf.get(HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname);
+if (StringUtils.isEmpty(baseDir)) {

Review comment:
   Is the fully qualified REPL_EXTERNAL_TABLE_BASE_DIR path not needed at load time?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -503,8 +495,7 @@ public void externalTableIncrementalCheckpointing() throws 
Throwable {
 
   @Test
   public void externalTableIncrementalReplication() throws Throwable {
-List withClause = externalTableBasePathWithClause();
-WarehouseInstance.Tuple tuple = primary.dump(primaryDbName, withClause);
+WarehouseInstance.Tuple tuple = primary.dump(primaryDbName);

Review comment:
   with clause removed?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -159,14 +154,14 @@ public void externalTableReplicationWithDefaultPaths() 
throws Throwable {
 .run("insert into table t2 partition(country='india') values 
('bangalore')")
 .run("insert into table t2 partition(country='us') values ('austin')")
 .run("insert into table t2 partition(country='france') values 
('paris')")
-.dump(primaryDbName, withClauseOptions);
+.dump(primaryDbName);

Review comment:
   why is the with clause removed?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -623,7 +612,7 @@ public void bootstrapExternalTablesDuringIncrementalPhase() 
throws Throwable {
 assertFalse(primary.miniDFSCluster.getFileSystem()
 .exists(new Path(metadataPath + relativeExtInfoPath(null;
 
-replica.load(repli

[jira] [Commented] (HIVE-23755) Fix Ranger Url extra slash

2020-06-29 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147787#comment-17147787
 ] 

Pravin Sinha commented on HIVE-23755:
-

+1

> Fix Ranger Url extra slash
> --
>
> Key: HIVE-23755
> URL: https://issues.apache.org/jira/browse/HIVE-23755
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23755.01.patch, HIVE-23755.02.patch, 
> HIVE-23755.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=452324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452324
 ]

ASF GitHub Bot logged work on HIVE-23638:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 13:52
Start Date: 29/Jun/20 13:52
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1161:
URL: https://github.com/apache/hive/pull/1161#discussion_r446983489



##
File path: common/src/java/org/apache/hadoop/hive/common/FileUtils.java
##
@@ -483,12 +483,6 @@ public static boolean 
isActionPermittedForFileHierarchy(FileSystem fs, FileStatu
   String userName, FsAction action, boolean recurse) throws Exception {
 boolean isDir = fileStatus.isDir();
 
-FsAction dirActionNeeded = action;
-if (isDir) {
-  // for dirs user needs execute privileges as well
-  dirActionNeeded.and(FsAction.EXECUTE);
-}
-

Review comment:
   So, I understand why this would be removed from a find-bugs perspective
(this is a no-op), but this is actually an all-around bug. This should be:
   
   ```
   // for dirs user needs execute privileges as well
   FsAction dirActionNeeded = (isDir) ? action.and(FsAction.EXECUTE) : action;
   ```
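   For anyone skimming: FsAction is an immutable enum, so its and()/or() methods return a new value rather than mutating the receiver. A minimal standalone sketch of the no-op (the class and variable names are mine, not from the patch):
   
   ```java
   import org.apache.hadoop.fs.permission.FsAction;
   
   public class FsActionNoOpDemo {
     public static void main(String[] args) {
       FsAction needed = FsAction.ALL;
       // and() returns the combined action; the receiver is untouched,
       // so this call's result is simply discarded:
       needed.and(FsAction.EXECUTE);
       System.out.println(needed);            // still ALL
       // the effective form reassigns the result:
       needed = needed.and(FsAction.EXECUTE);
       System.out.println(needed);            // EXECUTE
     }
   }
   ```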

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -6475,17 +6477,17 @@ private static boolean isAllowed(Configuration conf, 
ConfVars setting) {
   }
 
   public static String getNonMrEngines() {
-String result = StringUtils.EMPTY;
+StringBuffer result = new StringBuffer();
 for (String s : ConfVars.HIVE_EXECUTION_ENGINE.getValidStringValues()) {
   if ("mr".equals(s)) {
 continue;
   }
-  if (!result.isEmpty()) {
-result += ", ";
+  if (result.length() != 0) {
+result.append(", ");
   }
-  result += s;
+  result.append(s);
 }
-return result;
+return result.toString();

Review comment:
   Please change this to just use String#join so it is more human friendly.
   
   ```
   Set<String> engines = new HashSet<>(ConfVars.HIVE_EXECUTION_ENGINE.getValidStringValues());
   boolean removedMR = engines.remove("mr");
   LOG.debug("Found and removed MapReduce engine from list of valid execution 
engines: {}", removedMR);
   return String.join(", ", engines);
   ```
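   One caveat with that sketch: HashSet iteration order is unspecified, so the joined list may come out in any order. If the ordering of getValidStringValues() matters (say, in user-facing messages), a LinkedHashSet preserves it at the same cost.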





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452324)
Time Spent: 40m  (was: 0.5h)

> Fix FindBug issues in hive-common
> -
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs 
> -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO
>  -pl :hive-common test-compile 
> com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22676) Replace Base64 in hive-service Package

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22676?focusedWorklogId=452334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452334
 ]

ASF GitHub Bot logged work on HIVE-22676:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 13:59
Start Date: 29/Jun/20 13:59
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1090:
URL: https://github.com/apache/hive/pull/1090


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452334)
Time Spent: 20m  (was: 10m)

> Replace Base64 in hive-service Package
> --
>
> Key: HIVE-22676
> URL: https://issues.apache.org/jira/browse/HIVE-22676
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-22676.1.patch, HIVE-22676.2.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22676) Replace Base64 in hive-service Package

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22676?focusedWorklogId=452336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452336
 ]

ASF GitHub Bot logged work on HIVE-22676:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 14:00
Start Date: 29/Jun/20 14:00
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1090:
URL: https://github.com/apache/hive/pull/1090


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452336)
Time Spent: 0.5h  (was: 20m)

> Replace Base64 in hive-service Package
> --
>
> Key: HIVE-22676
> URL: https://issues.apache.org/jira/browse/HIVE-22676
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-22676.1.patch, HIVE-22676.2.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452357&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452357
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 14:11
Start Date: 29/Jun/20 14:11
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447001420



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -8322,6 +8322,22 @@ public AllocateTableWriteIdsResponse 
allocate_table_write_ids(
   return response;
 }
 
+@Override
+public MaxAllocatedTableWriteIdResponse 
get_max_allocated_table_write_id(MaxAllocatedTableWriteIdRequest rqst)

Review comment:
   weird mix of Camel case and underscore?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -8322,6 +8322,22 @@ public AllocateTableWriteIdsResponse 
allocate_table_write_ids(
   return response;
 }
 
+@Override
+public MaxAllocatedTableWriteIdResponse 
get_max_allocated_table_write_id(MaxAllocatedTableWriteIdRequest rqst)

Review comment:
   weird mix of Camel case and underscore





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452357)
Time Spent: 2h 10m  (was: 2h)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452354
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 14:11
Start Date: 29/Jun/20 14:11
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447001420



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -8322,6 +8322,22 @@ public AllocateTableWriteIdsResponse 
allocate_table_write_ids(
   return response;
 }
 
+@Override
+public MaxAllocatedTableWriteIdResponse 
get_max_allocated_table_write_id(MaxAllocatedTableWriteIdRequest rqst)

Review comment:
   why not Camel case?
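   (Likely context: the HMSHandler methods implement the Thrift-generated ThriftHiveMetastore.Iface, whose method names are taken verbatim from the snake_case names in the IDL (compare allocate_table_write_ids just above), so the new method presumably follows the generated naming rather than Java camel case.)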





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452354)
Time Spent: 2h  (was: 1h 50m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452364
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 14:14
Start Date: 29/Jun/20 14:14
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447003871



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -111,24 +120,24 @@ public IMetaStoreClient getMsc() {
* @param partitions
*  List of partition name value pairs, if null or empty check all
*  partitions
-   * @param table
-   * @param result
-   *  Fill this with the results of the check
+   * @param table Table we want to run the check for.
+   * @return Results of the check
* @throws MetastoreException
*   Failed to get required information from the metastore.
* @throws IOException
*   Most likely filesystem related
*/
-  public void checkMetastore(String catName, String dbName, String tableName,
-  List> partitions, Table table, CheckResult 
result)
+  public CheckResult checkMetastore(String catName, String dbName, String 
tableName,
+  List> partitions, Table table)
   throws MetastoreException, IOException {
-
+CheckResult result = new CheckResult();
 if (dbName == null || "".equalsIgnoreCase(dbName)) {
   dbName = Warehouse.DEFAULT_DATABASE_NAME;
 }
 
 try {
   if (tableName == null || "".equals(tableName)) {
+// TODO: I do not think this is used by anything other than tests

Review comment:
   should we answer this question in the current patch?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452364)
Time Spent: 2h 20m  (was: 2h 10m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452396
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 14:47
Start Date: 29/Jun/20 14:47
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447028756



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set 
partPaths,
 LOG.debug("Number of partitions not in metastore : " + 
result.getPartitionsNotInMs().size());
   }
 
+  /**
+   * Calculate the maximum seen writeId from the acid directory structure
+   * @param partPath Path of the partition directory
+   * @param res Partition result to write the max ids
+   * @throws IOException ex
+   */
+  private void setMaxTxnAndWriteIdFromPartition(Path partPath, 
CheckResult.PartitionResult res) throws IOException {
+FileSystem fs = partPath.getFileSystem(conf);
+FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, 
HIDDEN_FILES_PATH_FILTER);
+
+// Read the writeIds from every base and delta directory and find the max
+long maxWriteId = 0L;
+long maxVisibilityId = 0L;
+for(FileStatus fileStatus : deltaOrBaseFiles) {
+  if (!fileStatus.isDirectory()) {
+continue;
+  }
+  long writeId = 0L;
+  long visibilityId = 0L;
+  String folder = fileStatus.getPath().getName();
+  if (folder.startsWith(BASE_PREFIX)) {
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+writeId = Long.parseLong(folder.substring(BASE_PREFIX.length()));
+  } else if (folder.startsWith(DELTA_PREFIX) || 
folder.startsWith(DELETE_DELTA_PREFIX)) {
+// See AcidUtils.parseDelta
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX);
+String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : 
DELTA_PREFIX).length());
+int split = rest.indexOf('_');

Review comment:
   why not use rest.split("_")?

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set 
partPaths,
 LOG.debug("Number of partitions not in metastore : " + 
result.getPartitionsNotInMs().size());
   }
 
+  /**
+   * Calculate the maximum seen writeId from the acid directory structure
+   * @param partPath Path of the partition directory
+   * @param res Partition result to write the max ids
+   * @throws IOException ex
+   */
+  private void setMaxTxnAndWriteIdFromPartition(Path partPath, 
CheckResult.PartitionResult res) throws IOException {
+FileSystem fs = partPath.getFileSystem(conf);
+FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, 
HIDDEN_FILES_PATH_FILTER);
+
+// Read the writeIds from every base and delta directory and find the max
+long maxWriteId = 0L;
+long maxVisibilityId = 0L;
+for(FileStatus fileStatus : deltaOrBaseFiles) {
+  if (!fileStatus.isDirectory()) {
+continue;
+  }
+  long writeId = 0L;
+  long visibilityId = 0L;
+  String folder = fileStatus.getPath().getName();
+  if (folder.startsWith(BASE_PREFIX)) {
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+writeId = Long.parseLong(folder.substring(BASE_PREFIX.length()));
+  } else if (folder.startsWith(DELTA_PREFIX) || 
folder.startsWith(DELETE_DELTA_PREFIX)) {
+// See AcidUtils.parseDelta
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX);
+String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : 
DELTA_PREFIX).length());
+int split = rest.indexOf('_');

Review comment:
   why not use rest.split("_")?
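   A sketch of what the split-based variant amounts to, assuming the visibility suffix has already been stripped and `rest` looks like "minWriteId_maxWriteId" or "minWriteId_maxWriteId_statementId":
   
   ```java
   String[] parts = rest.split("_");
   // parts[1] is the max writeId, whether or not a statementId part follows
   long writeId = Long.parseLong(parts[1]);
   ```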





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452396)
Time Spent: 2.5h  (was: 2h 20m)

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452411
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 14:54
Start Date: 29/Jun/20 14:54
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447033768



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set 
partPaths,
 LOG.debug("Number of partitions not in metastore : " + 
result.getPartitionsNotInMs().size());
   }
 
+  /**
+   * Calculate the maximum seen writeId from the acid directory structure
+   * @param partPath Path of the partition directory
+   * @param res Partition result to write the max ids
+   * @throws IOException ex
+   */
+  private void setMaxTxnAndWriteIdFromPartition(Path partPath, 
CheckResult.PartitionResult res) throws IOException {
+FileSystem fs = partPath.getFileSystem(conf);
+FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, 
HIDDEN_FILES_PATH_FILTER);
+
+// Read the writeIds from every base and delta directory and find the max
+long maxWriteId = 0L;
+long maxVisibilityId = 0L;
+for(FileStatus fileStatus : deltaOrBaseFiles) {
+  if (!fileStatus.isDirectory()) {
+continue;
+  }
+  long writeId = 0L;
+  long visibilityId = 0L;
+  String folder = fileStatus.getPath().getName();
+  if (folder.startsWith(BASE_PREFIX)) {
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+writeId = Long.parseLong(folder.substring(BASE_PREFIX.length()));
+  } else if (folder.startsWith(DELTA_PREFIX) || 
folder.startsWith(DELETE_DELTA_PREFIX)) {
+// See AcidUtils.parseDelta
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX);
+String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : 
DELTA_PREFIX).length());
+int split = rest.indexOf('_');
+//split2 may be -1 if no statementId
+int split2 = rest.indexOf('_', split + 1);
+// We always want the second part (it is either the same or greater if 
it is a compacted delta)
+writeId = split2 == -1 ? Long.parseLong(rest.substring(split + 1)) : 
Long
+.parseLong(rest.substring(split + 1, split2));
+  }
+  if (writeId > maxWriteId) {
+maxWriteId = writeId;
+  }
+  if (visibilityId > maxVisibilityId) {
+maxVisibilityId = visibilityId;
+  }
+}
+LOG.debug("Max writeId {}, max txnId {} found in partition {}", 
maxWriteId, maxVisibilityId,
+partPath.toUri().toString());
+res.setMaxWriteId(maxWriteId);
+res.setMaxTxnId(maxVisibilityId);
+  }
+  private long getVisibilityTxnId(String folder) {
+int idxOfVis = folder.indexOf(VISIBILITY_PREFIX);

Review comment:
   why not use regex?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452411)
Time Spent: 2h 40m  (was: 2.5h)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452412
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 14:55
Start Date: 29/Jun/20 14:55
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447033768



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set 
partPaths,
 LOG.debug("Number of partitions not in metastore : " + 
result.getPartitionsNotInMs().size());
   }
 
+  /**
+   * Calculate the maximum seen writeId from the acid directory structure
+   * @param partPath Path of the partition directory
+   * @param res Partition result to write the max ids
+   * @throws IOException ex
+   */
+  private void setMaxTxnAndWriteIdFromPartition(Path partPath, 
CheckResult.PartitionResult res) throws IOException {
+FileSystem fs = partPath.getFileSystem(conf);
+FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, 
HIDDEN_FILES_PATH_FILTER);
+
+// Read the writeIds from every base and delta directory and find the max
+long maxWriteId = 0L;
+long maxVisibilityId = 0L;
+for(FileStatus fileStatus : deltaOrBaseFiles) {
+  if (!fileStatus.isDirectory()) {
+continue;
+  }
+  long writeId = 0L;
+  long visibilityId = 0L;
+  String folder = fileStatus.getPath().getName();
+  if (folder.startsWith(BASE_PREFIX)) {
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+writeId = Long.parseLong(folder.substring(BASE_PREFIX.length()));
+  } else if (folder.startsWith(DELTA_PREFIX) || 
folder.startsWith(DELETE_DELTA_PREFIX)) {
+// See AcidUtils.parseDelta
+visibilityId = getVisibilityTxnId(folder);
+if (visibilityId > 0) {
+  folder = removeVisibilityTxnId(folder);
+}
+boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX);
+String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : 
DELTA_PREFIX).length());
+int split = rest.indexOf('_');
+//split2 may be -1 if no statementId
+int split2 = rest.indexOf('_', split + 1);
+// We always want the second part (it is either the same or greater if 
it is a compacted delta)
+writeId = split2 == -1 ? Long.parseLong(rest.substring(split + 1)) : 
Long
+.parseLong(rest.substring(split + 1, split2));
+  }
+  if (writeId > maxWriteId) {
+maxWriteId = writeId;
+  }
+  if (visibilityId > maxVisibilityId) {
+maxVisibilityId = visibilityId;
+  }
+}
+LOG.debug("Max writeId {}, max txnId {} found in partition {}", 
maxWriteId, maxVisibilityId,
+partPath.toUri().toString());
+res.setMaxWriteId(maxWriteId);
+res.setMaxTxnId(maxVisibilityId);
+  }
+  private long getVisibilityTxnId(String folder) {
+int idxOfVis = folder.indexOf(VISIBILITY_PREFIX);

Review comment:
   why not use regex? removeVisibilityTxnId probably wouldn't even be needed
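   A sketch of the regex variant, assuming directory names carry an optional "_v<visibilityTxnId>" suffix (e.g. "base_0000007_v0000123"); one pattern would cover both the lookup and the strip, as class members of the checker:
   
   ```java
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;
   
   private static final Pattern VISIBILITY_SUFFIX = Pattern.compile("^(.*)_v(\\d+)$");
   
   // returns the visibility txnId, or 0 when the suffix is absent
   private long visibilityTxnIdOf(String folder) {
     Matcher m = VISIBILITY_SUFFIX.matcher(folder);
     return m.matches() ? Long.parseLong(m.group(2)) : 0L;
   }
   
   // returns the folder name with the suffix stripped, if present
   private String stripVisibilitySuffix(String folder) {
     Matcher m = VISIBILITY_SUFFIX.matcher(folder);
     return m.matches() ? m.group(1) : folder;
   }
   ```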





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452412)
Time Spent: 2h 50m  (was: 2h 40m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * C

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452415
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 14:59
Start Date: 29/Jun/20 14:59
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447037520



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {

Review comment:
   you can remove one nesting level by merging the two conditions; a sketch follows
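   In flattened form (a sketch against the quoted diff, not the final patch):
   
   ```java
   // one guard instead of two nested ifs
   if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)
       && result.getMaxWriteId() > 0) {
     if (txnId < 0) {
       // We need the txnId to check against even if we didn't do the locking
       txnId = getMsc().openTxn(getUserName());
     }
     validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), result.getMaxTxnId(),
         table.getDbName(), table.getTableName(), txnId);
   }
   ```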





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452415)
Time Spent: 3h  (was: 2h 50m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452416
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 15:00
Start Date: 29/Jun/20 15:00
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447037968



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {

Review comment:
   not formatted





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452416)
Time Spent: 3h 10m  (was: 3h)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, creat

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452425&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452425
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 15:05
Start Date: 29/Jun/20 15:05
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447042125



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;

Review comment:
   same here: success &= closeTxn(qualifiedTableName, success, txnId)





This is an automated message from the 

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452422&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452422
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 15:05
Start Date: 29/Jun/20 15:05
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447041559



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;

Review comment:
   you can do success &= writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds)
   for readability
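   (Worth noting: &= on booleans does not short-circuit, so success &= writeResultToFile(...) still always invokes the method, exactly like the original success = writeResultToFile(...) && success, which also evaluates its left operand first; the two forms are behaviorally equivalent, the compound assignment just reads better.)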





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452422)
Time Spent: 3h 20m  (was: 3h 10m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452428
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 15:09
Start Date: 29/Jun/20 15:09
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447044667



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452430&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452430
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 15:09
Start Date: 29/Jun/20 15:09
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447044866



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452432&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452432
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 15:11
Start Date: 29/Jun/20 15:11
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447046016



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName

[jira] [Updated] (HIVE-23770) Druid filter translation unable to handle inverted between

2020-06-29 Thread Nishant Bangarwa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Bangarwa updated HIVE-23770:

Attachment: HIVE-23770.1.patch

> Druid filter translation unable to handle inverted between
> --
>
> Key: HIVE-23770
> URL: https://issues.apache.org/jira/browse/HIVE-23770
> Project: Hive
>  Issue Type: Bug
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
>Priority: Major
> Attachments: HIVE-23770.1.patch, HIVE-23770.patch
>
>
> Druid filter translation happens in Calcite and does not use the HiveBetween 
> inverted flag for translation; this misses a negation in the planned query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
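A NOT BETWEEN predicate is a BETWEEN call with its inverted flag set, so dropping the flag silently turns NOT BETWEEN into BETWEEN. A self-contained toy model of the missing negation - plain Java, none of these names come from Hive, Calcite, or Druid:

{code:java}
public class InvertedBetweenDemo {
    // x BETWEEN lo AND hi
    static boolean between(int x, int lo, int hi) {
        return lo <= x && x <= hi;
    }

    // Correct translation consults the inverted flag; the bug described above
    // is equivalent to always returning between(...) and dropping the negation.
    static boolean translate(boolean inverted, int x, int lo, int hi) {
        boolean base = between(x, lo, hi);
        return inverted ? !base : base;
    }

    public static void main(String[] args) {
        // x NOT BETWEEN 5 AND 10 for x = 3 must be true:
        System.out.println(translate(true, 3, 5, 10)); // true
        // ignoring the flag evaluates plain BETWEEN instead:
        System.out.println(between(3, 5, 10));         // false
    }
}
{code}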


[jira] [Updated] (HIVE-23770) Druid filter translation unable to handle inverted between

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23770:
--
Labels: pull-request-available  (was: )

> Druid filter translation unable to handle inverted between
> --
>
> Key: HIVE-23770
> URL: https://issues.apache.org/jira/browse/HIVE-23770
> Project: Hive
>  Issue Type: Bug
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23770.1.patch, HIVE-23770.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Druid filter translation happens in Calcite and does not use the HiveBetween 
> inverted flag for translation; this misses a negation in the planned query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23770) Druid filter translation unable to handle inverted between

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23770?focusedWorklogId=452434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452434
 ]

ASF GitHub Bot logged work on HIVE-23770:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 15:14
Start Date: 29/Jun/20 15:14
Worklog Time Spent: 10m 
  Work Description: nishantmonu51 opened a new pull request #1190:
URL: https://github.com/apache/hive/pull/1190


   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452434)
Remaining Estimate: 0h
Time Spent: 10m

> Druid filter translation unable to handle inverted between
> --
>
> Key: HIVE-23770
> URL: https://issues.apache.org/jira/browse/HIVE-23770
> Project: Hive
>  Issue Type: Bug
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
>Priority: Major
> Attachments: HIVE-23770.1.patch, HIVE-23770.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Druid filter translation happens in Calcite and does not use the HiveBetween 
> inverted flag for translation; this misses a negation in the planned query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452442&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452442
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 15:18
Start Date: 29/Jun/20 15:18
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447050970



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452444
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 15:20
Start Date: 29/Jun/20 15:20
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447052663



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452446
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 15:21
Start Date: 29/Jun/20 15:21
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447053621



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
 throw new MetastoreException(e);
   }
 }
+if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+  if (result.getMaxWriteId() > 0) {
+if (txnId < 0) {
+  // We need the txnId to check against even if we didn't do the 
locking
+  txnId = getMsc().openTxn(getUserName());
+}
+
+validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), 
result.getMaxTxnId(),
+table.getDbName(), table.getTableName(), txnId);
+  }
+}
   }
   success = true;
 } catch (Exception e) {
   LOG.warn("Failed to run metacheck: ", e);
   success = false;
-  ret = 1;
 } finally {
-  if (msckInfo.getResFile() != null) {
-BufferedWriter resultOut = null;
-try {
-  Path resFile = new Path(msckInfo.getResFile());
-  FileSystem fs = resFile.getFileSystem(getConf());
-  resultOut = new BufferedWriter(new 
OutputStreamWriter(fs.create(resFile)));
-
-  boolean firstWritten = false;
-  firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-"Tables not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-"Tables missing on filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-"Partitions not in metastore:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-"Partitions missing from filesystem:", resultOut, firstWritten);
-  firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-"Expired partitions (retention period: " + partitionExpirySeconds 
+ "s) :", resultOut, firstWritten);
-  // sorting to stabilize qfile output (msck_repair_drop.q)
-  Collections.sort(repairOutput);
-  for (String rout : repairOutput) {
-if (firstWritten) {
-  resultOut.write(terminator);
-} else {
-  firstWritten = true;
-}
-resultOut.write(rout);
-  }
-} catch (IOException e) {
-  LOG.warn("Failed to save metacheck output: ", e);
-  ret = 1;
-} finally {
-  if (resultOut != null) {
-try {
-  resultOut.close();
-} catch (IOException e) {
-  LOG.warn("Failed to close output file: ", e);
-  ret = 1;
-}
-  }
+  if (result!=null) {
+logResult(result);
+if (msckInfo.getResFile() != null) {
+  success = writeResultToFile(msckInfo, result, repairOutput, 
partitionExpirySeconds) && success;
 }
   }
 
-  LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-  LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-  LOG.info("Partitions not in metastore: {}", 
result.getPartitionsNotInMs());
-  LOG.info("Partitions missing from filesystem: {}", 
result.getPartitionsNotOnFs());
-  LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-  if (acquireLock && txnId > 0) {
-  if (success) {
-try {
-  LOG.info("txnId: {} succeeded. Committing..", txnId);
-  getMsc().commitTxn(txnId);
-} catch (Exception e) {
-  LOG.warn("Error while committing txnId: {} for table: {}", 
txnId, qualifiedTableName, e);
-  ret = 1;
-}
-  } else {
-try {
-  LOG.info("txnId: {} failed. Aborting..", txnId);
-  getMsc().abortTxns(Lists.newArrayList(txnId));
-} catch (Exception e) {
-  LOG.warn("Error while aborting txnId: {} for table: {}", txnId, 
qualifiedTableName, e);
-  ret = 1;
-}
-  }
+  if (txnId > 0) {
+success = closeTxn(qualifiedTableName, success, txnId) && success;
   }
   if (getMsc() != null) {
 getMsc().close();
 msc = null;
   }
 }
+return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName

[jira] [Work logged] (HIVE-23598) Add option to rewrite NTILE and RANK to sketch functions

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23598?focusedWorklogId=452457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452457
 ]

ASF GitHub Bot logged work on HIVE-23598:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 16:00
Start Date: 29/Jun/20 16:00
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1126:
URL: https://github.com/apache/hive/pull/1126#discussion_r447081115



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java
##
@@ -483,46 +489,44 @@ protected final SqlOperator getSqlOperator(String fnName) 
{
   }
 
   /**
-   * Rewrites {@code cume_dist() over (order by id)}.
+   * Provides a generic way to rewrite function into using an estimation based 
on CDF.
+   *
+   *  There are a few methods which could be supported this way: NTILE, 
CUME_DIST, RANK
*
+   *  For example:
*  
*   SELECT id, CUME_DIST() OVER (ORDER BY id) FROM sketch_input;
-   * ⇒ SELECT id, 1.0-ds_kll_cdf(ds, CAST(-id AS FLOAT) )[0]
+   * ⇒ SELECT id, ds_kll_cdf(ds, CAST(id AS FLOAT) )[0]
*   FROM sketch_input JOIN (
-   * SELECT ds_kll_sketch(CAST(-id AS FLOAT)) AS ds FROM sketch_input
+   * SELECT ds_kll_sketch(CAST(id AS FLOAT)) AS ds FROM sketch_input
*   ) q;
*  
*/
-  public static class CumeDistRewrite extends 
WindowingToProjectAggregateJoinProject {
+  public static abstract class AbstractRankBasedRewriteRule extends 
WindowingToProjectAggregateJoinProject {
 
-public CumeDistRewrite(String sketchType) {
+public AbstractRankBasedRewriteRule(String sketchType) {
   super(sketchType);
 }
 
-@Override
-protected VbuilderPAP buildProcessor(RelOptRuleCall call) {
-  return new VB(sketchType, call.builder());
-}
+protected static abstract class AbstractRankBasedRewriteBuilder extends 
VbuilderPAP {
 
-private static class VB extends VbuilderPAP {
-
-  protected VB(String sketchClass, RelBuilder relBuilder) {
+  protected AbstractRankBasedRewriteBuilder(String sketchClass, RelBuilder 
relBuilder) {
 super(sketchClass, relBuilder);
   }
 
   @Override
-  boolean isApplicable(RexOver over) {
-SqlAggFunction aggOp = over.getAggOperator();
+  final boolean isApplicable(RexOver over) {
 RexWindow window = over.getWindow();
-if (aggOp.getName().equalsIgnoreCase("cume_dist") && 
window.orderKeys.size() == 1
-&& window.getLowerBound().isUnbounded() && 
window.getUpperBound().isUnbounded()) {
+if (window.orderKeys.size() == 1
+&& window.getLowerBound().isUnbounded() && 
window.getUpperBound().isUnbounded()

Review comment:
   interesting; mostly the second :D
   for the current functions (ntile, cume_dist) it doesn't really make sense to set 
the window to anything other than unbounded (or at least I don't see a use case for it)
   
   I've tried this out for the below query:
   ```
   select id,ntile(id) over (order by id rows between 1 preceding and 1 
following) from sketch_input order by id nulls last;
   ```
   * mysql: rejects it with an error that `ntile` doesn't support it
   * psql: accepts and executes it without interpreting the preceding/following 
stuff correctly
   * hive: stops with a SemanticException
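   
   As background for the rewrite in the hunk above: CUME_DIST() over a single ORDER BY key is the empirical CDF evaluated at that key, which is why a ds_kll CDF sketch can estimate it. A plain-Java illustration using exact counting in place of a KLL sketch (a toy, not Hive code):
   ```java
   import java.util.Arrays;

   public class CumeDistDemo {
       // cume_dist(x) = (# rows with value <= x) / N, i.e. the empirical CDF at x.
       static double cumeDist(int[] values, int x) {
           long le = Arrays.stream(values).filter(v -> v <= x).count();
           return (double) le / values.length;
       }

       public static void main(String[] args) {
           int[] id = {1, 2, 2, 4};
           for (int x : id) {
               System.out.println(x + " -> " + cumeDist(id, x));
           }
           // prints 0.25, 0.75, 0.75, 1.0 - ties share one value, matching
           // CUME_DIST() OVER (ORDER BY id)
       }
   }
   ```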
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452457)
Time Spent: 50m  (was: 40m)

> Add option to rewrite NTILE and RANK to sketch functions
> 
>
> Key: HIVE-23598
> URL: https://issues.apache.org/jira/browse/HIVE-23598
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23775) investigate windowing spec when an order by is present

2020-06-29 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-23775:
---


> investigate windowing spec when an order by is present
> --
>
> Key: HIVE-23775
> URL: https://issues.apache.org/jira/browse/HIVE-23775
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this is some weird stuff that came up during review
> https://github.com/apache/hive/pull/1126#discussion_r442266978
> Order by spec -> range, unbounded preceding, current row
> This also aligns with most RDBMSs' implementations



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23598) Add option to rewrite NTILE and RANK to sketch functions

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23598?focusedWorklogId=452463&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452463
 ]

ASF GitHub Bot logged work on HIVE-23598:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 16:25
Start Date: 29/Jun/20 16:25
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1126:
URL: https://github.com/apache/hive/pull/1126#discussion_r447097434



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java
##
@@ -483,46 +489,44 @@ protected final SqlOperator getSqlOperator(String fnName) 
{
   }
 
   /**
-   * Rewrites {@code cume_dist() over (order by id)}.
+   * Provides a generic way to rewrite function into using an estimation based 
on CDF.
+   *
+   *  There are a few methods which could be supported this way: NTILE, 
CUME_DIST, RANK
*
+   *  For example:
*  
*   SELECT id, CUME_DIST() OVER (ORDER BY id) FROM sketch_input;
-   * ⇒ SELECT id, 1.0-ds_kll_cdf(ds, CAST(-id AS FLOAT) )[0]
+   * ⇒ SELECT id, ds_kll_cdf(ds, CAST(id AS FLOAT) )[0]
*   FROM sketch_input JOIN (
-   * SELECT ds_kll_sketch(CAST(-id AS FLOAT)) AS ds FROM sketch_input
+   * SELECT ds_kll_sketch(CAST(id AS FLOAT)) AS ds FROM sketch_input
*   ) q;
*  
*/
-  public static class CumeDistRewrite extends 
WindowingToProjectAggregateJoinProject {
+  public static abstract class AbstractRankBasedRewriteRule extends 
WindowingToProjectAggregateJoinProject {
 
-public CumeDistRewrite(String sketchType) {
+public AbstractRankBasedRewriteRule(String sketchType) {
   super(sketchType);
 }
 
-@Override
-protected VbuilderPAP buildProcessor(RelOptRuleCall call) {
-  return new VB(sketchType, call.builder());
-}
+protected static abstract class AbstractRankBasedRewriteBuilder extends 
VbuilderPAP {
 
-private static class VB extends VbuilderPAP {
-
-  protected VB(String sketchClass, RelBuilder relBuilder) {
+  protected AbstractRankBasedRewriteBuilder(String sketchClass, RelBuilder 
relBuilder) {
 super(sketchClass, relBuilder);
   }
 
   @Override
-  boolean isApplicable(RexOver over) {
-SqlAggFunction aggOp = over.getAggOperator();
+  final boolean isApplicable(RexOver over) {
 RexWindow window = over.getWindow();
-if (aggOp.getName().equalsIgnoreCase("cume_dist") && 
window.orderKeys.size() == 1
-&& window.getLowerBound().isUnbounded() && 
window.getUpperBound().isUnbounded()) {
+if (window.orderKeys.size() == 1
+&& window.getLowerBound().isUnbounded() && 
window.getUpperBound().isUnbounded()

Review comment:
   the logic to handle 
   ```
   Order by spec -> range, unbounded preceding, current row
   This also aligns with most RDBMSs implementation
   ```
   I think at the time this rule fires it will see unbounded/unbounded...but 
it's very weird; I'll open a separate ticket to investigate that 
   
   opened: HIVE-23775
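   
   To make the quoted default concrete: with an ORDER BY and no explicit frame the window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so peer rows (ties on the order key) share one aggregate value, unlike a ROWS frame. A self-contained plain-Java toy (not Hive code) computing SUM under both frames:
   ```java
   import java.util.Arrays;

   public class FrameDefaultDemo {
       public static void main(String[] args) {
           int[] k = {1, 2, 2, 3};
           // ROWS UNBOUNDED PRECEDING..CURRENT ROW: a plain running sum.
           int[] rows = new int[k.length];
           int running = 0;
           for (int i = 0; i < k.length; i++) {
               running += k[i];
               rows[i] = running;
           }
           // RANGE UNBOUNDED PRECEDING..CURRENT ROW: peers (equal keys) are
           // included, so tied rows all see the same cumulative sum.
           int[] range = new int[k.length];
           for (int i = 0; i < k.length; i++) {
               int s = 0;
               for (int v : k) {
                   if (v <= k[i]) {
                       s += v;
                   }
               }
               range[i] = s;
           }
           System.out.println(Arrays.toString(rows));  // [1, 3, 5, 8]
           System.out.println(Arrays.toString(range)); // [1, 5, 5, 8]
       }
   }
   ```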





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452463)
Time Spent: 1h  (was: 50m)

> Add option to rewrite NTILE and RANK to sketch functions
> 
>
> Key: HIVE-23598
> URL: https://issues.apache.org/jira/browse/HIVE-23598
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23741) Store CacheTags in the file cache level

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23741?focusedWorklogId=452465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452465
 ]

ASF GitHub Bot logged work on HIVE-23741:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 16:33
Start Date: 29/Jun/20 16:33
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #1159:
URL: https://github.com/apache/hive/pull/1159


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452465)
Time Spent: 50m  (was: 40m)

> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> CacheTags are currently stored for every data buffer. The strings are 
> interned, but the number of cache tag objects can be reduced by moving 
> them to the file cache level and back-referencing them.
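
A minimal sketch of the back-referencing idea - hypothetical names, not LLAP's actual classes; the tag lives once on the file-level entry and each buffer reaches it through its file reference instead of carrying its own copy:

{code:java}
// Hypothetical names throughout; only the shape of the idea is from the ticket.
final class FileCacheEntry {
    final String cacheTag;                 // one tag object per cached file
    FileCacheEntry(String cacheTag) { this.cacheTag = cacheTag; }
}

final class DataBuffer {
    final FileCacheEntry file;             // back reference to the file entry
    DataBuffer(FileCacheEntry file) { this.file = file; }
    String cacheTag() { return file.cacheTag; }  // no per-buffer tag copy stored
}
{code}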



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23776) Retire quickstats autocollection

2020-06-29 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-23776:
---


> Retire quickstats autocollection
> 
>
> Key: HIVE-23776
> URL: https://issues.apache.org/jira/browse/HIVE-23776
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> right now these are scanned during every BasicStatsTask execution - which 
> means some filesystem reads/etc - for small inserts this overhead is visible 
> when the fs is a bit slower (s3 and friends)
> I don't think they are really in use...we rely more on columnstats, which are 
> more accurate; and the datasize in this case is the "offline" 
> (ondisk) size - while we should instead calculate with "online" sizes...
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes
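
To illustrate what the autocollection amounts to, a sketch against the public Hadoop FileSystem API (not Hive's BasicStatsTask code): one recursive listing per table/partition directory, which is exactly the part that gets slow on S3-like filesystems:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class QuickStatsDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        long numFiles = 0, totalSize = 0, numEcFiles = 0;
        // one recursive listing of the table/partition directory
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path(args[0]), true);
        while (it.hasNext()) {
            LocatedFileStatus st = it.next();
            numFiles++;
            totalSize += st.getLen();
            if (st.isErasureCoded()) {  // FileStatus#isErasureCoded, Hadoop 3+
                numEcFiles++;
            }
        }
        System.out.printf("numFiles=%d totalSize=%d numErasureCodedFiles=%d%n",
            numFiles, totalSize, numEcFiles);
    }
}
{code}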



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23776) Retire quickstats autocollection

2020-06-29 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147939#comment-17147939
 ] 

Prasanth Jayachandran commented on HIVE-23776:
--

{quote}I don't think they are really in use...
{quote}
It is used in many places. There is stats annotation fallback which relies on 
this. There are compile time counters added for this which can be used for 
workload management guardrails. There are some existing pre-hooks which rely 
on this or could be relying on this. I am -1 on removing this without having 
substantial evidence that this is not used. 

> Retire quickstats autocollection
> 
>
> Key: HIVE-23776
> URL: https://issues.apache.org/jira/browse/HIVE-23776
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> right now these are scanned during every BasicStatsTask execution - which 
> means some filesystem reads/etc - for small inserts this overhead is visible 
> when the fs is a bit slower (s3 and friends)
> I don't think they are really in use...we rely more on columnstats, which are 
> more accurate; and the datasize in this case is the "offline" 
> (ondisk) size - while we should instead calculate with "online" sizes...
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23741) Store CacheTags in the file cache level

2020-06-29 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita updated HIVE-23741:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> CacheTags are currently stored for every data buffer. The strings are 
> interned, but the number of cache tag objects can be reduced by moving 
> them to the file cache level and back-referencing them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23741) Store CacheTags in the file cache level

2020-06-29 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147943#comment-17147943
 ] 

Ádám Szita commented on HIVE-23741:
---

Committed to master. Thanks Anti!

> Store CacheTags in the file cache level
> ---
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> CacheTags are currently stored for every data buffer. The strings are 
> interned, but the number of cache tag objects can be reduced by moving 
> them to the file cache level and back-referencing them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23573) [HMS] Advance the write id for the table for DDL

2020-06-29 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147992#comment-17147992
 ] 

Vihang Karajgaonkar commented on HIVE-23573:


Patch was reviewed on https://github.com/apache/hive/pull/1095. I just merged 
it into the master branch. Thanks [~kishendas] for your contribution.

> [HMS] Advance the write id for the table for DDL
> 
>
> Key: HIVE-23573
> URL: https://issues.apache.org/jira/browse/HIVE-23573
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
> Attachments: HIVE-23573.1.patch
>
>
> Every write request will advance the write id for the table for DDL. The 
> writeid will be marked committed locally in HMS client. The next read request 
> is guaranteed to read from the db until the notification log catches up to the 
> commit message of the transaction commit, since the writeid is newer than the 
> cache (the writeid for the transaction is committed locally, but is not 
> committed on HMS until the notification log catches up).
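
The consistency rule above reduces to a write-id comparison; a toy model with illustrative names (not the actual HMS client code):

{code:java}
public class WriteIdCacheDemo {
    /** Serve from cache only once its snapshot covers the locally committed write id. */
    static boolean canServeFromCache(long cachedUpToWriteId, long locallyCommittedWriteId) {
        return cachedUpToWriteId >= locallyCommittedWriteId;
    }

    public static void main(String[] args) {
        long committed = 42;  // a DDL advanced the table write id to 42
        System.out.println(canServeFromCache(41, committed)); // false -> read from db
        System.out.println(canServeFromCache(42, committed)); // true  -> notification log caught up
    }
}
{code}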



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23573) [HMS] Advance the write id for the table for DDL

2020-06-29 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved HIVE-23573.

Fix Version/s: 4.0.0
   Resolution: Fixed

> [HMS] Advance the write id for the table for DDL
> 
>
> Key: HIVE-23573
> URL: https://issues.apache.org/jira/browse/HIVE-23573
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23573.1.patch
>
>
> Every write request will advance the write id for the table for DDL. The 
> writeid will be marked committed locally in HMS client. The next read request 
> is guaranteed to read from the db until the notification log catches up to the 
> commit message of the transaction commit, since the writeid is newer than the 
> cache (the writeid for the transaction is committed locally, but is not 
> committed on HMS until the notification log catches up).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23776) Retire quickstats autocollection

2020-06-29 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148017#comment-17148017
 ] 

Zoltan Haindrich commented on HIVE-23776:
-

I'm not sure if it was clear; basicstats right now is composed of
* collected stuff: numRows; rawDataSize
* "quickstats" stuff: numFiles, totalSize; this ticket is about these things

>  There is stats annotation fallback which relies on this.
stats also has its 
[own|https://github.com/apache/hive/blob/6440d93981e6d6aab59ecf2e77ffa45cd84d47de/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L205]
  file scanning machinery, so it could easily live without it - but in most 
cases the compiler will rely on columnstats-based info - so I don't see any 
problem here

Right now I don't know why workload management would use this kind of info (it 
may also resort to file scanning if it really needs this) 

I've done a quick check and neither NUM_FILES nor TOTAL_SIZE was used where it 
looked problematic...
but I guess if I start removing it, the code and the tests will tell whether it 
could be removed

> Retire quickstats autocollection
> 
>
> Key: HIVE-23776
> URL: https://issues.apache.org/jira/browse/HIVE-23776
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> right now these are scanned during every BasicStatsTask execution - which 
> means some filesystem reads/etc - for small inserts this overhead is visible 
> when the fs is a bit slower (s3 and friends)
> I don't think they are really in use...we rely more on columnstats, which are 
> more accurate; and the datasize in this case is the "offline" 
> (ondisk) size - while we should instead calculate with "online" sizes...
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23776) Retire quickstats autocollection

2020-06-29 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148054#comment-17148054
 ] 

Prasanth Jayachandran commented on HIVE-23776:
--

Yes. I know the quickstats part. The workload management triggers can reference 
*any* hive counter, including the following newly added counters.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CompileTimeCounters.java]
 

If text files land in some staging table and there are workload management 
triggers/guardrails that say "if query scans > 10TB kill query", then removing 
these quick stats will break that functionality. In some cases these staging 
tables are never analyzed, so no statistics get collected for them. 

Just searching the hive code base and unit testing alone will not be sufficient 
to know whether customers are using it. If there is a specific need to remove 
this, put it behind a config and deprecate it in iterations before removing 
it in one go. 

 

> Retire quickstats autocollection
> 
>
> Key: HIVE-23776
> URL: https://issues.apache.org/jira/browse/HIVE-23776
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> right now these are scanned during every BasicStatsTask execution - which 
> means some filesystem reads/etc - for small inserts this overhead is visible 
> when the fs is a bit slower (s3 and friends)
> I don't think they are really in use...we rely more on columnstats, which are 
> more accurate; and the datasize in this case is the "offline" 
> (ondisk) size - while we should instead calculate with "online" sizes...
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?focusedWorklogId=452569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452569
 ]

ASF GitHub Bot logged work on HIVE-23611:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 18:38
Start Date: 29/Jun/20 18:38
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1120:
URL: https://github.com/apache/hive/pull/1120#discussion_r447174957



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/BaseReplicationAcrossInstances.java
##
@@ -103,6 +114,12 @@ public static void classLevelTearDown() throws IOException 
{
 replica.close();
   }
 
+  private static void setReplicaExternalBase(FileSystem fs, Map confMap) throws IOException {
+fs.mkdirs(REPLICA_EXTERNAL_BASE);
+fullyQualifiedReplicaExternalBase =  
fs.getFileStatus(REPLICA_EXTERNAL_BASE).getPath().toString();
+confMap.put(HiveConf.ConfVars.REPL_EXTERNAL_TABLE_BASE_DIR.varname, 
fullyQualifiedReplicaExternalBase);

Review comment:
   Yes, it is needed in both places; the two are different cases.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452569)
Time Spent: 0.5h  (was: 20m)

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23611.01.patch, HIVE-23611.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23767) Send ValidWriteIDList in request for all the new HMS get_* APIs that are in request/response form

2020-06-29 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23767 started by Kishen Das.
-
> Send ValidWriteIDList in request for all the new HMS get_* APIs that are in 
> request/response form
> -
>
> Key: HIVE-23767
> URL: https://issues.apache.org/jira/browse/HIVE-23767
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> We recently introduced a new set of HMS APIs that take ValidWriteIDList in the 
> request, as part of HIVE-22017.
> We should switch to these new APIs wherever required and start sending 
> ValidWriteIDList in the request for all the new HMS get_* APIs that are in 
> request/response form.
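
A hedged sketch of the request-side change; the GetTableRequest setter and client overload below are assumptions based on the HIVE-22017-era thrift definitions, not verified against a specific release:

{code:java}
// Assumed API names throughout - verify against the target Hive release.
import org.apache.hadoop.hive.common.ValidWriteIdList;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.GetTableRequest;
import org.apache.hadoop.hive.metastore.api.Table;

class GetTableWithWriteIds {
    Table getTable(IMetaStoreClient client, String db, String tbl,
                   ValidWriteIdList writeIds) throws Exception {
        GetTableRequest req = new GetTableRequest(db, tbl);
        req.setValidWriteIdList(writeIds.writeToString()); // assumed setter
        return client.getTable(req);                       // assumed overload
    }
}
{code}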



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?focusedWorklogId=452574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452574
 ]

ASF GitHub Bot logged work on HIVE-23611:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 18:45
Start Date: 29/Jun/20 18:45
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1120:
URL: https://github.com/apache/hive/pull/1120#discussion_r447178408



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -159,14 +154,14 @@ public void externalTableReplicationWithDefaultPaths() 
throws Throwable {
 .run("insert into table t2 partition(country='india') values 
('bangalore')")
 .run("insert into table t2 partition(country='us') values ('austin')")
 .run("insert into table t2 partition(country='france') values 
('paris')")
-.dump(primaryDbName, withClauseOptions);
+.dump(primaryDbName);

Review comment:
   Not needed anymore

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -503,8 +495,7 @@ public void externalTableIncrementalCheckpointing() throws 
Throwable {
 
   @Test
   public void externalTableIncrementalReplication() throws Throwable {
-List withClause = externalTableBasePathWithClause();
-WarehouseInstance.Tuple tuple = primary.dump(primaryDbName, withClause);
+WarehouseInstance.Tuple tuple = primary.dump(primaryDbName);

Review comment:
   Not needed anymore

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -623,7 +612,7 @@ public void bootstrapExternalTablesDuringIncrementalPhase() 
throws Throwable {
 assertFalse(primary.miniDFSCluster.getFileSystem()
 .exists(new Path(metadataPath + relativeExtInfoPath(null;
 
-replica.load(replicatedDbName, primaryDbName, loadWithClause)
+replica.load(replicatedDbName, primaryDbName)

Review comment:
   Not needed anymore





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452574)
Time Spent: 40m  (was: 0.5h)

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23611.01.patch, HIVE-23611.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?focusedWorklogId=452575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452575
 ]

ASF GitHub Bot logged work on HIVE-23611:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 18:45
Start Date: 29/Jun/20 18:45
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1120:
URL: https://github.com/apache/hive/pull/1120#discussion_r447178750



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -517,7 +508,7 @@ public void externalTableIncrementalReplication() throws 
Throwable {
 + "'")
 .run("alter table t1 add partition(country='india')")
 .run("alter table t1 add partition(country='us')")
-.dump(primaryDbName, withClause);

Review comment:
   Not needed anymore





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452575)
Time Spent: 50m  (was: 40m)

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23611.01.patch, HIVE-23611.02.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2020-06-29 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R reassigned HIVE-23779:
-


> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=452592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452592
 ]

ASF GitHub Bot logged work on HIVE-23779:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 19:17
Start Date: 29/Jun/20 19:17
Worklog Time Spent: 10m 
  Work Description: nareshpr opened a new pull request #1191:
URL: https://github.com/apache/hive/pull/1191


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452592)
Remaining Estimate: 0h
Time Spent: 10m

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23779:
--
Labels: pull-request-available  (was: )

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23776) Retire quickstats autocollection

2020-06-29 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148098#comment-17148098
 ] 

Peter Vary commented on HIVE-23776:
---

[~prasanth_j]: I have been analyzing the execution time of ACID update queries 
on S3 with simple, 1-row updates. The flamegraph for the HS2 side shows that 1/4 
of the time there is spent on stats generation, specifically on listing the 
files and directories.
Thanks, Peter 

> Retire quickstats autocollection
> 
>
> Key: HIVE-23776
> URL: https://issues.apache.org/jira/browse/HIVE-23776
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this is about:
> * num files
> * datasize (sum of filesizes)
> * num erasure coded files
> right now these are scanned during every BasicStatsTask execution - which 
> means some filesystem reads/etc - for small inserts this overhead is visible 
> when the fs is a bit slower (s3 and friends)
> I don't think they are really in use...we rely more on columnstats, which are 
> more accurate; and the datasize in this case is the "offline" 
> (ondisk) size - while we should instead calculate with "online" sizes...
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide them 
> for informational purposes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-19549) Enable TestAcidOnTez#testCtasTezUnion

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19549?focusedWorklogId=452599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452599
 ]

ASF GitHub Bot logged work on HIVE-19549:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 19:56
Start Date: 29/Jun/20 19:56
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1188:
URL: https://github.com/apache/hive/pull/1188#issuecomment-651326251


   before re-enabling flaky tests please run the checker; without that there 
is no evidence that the test is stable
   I've started it for this one:
   http://ci.hive.apache.org/job/hive-flaky-check/54/
   http://ci.hive.apache.org/job/hive-flaky-check/55/



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452599)
Time Spent: 20m  (was: 10m)

> Enable TestAcidOnTez#testCtasTezUnion
> -
>
> Key: HIVE-19549
> URL: https://issues.apache.org/jira/browse/HIVE-19549
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23772) Relocate calcite-core to prevent NoSuchFieldError

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?focusedWorklogId=452598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452598
 ]

ASF GitHub Bot logged work on HIVE-23772:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 19:56
Start Date: 29/Jun/20 19:56
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651325059


   yeah; now I remember - I've also run into these avatica exceptions; the story 
behind them is:
   * the jdbc driver is used to open the connection; it passes some things as 
arguments to the jdbc driver
   * the type factory is "org.calcite..."
   * the expected type factory is "shaded.org.calcite"
   so it fails with an exception... 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452598)
Time Spent: 0.5h  (was: 20m)

> Relocate calcite-core to prevent NoSuchFieldError
> -
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:542) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23776) Retire quickstats autocollection

2020-06-29 Thread Prasanth Jayachandran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148114#comment-17148114
 ] 

Prasanth Jayachandran commented on HIVE-23776:
--

[~pvary] I understand the performance concerns that the basic stats bring, 
especially in cloud environments. But I would like to discuss alternatives 
instead of just removing them, as there are certainly dependencies on file 
sizes and the number of files that cannot be removed. The rawDataSize is good 
but only represents the in-memory size, which is sufficient for most 
optimizations but not for all. The ratio of totalFileSize to rawDataSize gives 
approximately the compression ratio, which is still beneficial for some 
optimizations (totalFileSize can be used for estimating splits, or for 
estimating the number of containers/nodes required without running the scans, 
etc.). It is better to pay the cost once upfront during ETL than every time we 
run a query or desc formatted. If the basic stats are published as counters 
from the tasks, then the Tez AM can aggregate them at the DAG level 
(https://github.com/apache/hive/blob/6440d93981e6d6aab59ecf2e77ffa45cd84d47de/ql/src/test/results/clientpositive/llap/tez_compile_counters.q.out#L1524-L1530)
 and HS2 can use that to store them in the metastore without ever doing a file 
listing. This is one such approach, and it can be abstracted out if required 
for other engines. We could explore alternative approaches as well. I do not 
think it is a good idea to remove this just because it is slow on one cloud 
filesystem.
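
For illustration, the counter-publishing idea could look roughly like the 
sketch below. The counter group and names are hypothetical, as is the helper 
method itself; only the Tez counter API (ProcessorContext#getCounters, 
TezCounters#findCounter) is real:

{code:java}
import org.apache.tez.common.counters.TezCounters;
import org.apache.tez.runtime.api.ProcessorContext;

// Hypothetical sketch: each writer task publishes its file count and bytes
// written as counters; the Tez AM aggregates them at the vertex/DAG level,
// so HS2 can persist the basic stats without ever listing the filesystem.
static void publishBasicStats(ProcessorContext context, long filesWritten, long bytesWritten) {
  TezCounters counters = context.getCounters();
  counters.findCounter("HIVE_BASIC_STATS", "NUM_FILES").increment(filesWritten);
  counters.findCounter("HIVE_BASIC_STATS", "TOTAL_SIZE").increment(bytesWritten);
}
{code}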

> Retire quickstats autocollection
> 
>
> Key: HIVE-23776
> URL: https://issues.apache.org/jira/browse/HIVE-23776
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this is about:
> * number of files
> * data size (sum of file sizes)
> * number of erasure-coded files
> Right now these are scanned during every BasicStatsTask execution, which 
> means filesystem reads etc.; for small inserts this is noticeable when the 
> filesystem is a bit slower (S3 and friends).
> I don't think they are really in use; we rely more on column stats, which 
> are more accurate. Also, the data size collected here is the "offline" 
> (on-disk) size, while we should instead calculate with "online" sizes.
> proposal:
> * remove collection and storage of this data
> * collect it on the fly during "desc formatted" statements to provide it 
> for informational purposes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23772) Relocate calcite-core to prevent NoSuchFieldError

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?focusedWorklogId=452628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452628
 ]

ASF GitHub Bot logged work on HIVE-23772:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 21:00
Start Date: 29/Jun/20 21:00
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651363049


   The values passed are from:
   
https://github.com/apache/calcite/blob/branch-1.21/core/src/main/java/org/apache/calcite/jdbc/CalciteConnectionImpl.java#L127
   The type factory must be null, since it is the else branch.
   
   The trace says it is L127 -
   `at 
org.apache.calcite.jdbc.CalciteConnectionImpl.<init>(CalciteConnectionImpl.java:127)`
   
   Both arguments should ideally come from `calcite-core` and hence should 
have been relocated; they are only tweaked in `calcite-avatica`. Maybe some 
dependency that uses calcite-core is being called. To find out, I need to 
clone the repo and inspect the dependency tree, or debug where the signature 
changed from shaded to non-shaded.
   Will find time to dig into this more, maybe this or next weekend. Do let 
me know if you have any further pointers.
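
One quick way to narrow this down (a debugging sketch, not part of the patch) 
is to print which jar each suspect class is actually loaded from; for a 
relocation bug this shows whether the shaded or the original calcite-core 
class won on the classpath:

{code:java}
// Debugging sketch: prints the code source (jar) of each class, which tells
// you whether the shaded or the non-shaded copy is on the effective classpath.
static void printClassOrigin(String... classNames) throws ClassNotFoundException {
  for (String name : classNames) {
    Class<?> clazz = Class.forName(name);
    System.out.println(name + " -> "
        + clazz.getProtectionDomain().getCodeSource().getLocation());
  }
}

// e.g. printClassOrigin("org.apache.calcite.jdbc.CalciteConnectionImpl",
//                       "org.apache.calcite.rex.RexCall");
{code}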



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452628)
Time Spent: 40m  (was: 0.5h)

> Relocate calcite-core to prevent NoSuchFieldError
> -
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compile

[jira] [Work logged] (HIVE-23772) Relocate calcite-core to prevent NoSuchFieldError

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?focusedWorklogId=452638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452638
 ]

ASF GitHub Bot logged work on HIVE-23772:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 21:13
Start Date: 29/Jun/20 21:13
Worklog Time Spent: 10m 
  Work Description: ayushtkn edited a comment on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651363049


   The values passed are from:
   
https://github.com/apache/calcite/blob/branch-1.21/core/src/main/java/org/apache/calcite/jdbc/CalciteConnectionImpl.java#L127
   The type factory must be null, since it is the else branch.
   
   The trace says it is L127 -
   `at 
org.apache.calcite.jdbc.CalciteConnectionImpl.<init>(CalciteConnectionImpl.java:127)`
   
   Both arguments should ideally come from `calcite-core` and hence should 
have been relocated; they are only tweaked in `calcite-avatica`. Maybe some 
dependency that uses calcite-core is being called. To find out, I need to 
clone the repo and inspect the dependency tree, or debug where the signature 
changed from shaded to non-shaded.
   Will find time to dig into this more, maybe this or next weekend. I need 
to understand how Calcite works as well, which I currently don't. Do let me 
know if you have any further pointers. :-) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452638)
Time Spent: 50m  (was: 40m)

> Relocate calcite-core to prevent NoSuchFieldError
> -
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4

[jira] [Assigned] (HIVE-23780) Fail dropTable if acid cleanup fails

2020-06-29 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman reassigned HIVE-23780:
---


> Fail dropTable if acid cleanup fails
> 
>
> Key: HIVE-23780
> URL: https://issues.apache.org/jira/browse/HIVE-23780
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> ACID cleanup happens after dropTable is committed. If the cleanup fails for 
> some reason, leftover entries remain in the ACID tables. This later makes 
> the dropped table's name unusable by new tables.
> [~pvary] [~ngangam]
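
A rough sketch of the intended ordering (all names below are hypothetical 
stand-ins, not the actual patch): run the cleanup inside the same metastore 
transaction as the drop, so a cleanup failure rolls the drop back instead of 
leaving orphaned ACID entries.

{code:java}
// Hypothetical sketch: cleanup before commit, so a cleanup failure aborts the
// whole dropTable instead of committing the drop and failing silently later.
boolean success = false;
try {
  ms.openTransaction();
  ms.dropTable(dbName, tableName);
  cleanupAcidRecords(db, tbl);   // hypothetical helper; throwing here aborts the drop
  success = ms.commitTransaction();
} finally {
  if (!success) {
    ms.rollbackTransaction();
  }
}
{code}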



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23780) Fail dropTable if acid cleanup fails

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23780:
--
Labels: pull-request-available  (was: )

> Fail dropTable if acid cleanup fails
> 
>
> Key: HIVE-23780
> URL: https://issues.apache.org/jira/browse/HIVE-23780
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ACID cleanup happens after dropTable is committed. If the cleanup fails for 
> some reason, leftover entries remain in the ACID tables. This later makes 
> the dropped table's name unusable by new tables.
> [~pvary] [~ngangam]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23780) Fail dropTable if acid cleanup fails

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23780?focusedWorklogId=452668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452668
 ]

ASF GitHub Bot logged work on HIVE-23780:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 22:15
Start Date: 29/Jun/20 22:15
Worklog Time Spent: 10m 
  Work Description: mustafaiman opened a new pull request #1192:
URL: https://github.com/apache/hive/pull/1192


   Change-Id: Ica7666afe40cb0f0128266c9c3f6ebc560b24c0e
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452668)
Remaining Estimate: 0h
Time Spent: 10m

> Fail dropTable if acid cleanup fails
> 
>
> Key: HIVE-23780
> URL: https://issues.apache.org/jira/browse/HIVE-23780
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Standalone Metastore, Transactions
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ACID cleanup happens after dropTable is committed. If the cleanup fails for 
> some reason, leftover entries remain in the ACID tables. This later makes 
> the dropped table's name unusable by new tables.
> [~pvary] [~ngangam]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=452674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452674
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 22:49
Start Date: 29/Jun/20 22:49
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1081:
URL: https://github.com/apache/hive/pull/1081#discussion_r447302965



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1605,6 +1618,46 @@ public int compareTo(CompressedOwid other) {
 throw e; // rethrow the exception so that the caller can handle.
   }
 }
+
+/**
+ * Create delete delta reader. Caching orc tail to avoid FS lookup/reads 
for repeated scans.
+ *
+ * @param deleteDeltaFile
+ * @param conf
+ * @param fs FileSystem
+ * @return delete file reader
+ * @throws IOException
+ */
+private Reader getDeleteDeltaReader(Path deleteDeltaFile, JobConf conf, 
FileSystem fs) throws IOException {
+  OrcTail deleteDeltaTail = 
deleteDeltaOrcTailCache.getIfPresent(deleteDeltaFile);

Review comment:
   Yes, it will not change for the file.
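
For readers following the thread, the caching pattern under discussion is 
roughly the sketch below, assuming Guava's Cache; readOrcTail is a 
hypothetical stand-in for the actual filesystem read, and the cache bound is 
an assumption, not taken from the patch:

{code:java}
import java.io.IOException;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import org.apache.hadoop.fs.Path;
import org.apache.orc.impl.OrcTail;

// Sketch: the ORC tail of a delete-delta file never changes (confirmed above),
// so a single filesystem read can serve every repeated scan of the same file.
Cache<Path, OrcTail> deleteDeltaOrcTailCache =
    CacheBuilder.newBuilder().maximumSize(1000).build();

OrcTail getCachedTail(Path deleteDeltaFile) throws IOException {
  OrcTail tail = deleteDeltaOrcTailCache.getIfPresent(deleteDeltaFile);
  if (tail == null) {
    tail = readOrcTail(deleteDeltaFile);   // hypothetical helper doing the FS read
    deleteDeltaOrcTailCache.put(deleteDeltaFile, tail);
  }
  return tail;
}
{code}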





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452674)
Time Spent: 1h 20m  (was: 1h 10m)

> VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete 
> delta directories multiple times
> 
>
> Key: HIVE-23597
> URL: https://issues.apache.org/jira/browse/HIVE-23597
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java#L1562]
> {code:java}
> try {
> final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
> if (deleteDeltaDirs.length > 0) {
>   int totalDeleteEventCount = 0;
>   for (Path deleteDeltaDir : deleteDeltaDirs) {
> {code}
>  
> Consider a directory layout like the following. It was created by a simple 
> sequence of "insert --> update --> select" queries.
>  
> {noformat}
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/base_001
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/base_002
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_003_003_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_004_004_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_005_005_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_006_006_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_007_007_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_008_008_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_009_009_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_010_010_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_011_011_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_012_012_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_013_013_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_003_003_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_004_004_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_005_005_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_006_006_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_007_007_
> /warehouse-1591131255-hl5z/warehouse/tablespace/man

[jira] [Resolved] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-29 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping resolved HIVE-23748.
-
Resolution: Fixed

> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 
> FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f

[jira] [Reopened] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-29 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping reopened HIVE-23748:
-

> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 
> FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f); 
> Time taken: 78.5

[jira] [Resolved] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-29 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping resolved HIVE-23748.
-
Resolution: Fixed

> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 
> FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f

[jira] [Reopened] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-29 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping reopened HIVE-23748:
-

> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 
> FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f); 
> Time taken: 78.5

[jira] [Resolved] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-29 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping resolved HIVE-23748.
-
Resolution: Fixed

> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 
> FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f

[jira] [Updated] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-29 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23611:

Attachment: HIVE-23611.03.patch

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23611.01.patch, HIVE-23611.02.patch, 
> HIVE-23611.03.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=452788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452788
 ]

ASF GitHub Bot logged work on HIVE-23779:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 04:22
Start Date: 30/Jun/20 04:22
Worklog Time Spent: 10m 
  Work Description: nareshpr commented on pull request #1191:
URL: https://github.com/apache/hive/pull/1191#issuecomment-651522808


   /retest



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452788)
Time Spent: 20m  (was: 10m)

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=452790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452790
 ]

ASF GitHub Bot logged work on HIVE-23779:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 04:26
Start Date: 30/Jun/20 04:26
Worklog Time Spent: 10m 
  Work Description: nareshpr removed a comment on pull request #1191:
URL: https://github.com/apache/hive/pull/1191#issuecomment-651522808


   /retest



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452790)
Time Spent: 0.5h  (was: 20m)

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23770) Druid filter translation unable to handle inverted between

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23770?focusedWorklogId=452799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452799
 ]

ASF GitHub Bot logged work on HIVE-23770:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 05:10
Start Date: 30/Jun/20 05:10
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1190:
URL: https://github.com/apache/hive/pull/1190#discussion_r447411719



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveDruidPushInvertIntoBetweenRule.java
##
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Filter;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.rex.RexShuttle;
+import org.apache.calcite.rex.RexUtil;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveBetween;
+
+/**
+ * This rule is the opposite of HiveDruidPullInvertFromBetweenRule:
+ * it pushes the invert flag back into the BETWEEN call.
+ */
+public class HiveDruidPushInvertIntoBetweenRule extends RelOptRule {
+
+  protected static final Log LOG =
+      LogFactory.getLog(HiveDruidPushInvertIntoBetweenRule.class);
+
+  public static final HiveDruidPushInvertIntoBetweenRule INSTANCE =
+      new HiveDruidPushInvertIntoBetweenRule();
+
+  private HiveDruidPushInvertIntoBetweenRule() {
+    super(operand(Filter.class, any()));
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    final Filter filter = call.rel(0);
+    final RexBuilder rexBuilder = filter.getCluster().getRexBuilder();
+    final RexNode condition =
+        RexUtil.pullFactors(rexBuilder, filter.getCondition());
+
+    RexPullInvertFromBetween t = new RexPullInvertFromBetween(rexBuilder);
+    RexNode newCondition = t.apply(condition);
+
+    // If we could not transform anything, we bail out
+    if (newCondition.toString().equals(condition.toString())) {

Review comment:
   Same comment as in the other rule.
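
One plausible alternative to the string comparison in the quoted diff above,
assuming the shuttle follows Calcite's default RexShuttle behavior of
returning the input node itself when a visit changes nothing (RewriteCheck is
a hypothetical helper, not part of the PR):

{code:java}
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexShuttle;

final class RewriteCheck {
  // Returns null when the shuttle changed nothing, so callers can bail out
  // without rendering both trees to strings.
  static RexNode rewriteOrNull(RexShuttle shuttle, RexNode condition) {
    RexNode newCondition = condition.accept(shuttle);
    return newCondition == condition ? null : newCondition;
  }
}
{code}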


[jira] [Work logged] (HIVE-19549) Enable TestAcidOnTez#testCtasTezUnion

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19549?focusedWorklogId=452818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452818
 ]

ASF GitHub Bot logged work on HIVE-19549:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 06:24
Start Date: 30/Jun/20 06:24
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #1188:
URL: https://github.com/apache/hive/pull/1188


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452818)
Time Spent: 0.5h  (was: 20m)

> Enable TestAcidOnTez#testCtasTezUnion
> -
>
> Key: HIVE-19549
> URL: https://issues.apache.org/jira/browse/HIVE-19549
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-19549) Enable TestAcidOnTez#testCtasTezUnion

2020-06-29 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-19549.
---
Resolution: Fixed

Pushed to master. Thank you [~pvary] for the review.

> Enable TestAcidOnTez#testCtasTezUnion
> -
>
> Key: HIVE-19549
> URL: https://issues.apache.org/jira/browse/HIVE-19549
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582

2020-06-29 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148349#comment-17148349
 ] 

Zoltan Haindrich commented on HIVE-23751:
-

We can do this for our tests... but I think the HADOOP-16582 change itself is 
problematic... it seems like no one cares about my concern...

+1

> QTest: Override #mkdirs() method in ProxyFileSystem To Align After 
> HADOOP-16582
> ---
>
> Key: HIVE-23751
> URL: https://issues.apache.org/jira/browse/HIVE-23751
> Project: Hive
>  Issue Type: Task
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-23751.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HADOOP-16582 has changed the way mkdirs() works:
> *Before HADOOP-16582:*
> All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which then 
> re-routed them to the mkdirs(p, permission) method. For ProxyFileSystem the 
> call chain looked like
> {code:java}
> FileUtils.mkdir(p) ---> FileSystem.mkdirs(p) ---> 
> ProxyFileSystem.mkdirs(p, permission)
> {code}
> An implementation of FileSystem only needed to implement mkdirs(p, 
> permission).
> *After HADOOP-16582:*
> Since FilterFileSystem now overrides the mkdirs(p) method, the call to 
> ProxyFileSystem looks like
> {code:java}
> FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) --->
> {code}
> This makes all the qtests fail with the below exception:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1,
>  expected: file:///
> {code}
> Note: We will hit this issue when we bump up the hadoop version in Hive.
> So, as per the discussion in HADOOP-16963, ProxyFileSystem needs to 
> override the mkdirs(p) method in order to solve the above problem. The new 
> flow would look like
> {code:java}
> FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> 
> ProxyFileSystem.mkdirs(p, permission) --->
> {code}
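
A minimal sketch of the override the description calls for, assuming
ProxyFileSystem keeps extending FilterFileSystem and that the default
directory permission is acceptable (the actual patch may differ):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ProxyFileSystem extends FilterFileSystem {
  @Override
  public boolean mkdirs(Path f) throws IOException {
    // Re-route the single-argument call through our own two-argument variant
    // so the pfile:// path swizzling is applied before delegating.
    return mkdirs(f, FsPermission.getDirDefault());
  }
}
{code}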



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23772) Relocate calcite-core to prevent NoSuchFieldError

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?focusedWorklogId=452822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452822
 ]

ASF GitHub Bot logged work on HIVE-23772:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 06:46
Start Date: 30/Jun/20 06:46
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651580913


   IIRC, the point where the switch to the non-shaded calcite-core happens is 
when the JDBC driver is loaded.
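
A quick way to confirm which calcite-core actually wins on the classpath is to
check where a conflicting class was loaded from; a small diagnostic sketch
(CalciteOriginCheck is hypothetical, not part of the patch):

{code:java}
public class CalciteOriginCheck {
  public static void main(String[] args) throws ClassNotFoundException {
    // Prints the jar that supplied RexCall; a non-relocated location here
    // means the plain calcite-core shadowed the shaded copy.
    Class<?> rexCall = Class.forName("org.apache.calcite.rex.RexCall");
    System.out.println(
        rexCall.getProtectionDomain().getCodeSource().getLocation());
  }
}
{code}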



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452822)
Time Spent: 1h  (was: 50m)

> Relocate calcite-core to prevent NoSuchFieldError
> -
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:542) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)