[jira] [Assigned] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() needs to optimize its query SQL
[ https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-23721:
-
Assignee: zhangbutao

> MetaStoreDirectSql.ensureDbInit() needs to optimize its query SQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.2
> Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timeline server enabled, HTTPS enabled)
> Hive 3.1 (15 HS2 instances)
> 6+ YARN applications every day
> Reporter: YulongZ
> Assignee: zhangbutao
> Priority: Critical
> Attachments: HIVE-23721.01.patch
>
> Since Hive 3.0, catalogs were added to the Hive metastore: many metastore tables gained a "catName" column, and the table indexes now lead with "catName".
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>
> initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
> initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
>
> should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
> When the metastore data grows large (for example, MPartitionColumnStatistics holding millions of rows), the newQuery(MPartitionColumnStatistics.class, "dbName == ''") probe executes very slowly, and "show tables" on HiveServer2 becomes very slow as well.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() needs to optimize its query SQL
[ https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-23721:
--
Attachment: HIVE-23721.01.patch
Fix Version/s: 4.0.0
Status: Patch Available (was: Open)

> MetaStoreDirectSql.ensureDbInit() needs to optimize its query SQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.2
> Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timeline server enabled, HTTPS enabled)
> Hive 3.1 (15 HS2 instances)
> 6+ YARN applications every day
> Reporter: YulongZ
> Assignee: zhangbutao
> Priority: Critical
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
> Since Hive 3.0, catalogs were added to the Hive metastore: many metastore tables gained a "catName" column, and the table indexes now lead with "catName".
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>
> initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
> initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
>
> should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
> When the metastore data grows large (for example, MPartitionColumnStatistics holding millions of rows), the newQuery(MPartitionColumnStatistics.class, "dbName == ''") probe executes very slowly, and "show tables" on HiveServer2 becomes very slow as well.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
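The HIVE-23721 report above hinges on a simple rule: a single-column probe filter can only be served by an index range scan if it constrains the index's leading column. The sketch below is illustrative Java, not Hive code; the column order is an assumption taken from the description (catName first since Hive 3.0).

```java
import java.util.List;

// Illustrative sketch (not Hive code): models the B-tree leading-column rule.
// A probe on the index's first column (CAT_NAME) can use an index range scan;
// a probe on a trailing column such as DB_NAME forces a full table scan,
// which is what makes the ensureDbInit() queries slow on large metastores.
public class IndexProbeSketch {

    // True when a single-column equality filter can be served by the index's
    // leading column (the usual index-prefix rule).
    static boolean canUseIndex(List<String> indexColumns, String filterColumn) {
        return !indexColumns.isEmpty() && indexColumns.get(0).equals(filterColumn);
    }

    public static void main(String[] args) {
        // Assumed column order from the description: catName leads the index.
        List<String> idx = List.of("CAT_NAME", "DB_NAME", "TABLE_NAME");
        System.out.println("catName probe uses index: " + canUseIndex(idx, "CAT_NAME"));
        System.out.println("dbName probe uses index:  " + canUseIndex(idx, "DB_NAME"));
    }
}
```

Under this rule, switching the probe filter from "dbName == ''" to "catName == ''" turns a full scan of millions of statistics rows into an index lookup.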
[jira] [Comment Edited] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149896#comment-17149896 ] Syed Shameerur Rahman edited comment on HIVE-23737 at 7/2/20, 5:32 AM:
---
So now, whenever a new feature gets added in Tez, we can simply override it in LlapContainerLauncher.java and make the required changes in the Shuffle Handler (LLAP) to support it. cc: [~ashutoshc]

was (Author: srahman): So now, whenever a new feature gets added in Tez, we can simply override it in LlapContainerLauncher.java and make the required changes in the Shuffle Handler (LLAP) to support it.

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-23737.01.patch, HIVE-23737.02.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LLAP has a dagDelete feature, added as part of HIVE-9911, but now that Tez has added support for dagDelete in its custom shuffle handler (TEZ-3362) we could reuse that feature in LLAP.
> There are some added advantages to using Tez's dagDelete feature rather than LLAP's current one:
> 1) We can easily extend this feature to accommodate upcoming features such as vertex and failed task attempt shuffle data cleanup. Refer to TEZ-3363 and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from Hive's code path.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149896#comment-17149896 ] Syed Shameerur Rahman commented on HIVE-23737:
--
So now, whenever a new feature gets added in Tez, we can simply override it in LlapContainerLauncher.java and make the required changes in the Shuffle Handler (LLAP) to support it.

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-23737.01.patch, HIVE-23737.02.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LLAP has a dagDelete feature, added as part of HIVE-9911, but now that Tez has added support for dagDelete in its custom shuffle handler (TEZ-3362) we could reuse that feature in LLAP.
> There are some added advantages to using Tez's dagDelete feature rather than LLAP's current one:
> 1) We can easily extend this feature to accommodate upcoming features such as vertex and failed task attempt shuffle data cleanup. Refer to TEZ-3363 and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from Hive's code path.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149788#comment-17149788 ] Peter Vary commented on HIVE-13781:
---
I would prefer to handle this on the execution side. Having an extra check during compilation could cause unnecessary delays for normal queries scanning multiple S3 files. This should be a rare edge case where something outside of Hive modified the underlying FS, and it can be repaired with a correct MSCK REPAIR command. As a user I would prefer to be notified that something went wrong, and would not like the system to swallow this error.
Thanks, Peter

> Tez Job failed with FileNotFoundException when partition dir doesn't exist
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2, Query Planning
> Affects Versions: 0.14.0, 2.0.0, 3.1.1
> Reporter: Feng Yuan
> Assignee: zhangbutao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-13781.1.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> I have a table a partitioned by "day", with partitions day=20160501 and day=20160502 in the metadata, but partition 20160501's directory does not exist.
> So when I use the Tez engine to run hive -e "select day,count(*) from a where xx=xx group by day", Hive throws FileNotFoundException,
> but MR works.
> repro example:
> CREATE EXTERNAL TABLE `a`(
> `a` string)
> PARTITIONED BY (
> `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1],
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
> at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
> at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)

-- This message was sent by Atlassian Jira (v8.3.4#803005)
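The execution-side handling discussed in HIVE-13781 above can be sketched outside Hive: collect the partition directories that have vanished and surface them to the caller, instead of letting listing fail mid-job. A minimal, self-contained sketch; names are illustrative, not Hive's:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Hive's implementation): split the declared
// partition directories into those that still exist and those that were
// removed behind the metastore's back, instead of failing on the first
// missing path with FileNotFoundException.
public class PartitionScanSketch {

    static List<Path> listExisting(List<Path> partitionDirs, List<Path> missing) {
        List<Path> present = new ArrayList<>();
        for (Path dir : partitionDirs) {
            if (Files.isDirectory(dir)) {
                present.add(dir);
            } else {
                missing.add(dir);
            }
        }
        return present;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("t1");
        Path kept = Files.createDirectory(base.resolve("l_date=2016-04-08"));
        Path dropped = base.resolve("l_date=2016-04-09"); // never created: the "deleted" partition
        List<Path> missing = new ArrayList<>();
        List<Path> present = listExisting(List.of(kept, dropped), missing);
        // The caller can now warn about `missing` (matching the preference for
        // notifying the user) rather than silently swallowing the error.
        System.out.println(present.size() + " present, " + missing.size() + " missing");
    }
}
```

Running it prints "1 present, 1 missing"; logging the missing list as a warning keeps the user informed without aborting split generation.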
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cleanup
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=453752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453752 ] ASF GitHub Bot logged work on HIVE-23727: - Author: ASF GitHub Bot Created on: 02/Jul/20 02:41 Start Date: 02/Jul/20 02:41 Worklog Time Spent: 10m Work Description: dengzhhu653 edited a comment on pull request #1149: URL: https://github.com/apache/hive/pull/1149#issuecomment-648507858 @belugabehr could you please take a look? thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453752) Time Spent: 1h (was: 50m) > Improve SQLOperation log handling when cleanup > -- > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
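The dead-code claim in HIVE-23727 above can be checked mechanically: inside a branch guarded by state != CANCELED, a log statement conditioned on state == CANCELED can never fire. A small self-contained sketch; the enum and names are simplified stand-ins for Hive's OperationState, not the real class:

```java
// Illustrative sketch of the control flow described in HIVE-23727: the
// guarded log's condition is provably unreachable for every state.
public class CancelBranchSketch {

    enum OperationState { RUNNING, CANCELED, TIMEDOUT }

    // Returns whether the "canceled" log inside the guarded branch could run.
    static boolean deadLogReachable(boolean async, OperationState state) {
        if (async && state != OperationState.CANCELED && state != OperationState.TIMEDOUT) {
            // This mirrors the log statement's own condition inside the branch.
            return state == OperationState.CANCELED;
        }
        return false;
    }

    public static void main(String[] args) {
        boolean reachable = false;
        for (OperationState s : OperationState.values()) {
            reachable |= deadLogReachable(true, s) || deadLogReachable(false, s);
        }
        System.out.println("dead log reachable: " + reachable);
    }
}
```

The loop exhausts every state and async combination, so removing the logging under state == OperationState.CANCELED cannot change behavior.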
[jira] [Updated] (HIVE-23797) Throwing exception when no metastore spec found in zookeeper
[ https://issues.apache.org/jira/browse/HIVE-23797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23797:
--
Labels: pull-request-available (was: )

> Throwing exception when no metastore spec found in zookeeper
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
> Issue Type: Bug
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When service discovery is enabled for the metastore, there is a chance that the client finds no metastore URIs available in ZooKeeper, for example during metastore startup or when the client has wrongly configured the discovery path. This results in redundant retries and finally a MetaException with an "Unknown exception" message.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23797) Throwing exception when no metastore spec found in zookeeper
[ https://issues.apache.org/jira/browse/HIVE-23797?focusedWorklogId=453738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453738 ] ASF GitHub Bot logged work on HIVE-23797:
-
Author: ASF GitHub Bot
Created on: 02/Jul/20 01:29
Start Date: 02/Jul/20 01:29
Worklog Time Spent: 10m
Work Description: dengzhhu653 opened a new pull request #1201: URL: https://github.com/apache/hive/pull/1201
## NOTICE
Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 453738)
Remaining Estimate: 0h
Time Spent: 10m

> Throwing exception when no metastore spec found in zookeeper
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
> Issue Type: Bug
> Reporter: Zhihua Deng
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When service discovery is enabled for the metastore, there is a chance that the client finds no metastore URIs available in ZooKeeper, for example during metastore startup or when the client has wrongly configured the discovery path. This results in redundant retries and finally a MetaException with an "Unknown exception" message.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
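The fix described for HIVE-23797 above amounts to failing fast with a descriptive message when discovery returns an empty URI list, rather than retrying into an opaque "Unknown exception". A hedged sketch with illustrative names, not Hive's actual client code:

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch: if ZooKeeper-based discovery yields no metastore URIs
// (metastores still starting, or a misconfigured discovery path), throw a
// descriptive error immediately instead of retrying blindly.
public class MetastoreUriSketch {

    static String pickUri(List<String> urisFromZk) {
        if (urisFromZk.isEmpty()) {
            throw new IllegalStateException(
                "No metastore URIs registered in ZooKeeper; check the discovery path"
                + " or wait for a metastore instance to start");
        }
        return urisFromZk.get(0);
    }

    public static void main(String[] args) {
        try {
            pickUri(Collections.emptyList());
        } catch (IllegalStateException e) {
            System.out.println("fail fast: " + e.getMessage());
        }
    }
}
```

The point of the design is the error message: the caller learns why no connection could be made instead of seeing generic retry failures.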
[jira] [Work logged] (HIVE-3236) allow column names to be prefixed by table alias in select all queries
[ https://issues.apache.org/jira/browse/HIVE-3236?focusedWorklogId=453731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453731 ] ASF GitHub Bot logged work on HIVE-3236: Author: ASF GitHub Bot Created on: 02/Jul/20 00:31 Start Date: 02/Jul/20 00:31 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #5: URL: https://github.com/apache/hive/pull/5 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453731) Time Spent: 20m (was: 10m) > allow column names to be prefixed by table alias in select all queries > -- > > Key: HIVE-3236 > URL: https://issues.apache.org/jira/browse/HIVE-3236 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.9.1, 0.10.0 >Reporter: Keegan Mosley >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-3236.1.patch.txt > > Time Spent: 20m > Remaining Estimate: 0h > > When using "CREATE TABLE x AS SELECT ..." where the select joins tables with > hundreds of columns it is not a simple task to resolve duplicate column name > exceptions (particularly with self-joins). The user must either manually > specify aliases for all duplicate columns (potentially hundreds) or write a > script to generate the data set in a separate select query, then create the > table and load the data in. > There should be some conf flag that would allow queries like > "create table joined as select one.\*, two.\* from mytable one join mytable > two on (one.duplicate_field = two.duplicate_field1);" > to create a table with columns one_duplicate_field and two_duplicate_field. -- This message was sent by Atlassian Jira (v8.3.4#803005)
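The naming scheme HIVE-3236 asks for can be sketched in a few lines: qualify each selected column with its table alias so a self-join's duplicate names become unique (one.duplicate_field becomes one_duplicate_field). Illustrative only; Hive's planner is not shown:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the requested behavior: when expanding "alias.*",
// prefix every column with its table alias so the generated column names of a
// self-join do not collide.
public class AliasPrefixSketch {

    static List<String> prefixAll(String alias, List<String> columns) {
        List<String> out = new ArrayList<>();
        for (String column : columns) {
            out.add(alias + "_" + column);
        }
        return out;
    }

    public static void main(String[] args) {
        // "one.*" and "two.*" from the self-join in the report.
        System.out.println(prefixAll("one", List.of("duplicate_field", "id")));
        System.out.println(prefixAll("two", List.of("duplicate_field", "id")));
    }
}
```

With this expansion, `create table joined as select one.*, two.* ...` would produce `one_duplicate_field` and `two_duplicate_field` instead of a duplicate-column exception.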
[jira] [Work logged] (HIVE-23347) MSCK REPAIR cannot discover partitions with upper case directory names.
[ https://issues.apache.org/jira/browse/HIVE-23347?focusedWorklogId=453730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453730 ] ASF GitHub Bot logged work on HIVE-23347: - Author: ASF GitHub Bot Created on: 02/Jul/20 00:31 Start Date: 02/Jul/20 00:31 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1003: URL: https://github.com/apache/hive/pull/1003#issuecomment-652711587 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453730) Time Spent: 20m (was: 10m) > MSCK REPAIR cannot discover partitions with upper case directory names. > --- > > Key: HIVE-23347 > URL: https://issues.apache.org/jira/browse/HIVE-23347 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 3.1.0 >Reporter: Sankar Hariappan >Assignee: Adesh Kumar Rao >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23347.01.patch, HIVE-23347.10.patch, > HIVE-23347.2.patch, HIVE-23347.3.patch, HIVE-23347.4.patch, > HIVE-23347.5.patch, HIVE-23347.6.patch, HIVE-23347.7.patch, > HIVE-23347.8.patch, HIVE-23347.9.patch > > Time Spent: 20m > Remaining Estimate: 0h > > For the following scenario, we expect MSCK REPAIR to discover partitions but > it couldn't. > 1. Have partitioned data path as follows. > hdfs://mycluster/datapath/t1/Year=2020/Month=03/Day=10 > hdfs://mycluster/datapath/t1/Year=2020/Month=03/Day=11 > 2. 
create external table t1 (key int, value string) partitioned by (Year int, Month int, Day int) stored as orc location 'hdfs://mycluster/datapath/t1';
> 3. msck repair table t1;
> 4. show partitions t1; --> Returns zero partitions
> 5. select * from t1; --> Returns empty data.
> When the partition directory names are changed to lower case, this works fine.
> hdfs://mycluster/datapath/t1/year=2020/month=03/day=10
> hdfs://mycluster/datapath/t1/year=2020/month=03/day=11

-- This message was sent by Atlassian Jira (v8.3.4#803005)
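One plausible shape for the HIVE-23347 fix is to lower-case the partition key taken from each directory name before matching it against the metastore's (lower-cased) partition column names, so that `Year=2020` matches column `year`. This sketch is an assumption about the approach, not the actual patch:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch (not the HIVE-23347 patch): parse a partition path such
// as "Year=2020/Month=03/Day=10" into a spec keyed by the lower-cased column
// name, so upper-case directory names can still be discovered.
public class PartitionPathSketch {

    static Map<String, String> parseSpec(String path) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String part : path.split("/")) {
            String[] kv = part.split("=", 2);
            if (kv.length == 2) {
                // Lower-case only the key; partition values keep their case.
                spec.put(kv[0].toLowerCase(Locale.ROOT), kv[1]);
            }
        }
        return spec;
    }

    public static void main(String[] args) {
        System.out.println(parseSpec("Year=2020/Month=03/Day=10"));
    }
}
```

Note the value is left untouched: only the column-name comparison should be case-insensitive, since partition values may be case-sensitive data.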
[jira] [Work logged] (HIVE-5596) hive-default.xml.template is invalid
[ https://issues.apache.org/jira/browse/HIVE-5596?focusedWorklogId=453729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453729 ] ASF GitHub Bot logged work on HIVE-5596:
Author: ASF GitHub Bot
Created on: 02/Jul/20 00:30
Start Date: 02/Jul/20 00:30
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #12: URL: https://github.com/apache/hive/pull/12
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 453729)
Remaining Estimate: 23h 40m (was: 23h 50m)
Time Spent: 20m (was: 10m)

> hive-default.xml.template is invalid
> -
>
> Key: HIVE-5596
> URL: https://issues.apache.org/jira/browse/HIVE-5596
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.12.0
> Environment: OS: Oracle Linux 6
> JDK: 1.6
> Hadoop: 2.2.0
> Reporter: Kevin Huang
> Assignee: Kevin Huang
> Priority: Critical
> Labels: patch, pull-request-available
> Fix For: 0.13.0
>
> Attachments: HIVE-5596.patch
>
> Original Estimate: 24h
> Time Spent: 20m
> Remaining Estimate: 23h 40m
>
> Line 2000:16 in hive-default.xml.template is
> auth
> I think it is invalid and it will lead Hive to crash if you use this template. The error message is as follows:
> [Fatal Error] hive-site.xml:2000:16: The element type "value" must be terminated by the matching end-tag "</value>".

-- This message was sent by Atlassian Jira (v8.3.4#803005)
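The failure mode in HIVE-5596 is easy to reproduce with the JDK's own XML parser: a value element whose end tag is missing or mismatched makes the whole configuration file unparseable with exactly this class of fatal error. The snippet below is illustrative, not the real template content:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Reproduces the class of error from HIVE-5596: an unterminated <value>
// element aborts parsing of the entire config file.
public class XmlTemplateCheck {

    static String tryParse(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return "ok";
        } catch (Exception e) {
            // SAXParseException message, e.g. element type "value" must be
            // terminated by the matching end-tag.
            return "invalid: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("<property><value>auth</value></property>"));
        System.out.println(tryParse("<property><value>auth</property>"));
    }
}
```

Because the parser fails fatally on the first malformed element, a single bad line in hive-default.xml.template is enough to make Hive unable to load the whole file.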
[jira] [Commented] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149685#comment-17149685 ] Ashutosh Chauhan commented on HIVE-23363: - +1 LGTM. > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22934) Hive server interactive log counters to error stream
[ https://issues.apache.org/jira/browse/HIVE-22934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-22934:
--
Labels: pull-request-available (was: )

> Hive server interactive log counters to error stream
>
> Key: HIVE-22934
> URL: https://issues.apache.org/jira/browse/HIVE-22934
> Project: Hive
> Issue Type: Bug
> Reporter: Slim Bouguerra
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-22934.01.patch, HIVE-22934.02.patch, HIVE-22934.03.patch, HIVE-22934.04.patch, HIVE-22934.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive server is logging the console output to the system error stream.
> This needs to be fixed because:
> First, we do not roll the file.
> Second, writing to such a file is done sequentially and can lead to throttling/poor performance.
> {code}
> -rw-r--r-- 1 hive hadoop 9.5G Feb 26 17:22 hive-server2-interactive.err
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22934) Hive server interactive log counters to error stream
[ https://issues.apache.org/jira/browse/HIVE-22934?focusedWorklogId=453618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453618 ] ASF GitHub Bot logged work on HIVE-22934:
-
Author: ASF GitHub Bot
Created on: 01/Jul/20 19:27
Start Date: 01/Jul/20 19:27
Worklog Time Spent: 10m
Work Description: ramesh0201 opened a new pull request #1200: URL: https://github.com/apache/hive/pull/1200
## NOTICE
Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 453618)
Remaining Estimate: 0h
Time Spent: 10m

> Hive server interactive log counters to error stream
>
> Key: HIVE-22934
> URL: https://issues.apache.org/jira/browse/HIVE-22934
> Project: Hive
> Issue Type: Bug
> Reporter: Slim Bouguerra
> Assignee: Antal Sinkovits
> Priority: Major
> Attachments: HIVE-22934.01.patch, HIVE-22934.02.patch, HIVE-22934.03.patch, HIVE-22934.04.patch, HIVE-22934.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive server is logging the console output to the system error stream.
> This needs to be fixed because:
> First, we do not roll the file.
> Second, writing to such a file is done sequentially and can lead to throttling/poor performance.
> {code}
> -rw-r--r-- 1 hive hadoop 9.5G Feb 26 17:22 hive-server2-interactive.err
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453612 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 19:14 Start Date: 01/Jul/20 19:14 Worklog Time Spent: 10m Work Description: belugabehr edited a comment on pull request #1118: URL: https://github.com/apache/hive/pull/1118#issuecomment-652546609 @ashutoshc Let me see if I can address all of your questions with some background and context. It took me a long time to get these changes to pass the unit tests. So, these mappings, in some respect, don't really matter. When HMS is started, users use the `schema-tool` to create the HMS schema for real. Some of these mappings in the `jdo` file (like indexes) are only applied when unit testing because the unit tests build the schema via DN and `datanucleus.schema.autoCreateAll`. For unit testing, the database backend is Apache Derby. I changed the name of the index to match the Derby schema more closely. In trying to debug these various errors, I was very confused at first about it complaining about "COLUMNS_PK". https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L364 With that said, when I upgraded to DN 5.x, the unit tests would not pass. I narrowed the issue down to this one table definition. I tried several iterations to get success, but this is the one that worked. I derived this solution by closely examining the docs on this topic. It has an example that very closely aligns with this use case: http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#embedded_collection It is a bit of a wonder looking at the existing JDO definition how this ever worked. ``` ``` This is not correct, this should be a compound primary key of CD_ID *and* COLUMN_NAME. 
This exact scenario is covered in the second half of: http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#embedded_collection In the official schema (hive-schema-4.0.0.derby.sql), the primary key is enforced by the `SQL110922153006740` index. As things currently stand, the COLUMN_NAME definition in the `jdo` file says that the COLUMN_NAME is not defined to be non-null. This caused an error with Derby as it didn't allow creating a PRIMARY KEY on a field that could be null. So, putting it all together, I came to the current solution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453612) Time Spent: 1h 40m (was: 1.5h) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453611 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 19:11 Start Date: 01/Jul/20 19:11 Worklog Time Spent: 10m Work Description: belugabehr edited a comment on pull request #1118: URL: https://github.com/apache/hive/pull/1118#issuecomment-652546609 @ashutoshc Let me see if I can address all of your questions with some background and context. It took me a long time to get these changes to pass the unit tests. So, these mappings, in some respect, don't really matter. When HMS is started, users use the `schema-tool` to create the HMS schema for real. Some of these mappings in the `jdo` file (like indexes) are only applied when unit testing because the unit tests build the schema via DN and `datanucleus.schema.autoCreateAll`. For unit testing, the database backend is Apache Derby. I changed the name of the index to match the Derby schema more closely. In trying to debug these various errors, I was very confused at first about it complaining about "COLUMNS_PK". https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L364 With that said, when I upgraded to DN 5.x, the unit tests would not pass. I narrowed the issue down to this one table definition. I tried several iterations to get success, but this is the one that worked. I derived this solution by closely examining the docs on this topic. It has an example that very closely aligns with this use case: http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#embedded_collection It is a bit of a wonder looking at the existing JDO definition how this ever worked. ``` ``` This is not correct, this should be a compound primary key of CD_ID *and* COLUMN_NAME. 
This is enforced by `SQL110922153006740` in the full schema. As things currently stand, the COLUMN_NAME definition in the `jdo` file says that the COLUMN_NAME is not defined to be non-null. This caused an error with Derby as it didn't allow creating a PRIMARY KEY on a field that could be null. So, putting it all together, I came to the current solution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453611) Time Spent: 1.5h (was: 1h 20m) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453610 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 19:10 Start Date: 01/Jul/20 19:10 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1118: URL: https://github.com/apache/hive/pull/1118#discussion_r448563837 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - Review comment: Just following the directions here: http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#embedded_collection This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453610) Time Spent: 1h 20m (was: 1h 10m) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453602 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 19:04 Start Date: 01/Jul/20 19:04 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1118: URL: https://github.com/apache/hive/pull/1118#discussion_r448560561 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - Review comment: Changed to: ``` ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453602) Time Spent: 1h 10m (was: 1h) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453599 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 18:59 Start Date: 01/Jul/20 18:59 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1118: URL: https://github.com/apache/hive/pull/1118#discussion_r448558398 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - + - Review comment: ``` If a foreign-key is specified (in MetaData) for the relation field then leave any deletion to the datastore to perform ``` I don't see any such cascading relationship defined in the schema for Derby or MySQL, so DN should be doing it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453599) Time Spent: 1h (was: 50m) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1h > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as, based on its docs, 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?focusedWorklogId=453565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453565 ] ASF GitHub Bot logged work on HIVE-23791: - Author: ASF GitHub Bot Created on: 01/Jul/20 17:56 Start Date: 01/Jul/20 17:56 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1196: URL: https://github.com/apache/hive/pull/1196#discussion_r448527274 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2614,28 +2633,25 @@ public static Path getVersionFilePath(Path deltaOrBase) { + " from " + jc.get(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY)); return null; } -Directory acidInfo = AcidUtils.getAcidState(fs, dir, jc, idList, null, false); +if (fs == null) { + fs = dir.getFileSystem(jc); +} +// Collect all of the files/dirs +Map hdfsDirSnapshots = AcidUtils.getHdfsDirSnapshots(fs, dir); Review comment: Ohh.. I think I get it now. * You are right that this will do work which is not really needed in this case - namely creating objects which are not needed here (dirSnapshot.metaDataFile/dirSnapshot.acidFormatFile); we might also list and create objects which are not needed in this snapshot. On the other hand, the costly part on S3 (and on HDFS as well) is the number of remote calls, which is reduced to a single listing instead of doing the listing for every directory 1-by-1. * It is not possible that it fails to scan some location which is needed. If this happens then it is a bug in AcidUtils.getAcidState, as it has to return every directory which is readable. What I do not understand in your comment is "this method would return a something (it could still be a map) which could fill in stuff from hdfs if its not cached already" - the main thing we would like to avoid here is the need to read HDFS again and again. The only way to realize that something is missing is reading the directory again... 
or I miss something :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453565) Time Spent: 40m (was: 0.5h) > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
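The core point in the reply above — one bulk listing instead of a remote call per directory — can be sketched with plain `java.nio.file` as a stand-in for the HDFS API. The class and method names below are illustrative, not Hive's actual `AcidUtils` code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical stand-in for AcidUtils.getHdfsDirSnapshots(): one recursive
// walk builds a snapshot of every directory up front, so later lookups do
// not need to go back to the (remote) file system directory by directory.
public class DirSnapshotDemo {
    public static Map<Path, List<Path>> snapshot(Path root) throws IOException {
        Map<Path, List<Path>> snapshots = new HashMap<>();
        try (Stream<Path> walk = Files.walk(root)) {
            // Single traversal: group every regular file under its parent dir.
            walk.filter(Files::isRegularFile)
                .forEach(f -> snapshots
                    .computeIfAbsent(f.getParent(), k -> new ArrayList<>())
                    .add(f));
        }
        return snapshots;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("acid");
        Path delta = Files.createDirectories(root.resolve("delta_1_1"));
        Files.createFile(delta.resolve("bucket_00000"));
        Map<Path, List<Path>> snap = snapshot(root);
        // One listing served every directory; no per-directory remote call.
        System.out.println(snap.get(delta).size()); // prints 1
    }
}
```

On an S3-backed table the same shape applies, except each avoided per-directory listing is a network round trip rather than a local syscall.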
[jira] [Assigned] (HIVE-23794) HiveConnection.rollback always throws a "Method not supported" exception
[ https://issues.apache.org/jira/browse/HIVE-23794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amol Dixit reassigned HIVE-23794: - Assignee: Amol Dixit > HiveConnection.rollback always throws a "Method not supported" exception > > > Key: HIVE-23794 > URL: https://issues.apache.org/jira/browse/HIVE-23794 > Project: Hive > Issue Type: Bug >Reporter: Amol Dixit >Assignee: Amol Dixit >Priority: Major > > HiveConnection.rollback's automatically generated implementation always throws > a generic "Method not supported" exception and thus is not compliant with the > JDBC spec. For HiveConnection the autoCommit mode is always on, and the > connection does not allow setting autoCommit to false. If setAutoCommit > is called and the auto-commit mode is not changed, the call is a no-op. > Per the JDBC spec, an exception can be thrown only if the connection is closed, > a DB access error occurs, or the method is called during a transaction (which is > not the case for HiveConnection). > The JDBC spec does not say a word about the driver not supporting the method. > The most correct behavior could be to throw only if the request tries to > explicitly call rollback (as HiveConnection.getAutoCommit always returns true > and the setAutoCommit call is a no-op). > This issue is a blocker for JDBC connection pools (e.g. HikariCP) that expect > JDBC-compliant behavior from the driver. -- This message was sent by Atlassian Jira (v8.3.4#803005)
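The behavior the report argues for can be modeled in a few lines. This is a hedged sketch, not the actual `HiveConnection` code — the class name and message strings are invented; the point is that an always-auto-commit connection can satisfy the spec by throwing for a spec-named reason (rollback while in auto-commit mode) instead of a generic "Method not supported":

```java
import java.sql.SQLException;

// Illustrative model only (not HiveConnection itself): auto-commit is
// permanently enabled, setAutoCommit(true) is a no-op, and rollback()
// reports the spec-named condition instead of "Method not supported".
public class AutoCommitOnlyConnection {

    public boolean getAutoCommit() {
        return true; // the connection never leaves auto-commit mode
    }

    public void setAutoCommit(boolean autoCommit) throws SQLException {
        // Requesting the current mode changes nothing, so it is a no-op.
        if (!autoCommit) {
            throw new SQLException("auto-commit cannot be disabled");
        }
    }

    public void rollback() throws SQLException {
        // JDBC names this exact condition as a legal reason to throw.
        throw new SQLException("rollback called while in auto-commit mode");
    }
}
```

A pool such as HikariCP can then distinguish "rollback is illegal right now" from "the driver does not implement rollback at all", which is the distinction the report asks for.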
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453544 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 17:18 Start Date: 01/Jul/20 17:18 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1118: URL: https://github.com/apache/hive/pull/1118#issuecomment-652546609 @ashutoshc Let me see if I can address all of your questions with some background and context. It took me a long time to get these changes to pass the unit tests. So, these mappings, in some respect, don't really matter. When HMS is started, users use the `schema-tool` to create the HMS schema for real. Some of these mappings in the `jdo` file (like indexes) are only applied when unit testing because the unit tests build the schema via DN and `datanucleus.schema.autoCreateAll`. For unit testing, the database backend is Apache Derby. I changed the name of the index to match the Derby schema more closely. In trying to debug these various errors, I was very confused at first about it complaining about "COLUMNS_PK". https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L364 With that said, when I upgraded to DN 5.x, the unit tests would not pass. I narrowed the issue down to this one table definition. I tried several iterations to get success, but this is the one that worked. I derived this solution by closely examining the docs on this topic. It has an example that very closely aligns with this use case: http://www.datanucleus.org/products/accessplatform/jpa/mapping.html#embedded_collection It is a bit of a wonder looking at the existing JDO definition how this ever worked. ``` ``` This is not correct, this should be a compound primary key of CD_ID *and* COLUMN_NAME. This is enforced by `SQL110922153006740` in the full schema. 
As things currently stand, the `jdo` file does not declare COLUMN_NAME as non-null. This caused an error with Derby, as it doesn't allow creating a PRIMARY KEY on a field that could be null. So, putting it all together, I came to the current solution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453544) Time Spent: 50m (was: 40m) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as, based on its docs, 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23388) CTAS queries should use target's location for staging.
[ https://issues.apache.org/jira/browse/HIVE-23388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149567#comment-17149567 ] Naveen Gangam commented on HIVE-23388: -- [~samuelan] [~jcamachorodriguez] Could you please review? Thank you > CTAS queries should use target's location for staging. > -- > > Key: HIVE-23388 > URL: https://issues.apache.org/jira/browse/HIVE-23388 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In cloud based storage systems, renaming files across different root level > buckets seem to be disallowed. The S3AFileSystem throws the following > exception. This appears to be bug in S3FS impl. > Failed with exception Wrong FS > s3a://hive-managed/clusters/env-x/warehouse--/warehouse/tablespace/managed/hive/tpch.db/customer/delta_001_001_ > -expected s3a://hive-external > 2020-04-27T19:34:27,573 INFO [Thread-6] jdbc.TestDriver: > java.lang.IllegalArgumentException: Wrong FS > s3a://hive-managed//clusters/env-/warehouse--/warehouse/tablespace/managed/hive/tpch.db/customer/delta_001_001_ > -expected s3a://hive-external > But we should fix our query plans to use the target table's directory for > staging as well. That should resolve this issue and it is the right thing to > do as well (in case there are different encryption zones/keys for these > buckets). > Fix in HIVE-22995 probably changed this behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23722) Show the operation's drilldown link to client
[ https://issues.apache.org/jira/browse/HIVE-23722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-23722. - Assignee: Zhihua Deng Resolution: Fixed pushed to master. Thank you [~dengzh]! > Show the operation's drilldown link to client > - > > Key: HIVE-23722 > URL: https://issues.apache.org/jira/browse/HIVE-23722 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > The HiveServer2 webui provides a drilldown link for many collected > metrics or messages about an operation, but it's not easy for an end user to > find the target url of his submitted query. Limited knowledge of the deployment, > an HA-based environment, and multiple running queries can make things more > difficult. This jira provides a way to show the link to the interested end > user when enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23722) Show the operation's drilldown link to client
[ https://issues.apache.org/jira/browse/HIVE-23722?focusedWorklogId=453518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453518 ] ASF GitHub Bot logged work on HIVE-23722: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:42 Start Date: 01/Jul/20 16:42 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1145: URL: https://github.com/apache/hive/pull/1145 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453518) Time Spent: 1h 40m (was: 1.5h) > Show the operation's drilldown link to client > - > > Key: HIVE-23722 > URL: https://issues.apache.org/jira/browse/HIVE-23722 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Now the HiveServer2 webui provides a drilldown link for many collected > metrics or messages about a operation, but it's not easy for a end user to > find the target url of his submitted query. Less knowledge on the deployment, > HA based environment, and the multiple running queries can make things more > difficult. The jira provides a way to show the link to the interested end > user when enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582
[ https://issues.apache.org/jira/browse/HIVE-23751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-23751: Resolution: Fixed Status: Resolved (was: Patch Available) pushed to master. Thank you [~srahman]! > QTest: Override #mkdirs() method in ProxyFileSystem To Align After > HADOOP-16582 > --- > > Key: HIVE-23751 > URL: https://issues.apache.org/jira/browse/HIVE-23751 > Project: Hive > Issue Type: Task >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.2.0 > > Attachments: HIVE-23751.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > > HADOOP-16582 has changed the way mkdirs() works: > *Before HADOOP-16582:* > All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which were then > re-routed to the mkdirs(p, permission) method. For ProxyFileSystem the call would > look like > {code:java} > FileUtils.mkdir(p) -> FileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p,permission) > {code} > An implementation of FileSystem only needed to implement mkdirs(p, > permission) > *After HADOOP-16582:* > Since FilterFileSystem overrides the mkdirs(p) method, the new call to > ProxyFileSystem would look like > {code:java} > FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) --> > {code} > This will make all the qtests fail with the below exception > {code:java} > Caused by: java.lang.IllegalArgumentException: Wrong FS: > pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1, > expected: file:/// > {code} > Note: We will hit this issue when we bump up the hadoop version in hive. > So as per the discussion in HADOOP-16963, ProxyFileSystem would need to > override the mkdirs(p) method in order to solve the above problem. 
So now the > new flow would look like > {code:java} > FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p, permission) ---> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
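The dispatch change in the description can be reproduced with a self-contained toy hierarchy. The class names mirror the Hadoop ones, but this is a sketch, not Hadoop code — strings stand in for real file-system operations:

```java
// Toy model of the mkdirs() dispatch described above.
class FileSystemModel {
    public String mkdirs(String p) {
        // Pre-HADOOP-16582 fast track: the 1-arg call is re-routed to the
        // 2-arg overload, which subclasses override.
        return mkdirs(p, "default");
    }

    public String mkdirs(String p, String perm) {
        return "raw:" + p;
    }
}

class FilterFileSystemModel extends FileSystemModel {
    private final FileSystemModel raw = new FileSystemModel();

    @Override
    public String mkdirs(String p) {
        // HADOOP-16582: the filter forwards the 1-arg call straight to the
        // wrapped file system, bypassing any subclass's 2-arg override.
        return raw.mkdirs(p);
    }
}

class ProxyFileSystemModel extends FilterFileSystemModel {
    @Override
    public String mkdirs(String p, String perm) {
        // The proxy's job: translate pfile:// paths before delegating.
        return super.mkdirs(swapScheme(p), perm);
    }

    // HIVE-23751's fix: override the 1-arg form as well, so the call is
    // routed back through the proxy's own 2-arg translation.
    @Override
    public String mkdirs(String p) {
        return mkdirs(p, "default");
    }

    private String swapScheme(String p) {
        return p.replace("pfile:", "file:");
    }
}
```

Calling `new FilterFileSystemModel().mkdirs("pfile:/x")` returns the untranslated path — the toy analogue of the Wrong FS exception above — while the proxy's extra override restores the old routing.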
[jira] [Work logged] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582
[ https://issues.apache.org/jira/browse/HIVE-23751?focusedWorklogId=453517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453517 ] ASF GitHub Bot logged work on HIVE-23751: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:40 Start Date: 01/Jul/20 16:40 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1167: URL: https://github.com/apache/hive/pull/1167 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453517) Time Spent: 40m (was: 0.5h) > QTest: Override #mkdirs() method in ProxyFileSystem To Align After > HADOOP-16582 > --- > > Key: HIVE-23751 > URL: https://issues.apache.org/jira/browse/HIVE-23751 > Project: Hive > Issue Type: Task >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.2.0 > > Attachments: HIVE-23751.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > > HADOOP-16582 has changed the way mkdirs() works: > *Before HADOOP-16582:* > All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which were then > re-routed to the mkdirs(p, permission) method. 
For ProxyFileSystem the call would > look like > {code:java} > FileUtils.mkdir(p) -> FileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p,permission) > {code} > An implementation of FileSystem only needed to implement mkdirs(p, > permission) > *After HADOOP-16582:* > Since FilterFileSystem overrides the mkdirs(p) method, the new call to > ProxyFileSystem would look like > {code:java} > FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) --> > {code} > This will make all the qtests fail with the below exception > {code:java} > Caused by: java.lang.IllegalArgumentException: Wrong FS: > pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1, > expected: file:/// > {code} > Note: We will hit this issue when we bump up the hadoop version in hive. > So as per the discussion in HADOOP-16963, ProxyFileSystem would need to > override the mkdirs(p) method in order to solve the above problem. So now the > new flow would look like > {code:java} > FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p, permission) ---> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?focusedWorklogId=453516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453516 ] ASF GitHub Bot logged work on HIVE-23791: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:36 Start Date: 01/Jul/20 16:36 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1196: URL: https://github.com/apache/hive/pull/1196#discussion_r448482057 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2614,28 +2633,25 @@ public static Path getVersionFilePath(Path deltaOrBase) { + " from " + jc.get(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY)); return null; } -Directory acidInfo = AcidUtils.getAcidState(fs, dir, jc, idList, null, false); +if (fs == null) { + fs = dir.getFileSystem(jc); +} +// Collect the all of the files/dirs +Map hdfsDirSnapshots = AcidUtils.getHdfsDirSnapshots(fs, dir); Review comment: this might be out-of-scope for this change: but this *static* method in `AcidUtils` is trying to do all the work upfront... which might lead to: * that it does work which is not even needed * it doesn't scan some location - and the map just returns null ; so it might be not noticable I think it would be better if this method would return a something (it could still be a map) which could fill in stuff from hdfs if its not cached already... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453516) Time Spent: 0.5h (was: 20m) > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
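The reviewer's alternative — a map-like view that fills itself in from HDFS on a miss instead of returning null — might look like the following hedged sketch. `LazySnapshots` and its loader are invented names; the loader stands in for the remote listing call:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the suggestion above: rather than building every directory
// snapshot up front, load a snapshot on first access and cache it, so a
// missing key triggers one remote listing instead of silently returning null.
class LazySnapshots<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // stand-in for the HDFS listing
    private int remoteCalls = 0;

    LazySnapshots(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K dir) {
        // computeIfAbsent: only a cache miss costs a remote call.
        return cache.computeIfAbsent(dir, k -> {
            remoteCalls++;
            return loader.apply(k);
        });
    }

    int remoteCallCount() {
        return remoteCalls;
    }
}
```

The counter makes the caching visible: repeated lookups of the same directory cost one simulated listing, which is the "fill in stuff from hdfs if it's not cached already" behavior, while still avoiding repeated reads.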
[jira] [Work logged] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582
[ https://issues.apache.org/jira/browse/HIVE-23751?focusedWorklogId=453514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453514 ] ASF GitHub Bot logged work on HIVE-23751: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:35 Start Date: 01/Jul/20 16:35 Worklog Time Spent: 10m Work Description: shameersss1 commented on pull request #1167: URL: https://github.com/apache/hive/pull/1167#issuecomment-652524782 @kgyrtkirk I am okay with "@users.noreply.github.com". Please continue the merge! Thank you for the review! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453514) Time Spent: 0.5h (was: 20m) > QTest: Override #mkdirs() method in ProxyFileSystem To Align After > HADOOP-16582 > --- > > Key: HIVE-23751 > URL: https://issues.apache.org/jira/browse/HIVE-23751 > Project: Hive > Issue Type: Task >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.2.0 > > Attachments: HIVE-23751.01.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > HADOOP-16582 has changed the way mkdirs() works: > *Before HADOOP-16582:* > All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which were then > re-routed to the mkdirs(p, permission) method. 
For ProxyFileSystem the call would > look like > {code:java} > FileUtils.mkdir(p) -> FileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p,permission) > {code} > An implementation of FileSystem only needed to implement mkdirs(p, > permission) > *After HADOOP-16582:* > Since FilterFileSystem overrides the mkdirs(p) method, the new call to > ProxyFileSystem would look like > {code:java} > FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) --> > {code} > This will make all the qtests fail with the below exception > {code:java} > Caused by: java.lang.IllegalArgumentException: Wrong FS: > pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1, > expected: file:/// > {code} > Note: We will hit this issue when we bump up the hadoop version in hive. > So as per the discussion in HADOOP-16963, ProxyFileSystem would need to > override the mkdirs(p) method in order to solve the above problem. So now the > new flow would look like > {code:java} > FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p, permission) ---> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453508 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:31 Start Date: 01/Jul/20 16:31 Worklog Time Spent: 10m Work Description: ashutoshc commented on a change in pull request #1118: URL: https://github.com/apache/hive/pull/1118#discussion_r448480980 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - Review comment: Can you describe the need for this change? ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - + - - Review comment: Any reason to change the name here? ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - + - Review comment: We do want cascade-delete here. Any reason to remove it? ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - + - - + + Review comment: This probably is fine to do. Though, was it necessary? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453508) Time Spent: 40m (was: 0.5h) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23795) Add Additional Debugging Help for Import SQL
[ https://issues.apache.org/jira/browse/HIVE-23795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23795: -- Labels: pull-request-available (was: ) > Add Additional Debugging Help for Import SQL > > > Key: HIVE-23795 > URL: https://issues.apache.org/jira/browse/HIVE-23795 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add some things that were helpful to me when I was recently debugging an > issue with importing SQL. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23795) Add Additional Debugging Help for Import SQL
[ https://issues.apache.org/jira/browse/HIVE-23795?focusedWorklogId=453504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453504 ] ASF GitHub Bot logged work on HIVE-23795: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:26 Start Date: 01/Jul/20 16:26 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1199: URL: https://github.com/apache/hive/pull/1199 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453504) Remaining Estimate: 0h Time Spent: 10m > Add Additional Debugging Help for Import SQL > > > Key: HIVE-23795 > URL: https://issues.apache.org/jira/browse/HIVE-23795 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Add some things that were helpful to me when I was recently debugging an > issue with importing SQL. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23795) Add Additional Debugging Help for Import SQL
[ https://issues.apache.org/jira/browse/HIVE-23795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-23795: -- Description: Add some things that were helpful to me when I was recently debugging an issue with importing SQL. > Add Additional Debugging Help for Import SQL > > > Key: HIVE-23795 > URL: https://issues.apache.org/jira/browse/HIVE-23795 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > Add some things that were helpful to me when I was recently debugging an > issue with importing SQL. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23795) Add Additional Debugging Help for Import SQL
[ https://issues.apache.org/jira/browse/HIVE-23795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-23795: - > Add Additional Debugging Help for Import SQL > > > Key: HIVE-23795 > URL: https://issues.apache.org/jira/browse/HIVE-23795 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?focusedWorklogId=453499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453499 ] ASF GitHub Bot logged work on HIVE-23791: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:19 Start Date: 01/Jul/20 16:19 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1196: URL: https://github.com/apache/hive/pull/1196#discussion_r448474859 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -1305,7 +1322,9 @@ public static Directory getAcidState(FileSystem fileSystem, Path candidateDirect bestBase, ignoreEmptyFiles, abortedDirectories, fs, validTxnList); } } else { - dirSnapshots = getHdfsDirSnapshots(fs, candidateDirectory); + if (dirSnapshots == null) { Review comment: There is a slight problem here if we are on HDFS and file listing with id is supported. A few lines below there is a check for dirSnapshots == null that previously ran every time in this case, but now it won't run if you call getAcidState with a non-null dirSnapshots. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453499) Time Spent: 20m (was: 10m) > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
[ https://issues.apache.org/jira/browse/HIVE-23726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23726: -- Labels: pull-request-available (was: ) > Create table may throw > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > -- > > Key: HIVE-23726 > URL: https://issues.apache.org/jira/browse/HIVE-23726 > Project: Hive > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > - Given: > metastore.warehouse.tenant.colocation is set to true > a test database was created as {{create database test location '/data'}} > - When: > I try to create a table as {{create table t1 (a int) location '/data/t1'}} > - Then: > The create table fails with the following exception: > {code} > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: > java.lang.IllegalArgumentException: Can not create a Path from a null string > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63219) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) > at > 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1780) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1767) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3518) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:145) >
[jira] [Work logged] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
[ https://issues.apache.org/jira/browse/HIVE-23726?focusedWorklogId=453496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453496 ] ASF GitHub Bot logged work on HIVE-23726: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:16 Start Date: 01/Jul/20 16:16 Worklog Time Spent: 10m Work Description: nrg4878 opened a new pull request #1198: URL: https://github.com/apache/hive/pull/1198 …ll with colocation enabled (Naveen Gangam) ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453496) Remaining Estimate: 0h Time Spent: 10m > Create table may throw > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > -- > > Key: HIVE-23726 > URL: https://issues.apache.org/jira/browse/HIVE-23726 > Project: Hive > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Naveen Gangam >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > - Given: > metastore.warehouse.tenant.colocation is set to true > a test database was created as {{create database test location '/data'}} > - When: > I try to create a table as {{create table t1 (a int) location '/data/t1'}} > - Then: > The create table fails with the following exception: > {code} > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138) > at 
org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: > java.lang.IllegalArgumentException: Can not create a Path from a null string > at > org.apac
[jira] [Work logged] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582
[ https://issues.apache.org/jira/browse/HIVE-23751?focusedWorklogId=453497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453497 ]

ASF GitHub Bot logged work on HIVE-23751:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Jul/20 16:16
Start Date: 01/Jul/20 16:16
Worklog Time Spent: 10m

Work Description: kgyrtkirk commented on pull request #1167:
URL: https://github.com/apache/hive/pull/1167#issuecomment-652514691

@shameersss1: I think your email address is "sra?m...@qubole.com", but GitHub wants to add it only as a "Co-Authored" thing, and when it used to do this it usually changed the author's email address to "someth...@users.noreply.github.com". There are 2 things at https://github.com/settings/emails which could cause this:
* you don't have your email address associated with your GitHub account
* you have "keep my address private" checked

but...if you want me to merge it with "@users.noreply.github.com" just let me know :D

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 453497)
Time Spent: 20m (was: 10m)

> QTest: Override #mkdirs() method in ProxyFileSystem To Align After
> HADOOP-16582
> ------------------------------------------------------------------
>
> Key: HIVE-23751
> URL: https://issues.apache.org/jira/browse/HIVE-23751
> Project: Hive
> Issue Type: Task
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-23751.01.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> HADOOP-16582 has changed the way mkdirs() works:
>
> *Before HADOOP-16582:*
> All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which were then
> re-routed to the mkdirs(p, permission) method. For ProxyFileSystem the call would
> look like
> {code:java}
> FileUtils.mkdir(p) ---> FileSystem.mkdirs(p) ---> ProxyFileSystem.mkdirs(p, permission)
> {code}
> An implementation of FileSystem only needed to implement mkdirs(p, permission).
>
> *After HADOOP-16582:*
> Since FilterFileSystem overrides the mkdirs(p) method, the new call to
> ProxyFileSystem would look like
> {code:java}
> FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) -->
> {code}
> This will make all the qtests fail with the below exception:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Wrong FS:
> pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1,
> expected: file:///
> {code}
> Note: We will hit this issue when we bump up the hadoop version in hive.
> So, as per the discussion in HADOOP-16963, ProxyFileSystem would need to
> override the mkdirs(p) method in order to solve the above problem. The new
> flow would look like
> {code:java}
> FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> ProxyFileSystem.mkdirs(p, permission) --->
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
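The dispatch change described above can be reproduced with a stripped-down model (the classes below are stand-ins for the Hadoop/Hive file system hierarchy, not the real API): once the filter layer overrides the one-argument mkdirs and forwards it to the wrapped raw file system, the proxy must override the one-argument form too, or the call never reaches the proxy's two-argument method.

```java
// Minimal model of the HADOOP-16582 / HIVE-23751 dispatch problem.
// All class and method names are simplified stand-ins, not the real Hadoop API.
public class MkdirsDispatch {

  static class FileSystem {
    boolean mkdirs(String p) { return mkdirs(p, "default-perms"); }
    boolean mkdirs(String p, String permission) { return true; }
  }

  // After HADOOP-16582, FilterFileSystem overrides mkdirs(p) and forwards it
  // to the wrapped fs, bypassing any subclass's mkdirs(p, permission).
  static class FilterFileSystem extends FileSystem {
    final FileSystem raw = new FileSystem();
    @Override boolean mkdirs(String p) { return raw.mkdirs(p); }
  }

  // A proxy that only implements the two-argument form -- the pre-fix state.
  static class BrokenProxyFileSystem extends FilterFileSystem {
    boolean proxied = false;
    @Override boolean mkdirs(String p, String permission) {
      proxied = true; // the path-scheme fix-up would happen here
      return true;
    }
  }

  // The HIVE-23751 fix: also override mkdirs(p) and route it back through
  // the proxy's own two-argument method.
  static class FixedProxyFileSystem extends BrokenProxyFileSystem {
    @Override boolean mkdirs(String p) { return mkdirs(p, "default-perms"); }
  }

  public static void main(String[] args) {
    BrokenProxyFileSystem broken = new BrokenProxyFileSystem();
    broken.mkdirs("/tmp/x");
    System.out.println(broken.proxied); // false: proxy logic was skipped

    FixedProxyFileSystem fixed = new FixedProxyFileSystem();
    fixed.mkdirs("/tmp/x");
    System.out.println(fixed.proxied);  // true: call routed back through the proxy
  }
}
```

The model shows why a one-line override is enough: dynamic dispatch on the two-argument method does the rest.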
[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class
[ https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=453493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453493 ] ASF GitHub Bot logged work on HIVE-23793: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:11 Start Date: 01/Jul/20 16:11 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1197: URL: https://github.com/apache/hive/pull/1197 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453493) Remaining Estimate: 0h Time Spent: 10m > Review of QueryInfo Class > - > > Key: HIVE-23793 > URL: https://issues.apache.org/jira/browse/HIVE-23793 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23793) Review of QueryInfo Class
[ https://issues.apache.org/jira/browse/HIVE-23793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23793: -- Labels: pull-request-available (was: ) > Review of QueryInfo Class > - > > Key: HIVE-23793 > URL: https://issues.apache.org/jira/browse/HIVE-23793 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23793) Review of QueryInfo Class
[ https://issues.apache.org/jira/browse/HIVE-23793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-23793: - > Review of QueryInfo Class > - > > Key: HIVE-23793 > URL: https://issues.apache.org/jira/browse/HIVE-23793 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?focusedWorklogId=453446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453446 ] ASF GitHub Bot logged work on HIVE-23789: - Author: ASF GitHub Bot Created on: 01/Jul/20 15:00 Start Date: 01/Jul/20 15:00 Worklog Time Spent: 10m Work Description: miklosgergely merged pull request #1194: URL: https://github.com/apache/hive/pull/1194 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453446) Time Spent: 50m (was: 40m) > Merge ValidTxnManager into DriverTxnHandler > --- > > Key: HIVE-23789 > URL: https://issues.apache.org/jira/browse/HIVE-23789 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-23727) Improve SQLOperation log handling when cleanup
[ https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140552#comment-17140552 ]

Zhihua Deng edited comment on HIVE-23727 at 7/1/20, 2:52 PM:
-------------------------------------------------------------
In a busy environment the operation may still be pending (asyncPrepare is enabled), so it's better to change the condition from if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT) to if (shouldRunAsync() && oldState == OperationState.PENDING).

was (Author: dengzh):
In a busy environment the operation may still be pending (asyncPrepare is enabled), so it's better to change the condition from _if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to _if (shouldRunAsync() && oldState == OperationState.PENDING)._

> Improve SQLOperation log handling when cleanup
> ----------------------------------------------
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
> Issue Type: Improvement
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state !=
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the
> background task. If true, the state should not be OperationState.CANCELED, so
> logging under the state == OperationState.CANCELED should never happen.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
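The proposed condition change can be sketched with a plain enum (the `OperationState` below only mimics Hive's, and the arguments simplify the real setState bookkeeping in `SQLOperation`): instead of cancelling the background task for everything except CANCELED/TIMEDOUT, cancel only operations that never left the PENDING state.

```java
// Sketch of the HIVE-23727 condition change. This models only the boolean
// logic; the real SQLOperation tracks state transitions, not a single value.
public class CancelCondition {

  enum OperationState { PENDING, RUNNING, FINISHED, CANCELED, TIMEDOUT, CLOSED }

  // Current check: cancel unless the operation was canceled or timed out.
  static boolean cancelOld(boolean async, OperationState state) {
    return async
        && state != OperationState.CANCELED
        && state != OperationState.TIMEDOUT;
  }

  // Proposed check: cancel only operations still pending when cleanup runs.
  static boolean cancelNew(boolean async, OperationState oldState) {
    return async && oldState == OperationState.PENDING;
  }

  public static void main(String[] args) {
    // A finished operation: the old check still tries to cancel it,
    // the proposed one does not.
    System.out.println(cancelOld(true, OperationState.FINISHED)); // true
    System.out.println(cancelNew(true, OperationState.FINISHED)); // false
    // A pending operation is cancelled under the proposed check.
    System.out.println(cancelNew(true, OperationState.PENDING));  // true
  }
}
```

The sketch makes the comment's point concrete: the old condition cancels finished, failed, and closed operations needlessly, while the proposed one targets exactly the pending case that matters on a busy server.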
[jira] [Comment Edited] (HIVE-23727) Improve SQLOperation log handling when cleanup
[ https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140479#comment-17140479 ]

Zhihua Deng edited comment on HIVE-23727 at 7/1/20, 2:51 PM:
-------------------------------------------------------------
I'm wondering if we can improve the whole branch if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT) here. The code here is somewhat confusing to me: state = OperationState.CLOSED is the only case in which cancelling the background task will take effect, and in that case the operation may be finished, closed, failed, running (ctrl+c or session timeout) or pending. There is no need to cancel finished, closed or failed operations, and running operations can be treated like timed-out operations, which are cleaned up by driver::close.

was (Author: dengzh):
I'm wondering if we can improve the whole branch _if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT)_ here. The code is somewhat confusing to me, as driver::close has handled the case of the operation being canceled or timed out, and there is no need to cancel the background of an operation being closed, finished or failed (error). The cases in which cancelling the background takes effect are operations being RUNNING and PENDING (state = OperationState.CLOSED is passed to cleanup, ctrl+c), but cancelling the background of a running operation can be treated like a timed-out operation (since they were running operations before the timeout).

> Improve SQLOperation log handling when cleanup
> ----------------------------------------------
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
> Issue Type: Improvement
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state !=
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the
> background task. If true, the state should not be OperationState.CANCELED, so
> logging under the state == OperationState.CANCELED should never happen.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-23764) Remove unnecessary getLastFlushLength when checking delete delta files
[ https://issues.apache.org/jira/browse/HIVE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149471#comment-17149471 ] Peter Vary commented on HIVE-23764: --- [~rajesh.balamohan]: I see that in HIVE-23597 we have issues with some tests. Also caching the OrcTail might be better placed in LLAP IO, and [~szita] is working on a possible solution. What do you think about pushing this change, and if we hit some road-block with the LLAP IO solution then we might pick up HIVE-23597 again? Thanks, Peter > Remove unnecessary getLastFlushLength when checking delete delta files > -- > > Key: HIVE-23764 > URL: https://issues.apache.org/jira/browse/HIVE-23764 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls > OrcAcidUtils.getLastFlushLength for every delete delta file. > Even the comment says: > {code} > // NOTE: Calling last flush length below is more for > future-proofing when we have > // streaming deletes. But currently we don't support streaming > deletes, and this can > // be removed if this becomes a performance issue. > {code} > If we have a table with 5 updates (1 base + 5 delta + 5 delete_delta), then > for every base + delta dir we will check all of the delete_delta directories, > and check the getLastFlushLength method which will result in 6*5=30 > unnecessary NN/S3 calls. > We should remove the check as already proposed in the comment. -- This message was sent by Atlassian Jira (v8.3.4#803005)
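The 30-call figure in the description follows directly from the directory layout: every base/delta reader probes every delete_delta directory, one NN/S3 round trip per probe. A back-of-the-envelope sketch (plain arithmetic, no Hive API involved) makes the multiplicative growth explicit:

```java
// Sketch of the getLastFlushLength call count from HIVE-23764.
// Pure arithmetic: each base/delta reader checks the side file of every
// delete_delta directory, costing one NN/S3 round trip per check.
public class FlushLengthCalls {

  static int sideFileLookups(int baseDirs, int deltaDirs, int deleteDeltaDirs) {
    return (baseDirs + deltaDirs) * deleteDeltaDirs;
  }

  public static void main(String[] args) {
    // The example from the description: 1 base + 5 delta + 5 delete_delta.
    System.out.println(sideFileLookups(1, 5, 5)); // 30
    // Ten more updates and the cost grows quadratically with table history.
    System.out.println(sideFileLookups(1, 15, 15)); // 240
  }
}
```

Because both factors grow with the number of updates, removing the check (as the code comment already proposes) eliminates a cost that scales quadratically with table history.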
[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support
[ https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=453406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453406 ]

ASF GitHub Bot logged work on HIVE-20447:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Jul/20 12:57
Start Date: 01/Jul/20 12:57
Worklog Time Spent: 10m

Work Description: belugabehr edited a comment on pull request #1169:
URL: https://github.com/apache/hive/pull/1169#issuecomment-652401570

Please also add a second JSON formatter called "jsonfile" that does not output a standard JSON array structure, but emits one JSON record per line, just like Hive accepts for reading JSON. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-StorageFormatsStorageFormatsRowFormat,StorageFormat,andSerDe

It most likely can just `extend` this `JSONOutputFormat` class and override the `printHeader`/`printFooter` methods to be no-ops.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 453406)
Time Spent: 1h 40m (was: 1.5h)

> Add JSON Outputformat support
> -----------------------------
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
> Issue Type: Task
> Components: Beeline
> Reporter: Max Efremov
> Assignee: Hunter Logan
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> This function is present in SQLLine. We need to add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
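The suggestion above — derive a line-per-record "jsonfile" variant from the array-producing formatter — can be sketched as follows. The classes here are simplified stand-ins, not beeline's actual `OutputFormat` hierarchy; they only show how no-op header/footer overrides plus a newline separator turn a JSON array into one JSON record per line.

```java
import java.util.List;

// Sketch of the "json" vs "jsonfile" formatter idea from HIVE-20447.
// Simplified stand-ins for beeline's output-format classes.
public class JsonFormats {

  // "json" style: wraps pre-serialized row objects in a JSON array.
  static class JSONOutputFormat {
    protected String printHeader() { return "["; }
    protected String printFooter() { return "]"; }
    protected String rowSeparator() { return ","; }

    String print(List<String> jsonRows) {
      StringBuilder sb = new StringBuilder(printHeader());
      for (int i = 0; i < jsonRows.size(); i++) {
        if (i > 0) sb.append(rowSeparator());
        sb.append(jsonRows.get(i));
      }
      return sb.append(printFooter()).toString();
    }
  }

  // "jsonfile" style: one JSON record per line, no surrounding array --
  // header/footer become no-ops and rows are newline-separated, matching
  // the line-delimited form Hive accepts when reading JSON.
  static class JSONFileOutputFormat extends JSONOutputFormat {
    @Override protected String printHeader() { return ""; }
    @Override protected String printFooter() { return ""; }
    @Override protected String rowSeparator() { return "\n"; }
  }

  public static void main(String[] args) {
    List<String> rows = List.of("{\"a\":1}", "{\"a\":2}");
    System.out.println(new JSONOutputFormat().print(rows));     // [{"a":1},{"a":2}]
    System.out.println(new JSONFileOutputFormat().print(rows)); // one record per line
  }
}
```

Subclassing with small hook overrides, as the comment suggests, keeps the row-serialization logic in one place and makes the two formats differ only in framing.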
[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support
[ https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=453405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453405 ]

ASF GitHub Bot logged work on HIVE-20447:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Jul/20 12:56
Start Date: 01/Jul/20 12:56
Worklog Time Spent: 10m

Work Description: belugabehr commented on pull request #1169:
URL: https://github.com/apache/hive/pull/1169#issuecomment-652401570

Please also add a second JSON formatter called "jsonfile" that does not output a standard JSON array structure, but emits one JSON record per line, just like Hive accepts for reading JSON. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-StorageFormatsStorageFormatsRowFormat,StorageFormat,andSerDe

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 453405)
Time Spent: 1.5h (was: 1h 20m)

> Add JSON Outputformat support
> -----------------------------
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
> Issue Type: Task
> Components: Beeline
> Reporter: Max Efremov
> Assignee: Hunter Logan
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> This function is present in SQLLine. We need to add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support
[ https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=453404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453404 ] ASF GitHub Bot logged work on HIVE-20447: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:54 Start Date: 01/Jul/20 12:54 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1169: URL: https://github.com/apache/hive/pull/1169#discussion_r448341358 ## File path: pom.xml ## @@ -1486,6 +1486,7 @@ **/patchprocess/** **/metastore_db/** **/test/resources/**/*.ldif + .vscode/** Review comment: Since this has nothing to do with JSON formatting in beeline, save it for another JIRA. ## File path: beeline/src/test/org/apache/hive/beeline/TestJSONOutputFormat.java ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hive.beeline; + +import static org.mockito.ArgumentMatchers.anyInt; + +import org.junit.Before; +import org.junit.Test; +import org.mockito.invocation.InvocationOnMock; +import org.mockito.stubbing.Answer; + +import java.io.PrintStream; +import java.sql.ResultSet; +import java.sql.ResultSetMetaData; +import java.sql.SQLException; +import java.sql.Types; +import java.util.ArrayList; + +import static org.junit.Assert.assertArrayEquals; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +public class TestJSONOutputFormat { + + private final Object[][] mockRowData = { + {"aaa", true, null, Double.valueOf(3.14), "\\/\b\f\n\r\t"} + }; + private TestJSONOutputFormat.BeelineMock mockBeeline; + private ResultSet mockResultSet; + private MockRow mockRow; + + @Before + public void setupMockData() throws SQLException { +mockBeeline = new TestJSONOutputFormat.BeelineMock(); +mockResultSet = mock(ResultSet.class); + +ResultSetMetaData mockResultSetMetaData = mock(ResultSetMetaData.class); +when(mockResultSetMetaData.getColumnCount()).thenReturn(5); +when(mockResultSetMetaData.getColumnLabel(1)).thenReturn("string"); +when(mockResultSetMetaData.getColumnLabel(2)).thenReturn("boolean"); +when(mockResultSetMetaData.getColumnLabel(3)).thenReturn("null"); +when(mockResultSetMetaData.getColumnLabel(4)).thenReturn("double"); +when(mockResultSetMetaData.getColumnLabel(5)).thenReturn("special symbols"); +when(mockResultSetMetaData.getColumnType(1)).thenReturn(Types.VARCHAR); +when(mockResultSetMetaData.getColumnType(2)).thenReturn(Types.BOOLEAN); +when(mockResultSetMetaData.getColumnType(3)).thenReturn(Types.NULL); +when(mockResultSetMetaData.getColumnType(4)).thenReturn(Types.DOUBLE); +when(mockResultSetMetaData.getColumnType(5)).thenReturn(Types.VARCHAR); +when(mockResultSet.getMetaData()).thenReturn(mockResultSetMetaData); + +mockRow = new MockRow(); +// returns true as long as there is more data in mockResultData array 
+when(mockResultSet.next()).thenAnswer(new Answer() { + private int mockRowDataIndex = 0; + + @Override + public Boolean answer(final InvocationOnMock invocation) { +if (mockRowDataIndex < mockRowData.length) { + mockRow.setCurrentRowData(mockRowData[mockRowDataIndex]); + mockRowDataIndex++; + return true; +} else { + return false; +} + } +}); + +when(mockResultSet.getObject(anyInt())).thenAnswer(new Answer() { + @Override + public Object answer(final InvocationOnMock invocation) { +Object[] args = invocation.getArguments(); +int index = ((Integer) args[0]); +return mockRow.getColumn(index); + } +}); + } + + /** + * Test printing output data with JsonOutputFormat + */ + @Test + public final void testPrint() throws SQLException { +setupMockData(); +BufferedRows bfRows = new BufferedRows(mockBeeline, mockResultSet); +JSONOutputFormat instance = new JSONOutputFormat(mockBeeline); +instance.print(bfRows); +ArrayList actualOutput = mockBeeline.getLines(); +ArrayList expectedO
[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Syed Shameerur Rahman updated HIVE-23737:
-----------------------------------------
Attachment: HIVE-23737.02.patch

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's
> dagDelete
> -----------------------------------------------------------------------------
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-23737.01.patch, HIVE-23737.02.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we
> could re-use that feature in LLAP.
> There are some added advantages of using Tez's dagDelete feature rather than
> LLAP's current dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features
> such as vertex and failed task attempt shuffle data clean up. Refer to TEZ-3363
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from
> Hive's code path.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149414#comment-17149414 ]

Syed Shameerur Rahman commented on HIVE-23737:
----------------------------------------------
Added unit tests in HIVE-23737.02.patch

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's
> dagDelete
> -----------------------------------------------------------------------------
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-23737.01.patch, HIVE-23737.02.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we
> could re-use that feature in LLAP.
> There are some added advantages of using Tez's dagDelete feature rather than
> LLAP's current dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features
> such as vertex and failed task attempt shuffle data clean up. Refer to TEZ-3363
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from
> Hive's code path.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-23760) Upgrading to Kafka 2.5 Clients
[ https://issues.apache.org/jira/browse/HIVE-23760?focusedWorklogId=453398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453398 ] ASF GitHub Bot logged work on HIVE-23760: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:42 Start Date: 01/Jul/20 12:42 Worklog Time Spent: 10m Work Description: akatona84 opened a new pull request #1175: URL: https://github.com/apache/hive/pull/1175 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453398) Time Spent: 50m (was: 40m) > Upgrading to Kafka 2.5 Clients > -- > > Key: HIVE-23760 > URL: https://issues.apache.org/jira/browse/HIVE-23760 > Project: Hive > Issue Type: Improvement > Components: kafka integration >Reporter: Andras Katona >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common
[ https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=453395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453395 ] ASF GitHub Bot logged work on HIVE-23638: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:40 Start Date: 01/Jul/20 12:40 Worklog Time Spent: 10m Work Description: pgaref commented on pull request #1161: URL: https://github.com/apache/hive/pull/1161#issuecomment-652394311 @belugabehr thanks for the review! Addressed your comments in the latest commit -- could you please take another look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453395) Time Spent: 50m (was: 40m) > Fix FindBug issues in hive-common > - > > Key: HIVE-23638 > URL: https://issues.apache.org/jira/browse/HIVE-23638 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 50m > Remaining Estimate: 0h > > mvn -Pspotbugs > -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO > -pl :hive-common test-compile > com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23760) Upgrading to Kafka 2.5 Clients
[ https://issues.apache.org/jira/browse/HIVE-23760?focusedWorklogId=453393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453393 ] ASF GitHub Bot logged work on HIVE-23760: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:39 Start Date: 01/Jul/20 12:39 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1175: URL: https://github.com/apache/hive/pull/1175#issuecomment-652393923 Will close/open to re-launch tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453393) Time Spent: 0.5h (was: 20m) > Upgrading to Kafka 2.5 Clients > -- > > Key: HIVE-23760 > URL: https://issues.apache.org/jira/browse/HIVE-23760 > Project: Hive > Issue Type: Improvement > Components: kafka integration >Reporter: Andras Katona >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23760) Upgrading to Kafka 2.5 Clients
[ https://issues.apache.org/jira/browse/HIVE-23760?focusedWorklogId=453394&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453394 ] ASF GitHub Bot logged work on HIVE-23760: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:39 Start Date: 01/Jul/20 12:39 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1175: URL: https://github.com/apache/hive/pull/1175 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453394) Time Spent: 40m (was: 0.5h) > Upgrading to Kafka 2.5 Clients > -- > > Key: HIVE-23760 > URL: https://issues.apache.org/jira/browse/HIVE-23760 > Project: Hive > Issue Type: Improvement > Components: kafka integration >Reporter: Andras Katona >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23598) Add option to rewrite NTILE and RANK to sketch functions
[ https://issues.apache.org/jira/browse/HIVE-23598?focusedWorklogId=453384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453384 ] ASF GitHub Bot logged work on HIVE-23598: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:24 Start Date: 01/Jul/20 12:24 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1126: URL: https://github.com/apache/hive/pull/1126 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453384) Time Spent: 1h 10m (was: 1h) > Add option to rewrite NTILE and RANK to sketch functions > > > Key: HIVE-23598 > URL: https://issues.apache.org/jira/browse/HIVE-23598 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23598) Add option to rewrite NTILE and RANK to sketch functions
[ https://issues.apache.org/jira/browse/HIVE-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-23598. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Jesus for reviewing the changes! > Add option to rewrite NTILE and RANK to sketch functions > > > Key: HIVE-23598 > URL: https://issues.apache.org/jira/browse/HIVE-23598 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23791: -- Labels: pull-request-available (was: ) > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?focusedWorklogId=453382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453382 ] ASF GitHub Bot logged work on HIVE-23791: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:12 Start Date: 01/Jul/20 12:12 Worklog Time Spent: 10m Work Description: pvary opened a new pull request #1196: URL: https://github.com/apache/hive/pull/1196 Run AcidUtils.getHdfsDirSnapshots to collect the relevant files, and use that for stats generation This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453382) Remaining Estimate: 0h Time Spent: 10m > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
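[Editor's note] The optimization described in the PR — take one directory snapshot (via AcidUtils.getHdfsDirSnapshots) and derive both the acid state and the basic stats from it, instead of issuing two separate file listings — can be illustrated with a minimal sketch. The snapshot is modeled here as a plain path-to-length map; this is a simplified stand-in, not the real return type of getHdfsDirSnapshots:

```java
import java.util.*;

// Derive basic table stats (numFiles, totalSize) from a single directory
// snapshot, so the same snapshot that drives acid-state computation also
// drives stats generation and no second filesystem listing is needed.
public class SnapshotStats {
  public static Map<String, Long> basicStats(Map<String, Long> dirSnapshot) {
    long totalSize = dirSnapshot.values().stream().mapToLong(Long::longValue).sum();
    Map<String, Long> stats = new HashMap<>();
    stats.put("numFiles", (long) dirSnapshot.size());
    stats.put("totalSize", totalSize);
    return stats;
  }
}
```

The saving is in filesystem round trips: for a table with many delta directories, listing once and reusing the result halves the NameNode traffic of the stats path.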
[jira] [Work logged] (HIVE-23722) Show the operation's drilldown link to client
[ https://issues.apache.org/jira/browse/HIVE-23722?focusedWorklogId=453376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453376 ] ASF GitHub Bot logged work on HIVE-23722: - Author: ASF GitHub Bot Created on: 01/Jul/20 11:53 Start Date: 01/Jul/20 11:53 Worklog Time Spent: 10m Work Description: dengzhhu653 edited a comment on pull request #1145: URL: https://github.com/apache/hive/pull/1145#issuecomment-652294571 @kgyrtkirk could you please take another look at the changes? thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453376) Time Spent: 1.5h (was: 1h 20m) > Show the operation's drilldown link to client > - > > Key: HIVE-23722 > URL: https://issues.apache.org/jira/browse/HIVE-23722 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > The HiveServer2 webui provides a drilldown link for many collected > metrics and messages about an operation, but it's not easy for an end user to > find the target URL of his submitted query. Limited knowledge of the deployment, an > HA-based environment, and multiple running queries can make things more > difficult. This jira provides a way to show the link to the interested end > user when enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java
[ https://issues.apache.org/jira/browse/HIVE-23774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-23774. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the patch [~b.maidics]! > Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java > > > Key: HIVE-23774 > URL: https://issues.apache.org/jira/browse/HIVE-23774 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589] > This log is not needed at INFO log level. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java
[ https://issues.apache.org/jira/browse/HIVE-23774?focusedWorklogId=453372&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453372 ] ASF GitHub Bot logged work on HIVE-23774: - Author: ASF GitHub Bot Created on: 01/Jul/20 11:34 Start Date: 01/Jul/20 11:34 Worklog Time Spent: 10m Work Description: pvary merged pull request #1189: URL: https://github.com/apache/hive/pull/1189 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453372) Time Spent: 20m (was: 10m) > Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java > > > Key: HIVE-23774 > URL: https://issues.apache.org/jira/browse/HIVE-23774 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Trivial > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589] > This log is not needed at INFO log level. -- This message was sent by Atlassian Jira (v8.3.4#803005)
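[Editor's note] The effect of the merged change — demoting a per-call message in aggrColStatsForPartitions from INFO to debug level — is that at the typical production level the message is filtered out before it ever reaches a handler. A self-contained demonstration using java.util.logging (Hive itself logs through SLF4J; the logger name and messages here are illustrative):

```java
import java.util.*;
import java.util.logging.*;

// Demonstrates level filtering: with the logger at INFO, an INFO record is
// emitted but a FINE (debug-level) record is dropped, which is why moving a
// hot-path message from INFO to debug stops it from flooding the logs.
public class LogLevelDemo {
  public static List<String> emitted() {
    List<String> messages = new ArrayList<>();
    Logger log = Logger.getLogger("demo");
    log.setUseParentHandlers(false);
    log.setLevel(Level.INFO); // typical production level
    Handler capture = new Handler() {
      @Override public void publish(LogRecord r) { messages.add(r.getMessage()); }
      @Override public void flush() {}
      @Override public void close() {}
    };
    capture.setLevel(Level.ALL); // the logger's level, not the handler's, does the filtering here
    log.addHandler(capture);
    log.info("aggregated partition column stats");          // kept at INFO
    log.fine("columns: [c1, c2] partNames: [p1, p2, ...]"); // dropped at INFO
    return messages;
  }
}
```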
[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149309#comment-17149309 ] Syed Shameerur Rahman commented on HIVE-23737: -- HIVE-23737.01.patch is the first cut WIP patch. Need to add tests around the feature and do some clean up of old code. [~rajesh.balamohan] [~prasanth_j] [~gopalv] Could you guys please share your thoughts on this initial patch. > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23737.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez > has added support for dagDelete in its custom shuffle handler (TEZ-3362) we > could reuse that feature in LLAP. > There are some added advantages to using Tez's dagDelete feature rather than > the current LLAP dagDelete feature: > 1) We can easily extend this feature to accommodate upcoming features > such as vertex and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 > and TEZ-4129. > 2) It will be easier to maintain this feature by separating it out from > Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HIVE-23737: - Attachment: HIVE-23737.01.patch Status: Patch Available (was: Open) > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23737.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > LLAP have a dagDelete feature added as part of HIVE-9911, But now that Tez > have added support for dagDelete in custom shuffle handler (TEZ-3362) we > could re-use that feature in LLAP. > There are some added advantages of using Tez's dagDelete feature rather than > the current LLAP's dagDelete feature. > 1) We can easily extend this feature to accommodate the upcoming features > such as vertex and failed task attempt shuffle data clean up. Refer TEZ-3363 > and TEZ-4129 > 2) It will be more easier to maintain this feature by separating it out from > the Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23737: -- Labels: pull-request-available (was: ) > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > LLAP have a dagDelete feature added as part of HIVE-9911, But now that Tez > have added support for dagDelete in custom shuffle handler (TEZ-3362) we > could re-use that feature in LLAP. > There are some added advantages of using Tez's dagDelete feature rather than > the current LLAP's dagDelete feature. > 1) We can easily extend this feature to accommodate the upcoming features > such as vertex and failed task attempt shuffle data clean up. Refer TEZ-3363 > and TEZ-4129 > 2) It will be more easier to maintain this feature by separating it out from > the Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?focusedWorklogId=453359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453359 ] ASF GitHub Bot logged work on HIVE-23737: - Author: ASF GitHub Bot Created on: 01/Jul/20 10:33 Start Date: 01/Jul/20 10:33 Worklog Time Spent: 10m Work Description: shameersss1 opened a new pull request #1195: URL: https://github.com/apache/hive/pull/1195 LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez has added support for dagDelete in its custom shuffle handler (TEZ-3362) we could reuse that feature in LLAP. There are some added advantages to using Tez's dagDelete feature rather than the current LLAP dagDelete feature. 1) We can easily extend this feature to accommodate upcoming features such as vertex and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 and TEZ-4129. 2) It will be easier to maintain this feature by separating it out from Hive's code path. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453359) Remaining Estimate: 0h Time Spent: 10m > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez > has added support for dagDelete in its custom shuffle handler (TEZ-3362) we > could reuse that feature in LLAP. > There are some added advantages to using Tez's dagDelete feature rather than > the current LLAP dagDelete feature: 
> 1) We can easily extend this feature to accommodate upcoming features > such as vertex and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 > and TEZ-4129. > 2) It will be easier to maintain this feature by separating it out from > Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23787) Write all the events present in a task_queue in a single file.
[ https://issues.apache.org/jira/browse/HIVE-23787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anishek Agarwal updated HIVE-23787: --- Description: Events are not written to file when the queue becomes full, and the post_exec_hook / pre_exec_hook event is silently dropped. The default capacity is 64 in the hive.hook.proto.queue.capacity config for HS2. Now, we will increase the queue capacity (let's say up to 256). Also, as an optimisation, we need to take all the events present in a task_queue and write them in a single file. was: DAS does not get the event when the queue becomes full, and it ignores the post_exec_hook / pre_exec_hook event. The default capacity is 64 in hive.hook.proto.queue.capacity config for hs2. Now, we will increase the queue-capacity (let's say up to 256). Also for the optimisation, need to run all the events present in a task_queue, and write in a single file. > Write all the events present in a task_queue in a single file. > -- > > Key: HIVE-23787 > URL: https://issues.apache.org/jira/browse/HIVE-23787 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Amlesh Kumar >Assignee: Amlesh Kumar >Priority: Major > > Events are not written to file when the queue becomes full, and the > post_exec_hook / pre_exec_hook event is silently dropped. The default capacity is 64 in > the hive.hook.proto.queue.capacity config for HS2. > Now, we will increase the queue capacity (let's say up to 256). > Also, as an optimisation, we need to take all the events present in a > task_queue and write them in a single file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
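[Editor's note] The two changes proposed in HIVE-23787 — a larger bounded event queue so hook events are not silently dropped, and draining every buffered event in one pass so they land in a single file write — can be sketched as follows. The class name is illustrative and the capacity of 256 mirrors the proposed bump from the default 64 (hive.hook.proto.queue.capacity); this is not Hive's actual hook code:

```java
import java.util.*;
import java.util.concurrent.*;

// Bounded hook-event queue: offer() drops events when the queue is full
// (the failure mode described in the ticket), and drainBatch() empties the
// whole queue at once so the caller can write one file per batch.
public class ProtoEventQueue {
  private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(256);

  // Returns false (event dropped) when the queue is full.
  public boolean offer(String event) {
    return queue.offer(event);
  }

  // Drain every buffered event in one pass, to be written as a single file.
  public List<String> drainBatch() {
    List<String> batch = new ArrayList<>();
    queue.drainTo(batch);
    return batch;
  }
}
```

With a capacity of 64 and a burst of queries, the excess post_exec_hook/pre_exec_hook events are exactly the `offer() == false` cases above; raising the capacity and batching the drain reduces both the drops and the number of small file writes.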
[jira] [Work logged] (HIVE-23722) Show the operation's drilldown link to client
[ https://issues.apache.org/jira/browse/HIVE-23722?focusedWorklogId=453313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453313 ] ASF GitHub Bot logged work on HIVE-23722: - Author: ASF GitHub Bot Created on: 01/Jul/20 09:07 Start Date: 01/Jul/20 09:07 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on pull request #1145: URL: https://github.com/apache/hive/pull/1145#issuecomment-652294571 @kgyrtkirk could you take a look at the changes? thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453313) Time Spent: 1h 20m (was: 1h 10m) > Show the operation's drilldown link to client > - > > Key: HIVE-23722 > URL: https://issues.apache.org/jira/browse/HIVE-23722 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Now the HiveServer2 webui provides a drilldown link for many collected > metrics or messages about a operation, but it's not easy for a end user to > find the target url of his submitted query. Less knowledge on the deployment, > HA based environment, and the multiple running queries can make things more > difficult. The jira provides a way to show the link to the interested end > user when enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-23791: - > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23790) The error message length of 2000 is exceeded for scheduled query
[ https://issues.apache.org/jira/browse/HIVE-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi reassigned HIVE-23790: -- Assignee: Zoltan Haindrich (was: Aasha Medhi) > The error message length of 2000 is exceeded for scheduled query > > > Key: HIVE-23790 > URL: https://issues.apache.org/jira/browse/HIVE-23790 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Zoltan Haindrich >Priority: Major > > {code:java} > 2020-07-01 08:24:23,916 ERROR org.apache.thrift.server.TThreadPoolServer: > [pool-7-thread-189]: Error occurred during processing of message. > org.datanucleus.exceptions.NucleusUserException: Attempt to store value > "FAILED: Execution Error, return code 30045 from > org.apache.hadoop.hive.ql.exec.repl.DirCopyTask. Permission denied: > user=hive, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:336) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:626) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkRangerPermission(RangerHdfsAuthorizer.java:388) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermissionWithContext(RangerHdfsAuthorizer.java:229) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:239) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1908) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1892) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1851) > at > 
org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3226) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1130) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:729) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) > " in column ""ERROR_MESSAGE"" that has maximum length of 2000. Please correct > your data! > at > org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.fieldmanager.ParameterSetter.storeStringField(ParameterSetter.java:158) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.state.AbstractStateManager.providedStringField(AbstractStateManager.java:1448) > ~[datanucleus-core-4.1.17.jar:?] > at > org.datanucleus.state.StateManagerImpl.providedStringField(StateManagerImpl.java:120) > ~[datanucleus-core-4.1.17.jar:?] 
> at > org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideField(MScheduledExecution.java) > ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246] > at > org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideFields(MScheduledExecution.java) > ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246] > at > org.datanucleus.state.StateManagerImpl.provideFields(StateManagerImpl.java:1170) > ~[datanucleus-core-4.1.17.jar:?] > at > org.datanucleus.store.rdbms.request.UpdateRequest.execute(UpdateRequest.java:326) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateObjectInTable(RDBMSPersi
[jira] [Assigned] (HIVE-23790) The error message length of 2000 is exceeded for scheduled query
[ https://issues.apache.org/jira/browse/HIVE-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi reassigned HIVE-23790: -- > The error message length of 2000 is exceeded for scheduled query > > > Key: HIVE-23790 > URL: https://issues.apache.org/jira/browse/HIVE-23790 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > > {code:java} > 2020-07-01 08:24:23,916 ERROR org.apache.thrift.server.TThreadPoolServer: > [pool-7-thread-189]: Error occurred during processing of message. > org.datanucleus.exceptions.NucleusUserException: Attempt to store value > "FAILED: Execution Error, return code 30045 from > org.apache.hadoop.hive.ql.exec.repl.DirCopyTask. Permission denied: > user=hive, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:336) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:626) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkRangerPermission(RangerHdfsAuthorizer.java:388) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermissionWithContext(RangerHdfsAuthorizer.java:229) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:239) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1908) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1892) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1851) > at > org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3226) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1130) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:729) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) > " in column ""ERROR_MESSAGE"" that has maximum length of 2000. Please correct > your data! > at > org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.fieldmanager.ParameterSetter.storeStringField(ParameterSetter.java:158) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.state.AbstractStateManager.providedStringField(AbstractStateManager.java:1448) > ~[datanucleus-core-4.1.17.jar:?] > at > org.datanucleus.state.StateManagerImpl.providedStringField(StateManagerImpl.java:120) > ~[datanucleus-core-4.1.17.jar:?] 
> at > org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideField(MScheduledExecution.java) > ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246] > at > org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideFields(MScheduledExecution.java) > ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246] > at > org.datanucleus.state.StateManagerImpl.provideFields(StateManagerImpl.java:1170) > ~[datanucleus-core-4.1.17.jar:?] > at > org.datanucleus.store.rdbms.request.UpdateRequest.execute(UpdateRequest.java:326) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateObjectInTable(RDBMSPersistenceHandler.java:409) > ~[datanucleus-rdbms-4.1.19.ja
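The root cause in the stack trace above is a value longer than the 2000-character limit of the ERROR_MESSAGE column. A minimal sketch of one possible mitigation, clamping the message before handing it to the persistence layer, is below; the class name, constant, and truncation marker are hypothetical illustrations, not the actual Hive patch.

```java
// Hypothetical sketch: clamp an error message to the column limit before
// persisting, keeping the head of the message (usually the most informative
// part) and marking the cut with an ellipsis.
public class ErrorMessageTruncator {
    // Mirrors the ERROR_MESSAGE column limit reported in the stack trace.
    static final int MAX_ERROR_MESSAGE_LENGTH = 2000;

    static String truncate(String message) {
        if (message == null || message.length() <= MAX_ERROR_MESSAGE_LENGTH) {
            return message; // fits as-is, store unchanged
        }
        String marker = "...";
        // Reserve room for the marker so the stored value never exceeds the limit.
        return message.substring(0, MAX_ERROR_MESSAGE_LENGTH - marker.length()) + marker;
    }

    public static void main(String[] args) {
        // ~4300 characters, comfortably over the 2000-char column limit.
        String longMessage = "FAILED: Execution Error, Permission denied ".repeat(100);
        System.out.println(truncate(longMessage).length()); // prints 2000
    }
}
```

Truncating at write time trades a complete error text for a guaranteed successful update of the scheduled-query state, which is preferable to the NucleusUserException aborting the whole status write.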
[jira] [Work logged] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?focusedWorklogId=453291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453291 ] ASF GitHub Bot logged work on HIVE-23789: - Author: ASF GitHub Bot Created on: 01/Jul/20 08:09 Start Date: 01/Jul/20 08:09 Worklog Time Spent: 10m Work Description: miklosgergely commented on a change in pull request #1194: URL: https://github.com/apache/hive/pull/1194#discussion_r448191052 ## File path: ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java ## @@ -288,15 +313,231 @@ private void acquireLocksInternal() throws CommandProcessorException, LockExcept } } - public void addHiveLocksFromContext() { + /** + * Write the current set of valid write ids for the operated acid tables into the configuration so + * that it can be read by the input format. + */ + private ValidTxnWriteIdList recordValidWriteIds() throws LockException { +String txnString = driverContext.getConf().get(ValidTxnList.VALID_TXNS_KEY); +if (Strings.isNullOrEmpty(txnString)) { + throw new IllegalStateException("calling recordValidWritsIdss() without initializing ValidTxnList " + + JavaUtils.txnIdToString(driverContext.getTxnManager().getCurrentTxnId())); +} + +ValidTxnWriteIdList txnWriteIds = getTxnWriteIds(txnString); +setValidWriteIds(txnWriteIds); + +LOG.debug("Encoding valid txn write ids info {} txnid: {}", txnWriteIds.toString(), +driverContext.getTxnManager().getCurrentTxnId()); +return txnWriteIds; + } + + private ValidTxnWriteIdList getTxnWriteIds(String txnString) throws LockException { +List txnTables = getTransactionalTables(getTables(true, true)); +ValidTxnWriteIdList txnWriteIds = null; +if (driverContext.getCompactionWriteIds() != null) { + // This is kludgy: here we need to read with Compactor's snapshot/txn rather than the snapshot of the current + // {@code txnMgr}, in effect simulating a "flashback query" but can't actually share compactor's txn since it + // would run multiple statements. 
See more comments in {@link org.apache.hadoop.hive.ql.txn.compactor.Worker} + // where it start the compactor txn*/ + if (txnTables.size() != 1) { +throw new LockException("Unexpected tables in compaction: " + txnTables); + } + txnWriteIds = new ValidTxnWriteIdList(driverContext.getCompactorTxnId()); + txnWriteIds.addTableValidWriteIdList(driverContext.getCompactionWriteIds()); +} else { + txnWriteIds = driverContext.getTxnManager().getValidWriteIds(txnTables, txnString); +} +if (driverContext.getTxnType() == TxnType.READ_ONLY && !getTables(false, true).isEmpty()) { + throw new IllegalStateException(String.format( + "Inferred transaction type '%s' doesn't conform to the actual query string '%s'", + driverContext.getTxnType(), driverContext.getQueryState().getQueryString())); +} +return txnWriteIds; + } + + private void setValidWriteIds(ValidTxnWriteIdList txnWriteIds) { +driverContext.getConf().set(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY, txnWriteIds.toString()); +if (driverContext.getPlan().getFetchTask() != null) { + // This is needed for {@link HiveConf.ConfVars.HIVEFETCHTASKCONVERSION} optimization which initializes JobConf + // in FetchOperator before recordValidTxns() but this has to be done after locks are acquired to avoid race + // conditions in ACID. This case is supported only for single source query. 
+ Operator source = driverContext.getPlan().getFetchTask().getWork().getSource(); + if (source instanceof TableScanOperator) { +TableScanOperator tsOp = (TableScanOperator)source; +String fullTableName = AcidUtils.getFullTableName(tsOp.getConf().getDatabaseName(), +tsOp.getConf().getTableName()); +ValidWriteIdList writeIdList = txnWriteIds.getTableValidWriteIdList(fullTableName); +if (tsOp.getConf().isTranscationalTable() && (writeIdList == null)) { + throw new IllegalStateException(String.format( + "ACID table: %s is missing from the ValidWriteIdList config: %s", fullTableName, txnWriteIds.toString())); +} +if (writeIdList != null) { + driverContext.getPlan().getFetchTask().setValidWriteIdList(writeIdList.toString()); +} + } +} + } + + /** + * Checks whether txn list has been invalidated while planning the query. + * This would happen if query requires exclusive/semi-shared lock, and there has been a committed transaction + * on the table over which the lock is required. + */ + boolean isValidTxnListState() throws LockException { +// 1) Get valid txn list. +String txnString = driverContext.getConf().get(ValidTxnList.VALID_TXNS_KEY); +if (txnString == null) { +
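The hunk above records the valid-txn snapshot at compile time and later, in isValidTxnListState(), re-checks it after locks are acquired. A self-contained sketch of that re-check idea follows; the class and method names are hypothetical, and Hive's real ValidTxnList carries more state (high-water mark, aborted txns) than the bare set of open txn ids modeled here.

```java
import java.util.Set;

// Simplified sketch of the snapshot re-validation idea: if a txn that was
// open (invalid) when the query was compiled is no longer open once locks
// are held, something committed in between and the cached plan may be
// stale, so the query should be recompiled with a fresh snapshot.
public class SnapshotCheck {
    static boolean snapshotStillValid(Set<Long> openAtCompile, Set<Long> openAfterLocks) {
        for (long txnId : openAtCompile) {
            if (!openAfterLocks.contains(txnId)) {
                return false; // txn resolved meanwhile: snapshot invalidated
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // New txns opening is fine; only previously-open txns resolving matters.
        System.out.println(snapshotStillValid(Set.of(7L, 9L), Set.of(7L, 9L, 11L))); // true
        System.out.println(snapshotStillValid(Set.of(7L, 9L), Set.of(9L)));          // false
    }
}
```

The check only needs to fire for queries taking exclusive or semi-shared locks, as the surrounding javadoc notes: purely shared readers are consistent under their original snapshot.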
[jira] [Work logged] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?focusedWorklogId=453289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453289 ] ASF GitHub Bot logged work on HIVE-23789: - Author: ASF GitHub Bot Created on: 01/Jul/20 08:07 Start Date: 01/Jul/20 08:07 Worklog Time Spent: 10m Work Description: miklosgergely commented on a change in pull request #1194: URL: https://github.com/apache/hive/pull/1194#discussion_r448190290 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Compiler.java ## @@ -188,7 +188,6 @@ private BaseSemanticAnalyzer analyze() throws Exception { // because at that point we need access to the objects. Hive.get().getMSC().flushCache(); -driverContext.setBackupContext(new Context(context)); Review comment: The usage of backupContext was removed by Peter Varga recently (https://github.com/apache/hive/commit/e2a02f1b43cba657d4d1c16ead091072be5fe834#diff-71a166c053d9c698f9cb64eaef832aff), I've asked him to confirm, and it was intentional. After this change we are only setting the backup context here, but it is never used. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453289) Time Spent: 0.5h (was: 20m) > Merge ValidTxnManager into DriverTxnHandler > --- > > Key: HIVE-23789 > URL: https://issues.apache.org/jira/browse/HIVE-23789 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?focusedWorklogId=453281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453281 ] ASF GitHub Bot logged work on HIVE-23789: - Author: ASF GitHub Bot Created on: 01/Jul/20 07:54 Start Date: 01/Jul/20 07:54 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1194: URL: https://github.com/apache/hive/pull/1194#discussion_r448183060 ## File path: ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java ## @@ -288,15 +313,231 @@ (same hunk as quoted in worklog 453291 above)
[jira] [Updated] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23789: -- Labels: pull-request-available (was: ) > Merge ValidTxnManager into DriverTxnHandler > --- > > Key: HIVE-23789 > URL: https://issues.apache.org/jira/browse/HIVE-23789 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?focusedWorklogId=453278&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453278 ] ASF GitHub Bot logged work on HIVE-23789: - Author: ASF GitHub Bot Created on: 01/Jul/20 07:51 Start Date: 01/Jul/20 07:51 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1194: URL: https://github.com/apache/hive/pull/1194#discussion_r448181689 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Compiler.java ## @@ -188,7 +188,6 @@ private BaseSemanticAnalyzer analyze() throws Exception { // because at that point we need access to the objects. Hive.get().getMSC().flushCache(); -driverContext.setBackupContext(new Context(context)); Review comment: Are we sure about this? We create a backup context so if we have to reexecute the query then we have a context at hand (for removing temporary files etc) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453278) Remaining Estimate: 0h Time Spent: 10m > Merge ValidTxnManager into DriverTxnHandler > --- > > Key: HIVE-23789 > URL: https://issues.apache.org/jira/browse/HIVE-23789 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23703) Major QB compaction with multiple FileSinkOperators results in data loss and one original file
[ https://issues.apache.org/jira/browse/HIVE-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149194#comment-17149194 ] Peter Vary commented on HIVE-23703: --- [~pvargacl]: Could you please check with the flaky test jenkins: http://ci.hive.apache.org/job/hive-flaky-check/55/ If it is flaky, then we should disable it until it is fixed. Thanks, Peter > Major QB compaction with multiple FileSinkOperators results in data loss and > one original file > -- > > Key: HIVE-23703 > URL: https://issues.apache.org/jira/browse/HIVE-23703 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Critical > Labels: compaction, pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > h4. Problems > Example: > {code:java} > drop table if exists tbl2; > create transactional table tbl2 (a int, b int) clustered by (a) into 4 > buckets stored as ORC > TBLPROPERTIES('transactional'='true','transactional_properties'='default'); > insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4); > insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4); > insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);{code} > E.g. in the example above, bucketId=0 when a=2 and a=6. > 1. Data loss > In non-acid tables, an operator's temp files are named with their task id. > Because of this snippet, temp files in the FileSinkOperator for compaction > tables are identified by their bucket_id. > {code:java} > if (conf.isCompactionTable()) { > fsp.initializeBucketPaths(filesIdx, AcidUtils.BUCKET_PREFIX + > String.format(AcidUtils.BUCKET_DIGITS, bucketId), > isNativeTable(), isSkewedStoredAsSubDirectories); > } else { > fsp.initializeBucketPaths(filesIdx, taskId, isNativeTable(), > isSkewedStoredAsSubDirectories); > } > {code} > So 2 temp files containing data with a=2 and a=6 will be named bucket_0 and > not 00_0 and 00_1 as they would normally. 
> In FileSinkOperator.commit, when data with a=2, filename: bucket_0 is moved > from _task_tmp.-ext-10002 to _tmp.-ext-10002, it overwrites the files already > there with a=6 data, because it too is named bucket_0. You can see in the > logs: > {code:java} > WARN [LocalJobRunner Map Task Executor #0] exec.FileSinkOperator: Target > path > file:.../hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnNoBuckets-1591107230237/warehouse/testmajorcompaction/base_002_v013/.hive-staging_hive_2020-06-02_07-15-21_771_8551447285061957908-1/_tmp.-ext-10002/bucket_0 > with a size 610 exists. Trying to delete it. > {code} > 2. Results in one original file > OrcFileMergeOperator merges the results of the FSOp into 1 file named > 00_0. > h4. Fix > 1. FSOp will store data as: taskid/bucketId. e.g. 0_0/bucket_0 > 2. OrcMergeFileOp, instead of merging a bunch of files into 1 file named > 00_0, will merge all files named bucket_0 into one file named bucket_0, > and so on. > 3. MoveTask will get rid of the taskId directories if present and only move > the bucket files in them, in case OrcMergeFileOp is not run. -- This message was sent by Atlassian Jira (v8.3.4#803005)
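The collision described above boils down to a naming scheme: two tasks that each produce rows for bucket 0 both emit a temp file named bucket_0, and the second commit overwrites the first. A toy sketch of the collision and of the task-scoped naming outlined in the fix is below; the class, method names, and path layout are illustrative only, not Hive's actual FileSinkOperator code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a map stands in for the commit directory, where inserting under
// an already-used file name models one commit overwriting another.
public class BucketPathSketch {
    static String bucketOnlyPath(int taskId, int bucketId) {
        return "bucket_" + bucketId;             // old scheme: ignores task id, collides
    }

    static String taskScopedPath(int taskId, int bucketId) {
        return taskId + "_0/bucket_" + bucketId; // fixed scheme: unique per task
    }

    public static void main(String[] args) {
        Map<String, Integer> committed = new LinkedHashMap<>();
        // Two tasks (0 and 1) both emit rows hashing to bucket 0 (e.g. a=2 and a=6).
        committed.put(bucketOnlyPath(0, 0), 2);
        committed.put(bucketOnlyPath(1, 0), 6);  // overwrites the a=2 file
        System.out.println(committed.size());    // prints 1 -> data loss

        committed.clear();
        committed.put(taskScopedPath(0, 0), 2);
        committed.put(taskScopedPath(1, 0), 6);
        System.out.println(committed.size());    // prints 2 -> both files survive
    }
}
```

With task-scoped directories, the merge step then has to collect all files named bucket_N across task directories into a single bucket_N output, which is exactly step 2 of the fix described above.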
[jira] [Resolved] (HIVE-23718) Extract transaction handling from Driver
[ https://issues.apache.org/jira/browse/HIVE-23718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely resolved HIVE-23718. --- Resolution: Fixed > Extract transaction handling from Driver > > > Key: HIVE-23718 > URL: https://issues.apache.org/jira/browse/HIVE-23718 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely reassigned HIVE-23789: - > Merge ValidTxnManager into DriverTxnHandler > --- > > Key: HIVE-23789 > URL: https://issues.apache.org/jira/browse/HIVE-23789 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23703) Major QB compaction with multiple FileSinkOperators results in data loss and one original file
[ https://issues.apache.org/jira/browse/HIVE-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149167#comment-17149167 ] Peter Varga commented on HIVE-23703: [~klcopp] looks like one of the new test is flaky: [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1143/6/tests] > Major QB compaction with multiple FileSinkOperators results in data loss and > one original file > -- > > Key: HIVE-23703 > URL: https://issues.apache.org/jira/browse/HIVE-23703 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Critical > Labels: compaction, pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)