[jira] [Assigned] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() needs to optimize its query SQL
[ https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-23721:
-
Assignee: zhangbutao

> MetaStoreDirectSql.ensureDbInit() needs to optimize its query SQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.2
> Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timeline server enabled, HTTPS enabled)
> Hive 3.1 (15 HS2 instances)
> 6+ YARN applications every day
> Reporter: YulongZ
> Assignee: zhangbutao
> Priority: Critical
> Attachments: HIVE-23721.01.patch
>
> Since Hive 3.0, catalogs were added to the Hive metastore: many metastore tables gained a "catName" column, and the table indexes now lead with "catName".
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>
> initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
> initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
>
> should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
> When the metastore data grows large (for example, MPartitionColumnStatistics holding millions of rows), the newQuery(MPartitionColumnStatistics.class, "dbName == ''") probe executes very slowly, and "show tables" on HiveServer2 becomes very slow as well.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() needs to optimize its query SQL
[ https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-23721:
--
Attachment: HIVE-23721.01.patch
Fix Version/s: 4.0.0
Status: Patch Available (was: Open)

> MetaStoreDirectSql.ensureDbInit() needs to optimize its query SQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.2
> Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timeline server enabled, HTTPS enabled)
> Hive 3.1 (15 HS2 instances)
> 6+ YARN applications every day
> Reporter: YulongZ
> Assignee: zhangbutao
> Priority: Critical
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
> Since Hive 3.0, catalogs were added to the Hive metastore: many metastore tables gained a "catName" column, and the table indexes now lead with "catName".
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>
> initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
> initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
>
> should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
> When the metastore data grows large (for example, MPartitionColumnStatistics holding millions of rows), the newQuery(MPartitionColumnStatistics.class, "dbName == ''") probe executes very slowly, and "show tables" on HiveServer2 becomes very slow as well.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
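The HIVE-23721 report above hinges on a simple rule: a single-column probe filter can only be served by an index range scan if it constrains the index's leading column. The sketch below is illustrative Java, not Hive code; the column order is an assumption taken from the description (catName first since Hive 3.0).

```java
import java.util.List;

// Illustrative sketch (not Hive code): models the B-tree leading-column rule.
// A probe on the index's first column (CAT_NAME) can use an index range scan;
// a probe on a trailing column such as DB_NAME forces a full table scan,
// which is what makes the ensureDbInit() queries slow on large metastores.
public class IndexProbeSketch {

    // True when a single-column equality filter can be served by the index's
    // leading column (the usual index-prefix rule).
    static boolean canUseIndex(List<String> indexColumns, String filterColumn) {
        return !indexColumns.isEmpty() && indexColumns.get(0).equals(filterColumn);
    }

    public static void main(String[] args) {
        // Assumed column order from the description: catName leads the index.
        List<String> idx = List.of("CAT_NAME", "DB_NAME", "TABLE_NAME");
        System.out.println("catName probe uses index: " + canUseIndex(idx, "CAT_NAME"));
        System.out.println("dbName probe uses index:  " + canUseIndex(idx, "DB_NAME"));
    }
}
```

Under this rule, switching the probe filter from "dbName == ''" to "catName == ''" turns a full scan of millions of statistics rows into an index lookup.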
[jira] [Comment Edited] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149896#comment-17149896 ] Syed Shameerur Rahman edited comment on HIVE-23737 at 7/2/20, 5:32 AM:
---
So now, whenever a new feature gets added in Tez, we can simply override it in LlapContainerLauncher.java and make the required changes in the Shuffle Handler (LLAP) to support it. cc: [~ashutoshc]

was (Author: srahman): So now, whenever a new feature gets added in Tez, we can simply override it in LlapContainerLauncher.java and make the required changes in the Shuffle Handler (LLAP) to support it.

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-23737.01.patch, HIVE-23737.02.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LLAP has a dagDelete feature, added as part of HIVE-9911, but now that Tez has added support for dagDelete in its custom shuffle handler (TEZ-3362) we could reuse that feature in LLAP.
> There are some added advantages to using Tez's dagDelete feature rather than LLAP's current one:
> 1) We can easily extend this feature to accommodate upcoming features such as vertex and failed task attempt shuffle data cleanup. Refer to TEZ-3363 and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from Hive's code path.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149896#comment-17149896 ] Syed Shameerur Rahman commented on HIVE-23737:
--
So now, whenever a new feature gets added in Tez, we can simply override it in LlapContainerLauncher.java and make the required changes in the Shuffle Handler (LLAP) to support it.

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's
> dagDelete
> ---
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-23737.01.patch, HIVE-23737.02.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LLAP has a dagDelete feature, added as part of HIVE-9911, but now that Tez has added support for dagDelete in its custom shuffle handler (TEZ-3362) we could reuse that feature in LLAP.
> There are some added advantages to using Tez's dagDelete feature rather than LLAP's current one:
> 1) We can easily extend this feature to accommodate upcoming features such as vertex and failed task attempt shuffle data cleanup. Refer to TEZ-3363 and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from Hive's code path.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-13781) Tez Job failed with FileNotFoundException when partition dir doesn't exist
[ https://issues.apache.org/jira/browse/HIVE-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149788#comment-17149788 ] Peter Vary commented on HIVE-13781:
---
I would prefer to handle this on the execution side. Having an extra check during compilation could cause unnecessary delays for normal queries scanning multiple S3 files. This should be a rare edge case where something outside of Hive modified the underlying FS, and it can be repaired with a correct MSCK REPAIR command. As a user I would prefer to be notified that something went wrong, and would not like the system to swallow this error.
Thanks, Peter

> Tez Job failed with FileNotFoundException when partition dir doesn't exist
> ---
>
> Key: HIVE-13781
> URL: https://issues.apache.org/jira/browse/HIVE-13781
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2, Query Planning
> Affects Versions: 0.14.0, 2.0.0, 3.1.1
> Reporter: Feng Yuan
> Assignee: zhangbutao
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-13781.1.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> I have a table a partitioned by "day", with partitions day=20160501 and day=20160502 in the metadata, but partition 20160501's directory does not exist.
> So when I use the Tez engine to run hive -e "select day,count(*) from a where xx=xx group by day", Hive throws FileNotFoundException,
> but MR works.
> repro example:
> CREATE EXTERNAL TABLE `a`(
> `a` string)
> PARTITIONED BY (
> `l_date` string);
> insert overwrite table a partition(l_date='2016-04-08') values (1),(2);
> insert overwrite table a partition(l_date='2016-04-09') values (1),(2);
> hadoop dfs -rm -r -f /warehouse/a/l_date=2016-04-09
> select l_date,count(*) from a where a='1' group by l_date;
> error:
> ut: a initializer failed, vertex=vertex_1463493135662_10445_1_00 [Map 1],
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> hdfs://bfdhadoopcool/warehouse/test.db/a/l_date=2015-04-09
> at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
> at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)

-- This message was sent by Atlassian Jira (v8.3.4#803005)
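The execution-side handling discussed in HIVE-13781 above can be sketched outside Hive: collect the partition directories that have vanished and surface them to the caller, instead of letting listing fail mid-job. A minimal, self-contained sketch; names are illustrative, not Hive's:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Hive's implementation): split the declared
// partition directories into those that still exist and those that were
// removed behind the metastore's back, instead of failing on the first
// missing path with FileNotFoundException.
public class PartitionScanSketch {

    static List<Path> listExisting(List<Path> partitionDirs, List<Path> missing) {
        List<Path> present = new ArrayList<>();
        for (Path dir : partitionDirs) {
            if (Files.isDirectory(dir)) {
                present.add(dir);
            } else {
                missing.add(dir);
            }
        }
        return present;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("t1");
        Path kept = Files.createDirectory(base.resolve("l_date=2016-04-08"));
        Path dropped = base.resolve("l_date=2016-04-09"); // never created: the "deleted" partition
        List<Path> missing = new ArrayList<>();
        List<Path> present = listExisting(List.of(kept, dropped), missing);
        // The caller can now warn about `missing` (matching the preference for
        // notifying the user) rather than silently swallowing the error.
        System.out.println(present.size() + " present, " + missing.size() + " missing");
    }
}
```

Running it prints "1 present, 1 missing"; logging the missing list as a warning keeps the user informed without aborting split generation.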
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cleanup
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=453752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453752 ] ASF GitHub Bot logged work on HIVE-23727: - Author: ASF GitHub Bot Created on: 02/Jul/20 02:41 Start Date: 02/Jul/20 02:41 Worklog Time Spent: 10m Work Description: dengzhhu653 edited a comment on pull request #1149: URL: https://github.com/apache/hive/pull/1149#issuecomment-648507858 @belugabehr could you please take a look? thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453752) Time Spent: 1h (was: 50m) > Improve SQLOperation log handling when cleanup > -- > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
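The dead-code claim in HIVE-23727 above can be checked mechanically: inside a branch guarded by state != CANCELED, a log statement conditioned on state == CANCELED can never fire. A small self-contained sketch; the enum and names are simplified stand-ins for Hive's OperationState, not the real class:

```java
// Illustrative sketch of the control flow described in HIVE-23727: the
// guarded log's condition is provably unreachable for every state.
public class CancelBranchSketch {

    enum OperationState { RUNNING, CANCELED, TIMEDOUT }

    // Returns whether the "canceled" log inside the guarded branch could run.
    static boolean deadLogReachable(boolean async, OperationState state) {
        if (async && state != OperationState.CANCELED && state != OperationState.TIMEDOUT) {
            // This mirrors the log statement's own condition inside the branch.
            return state == OperationState.CANCELED;
        }
        return false;
    }

    public static void main(String[] args) {
        boolean reachable = false;
        for (OperationState s : OperationState.values()) {
            reachable |= deadLogReachable(true, s) || deadLogReachable(false, s);
        }
        System.out.println("dead log reachable: " + reachable);
    }
}
```

The loop exhausts every state and async combination, so removing the logging under state == OperationState.CANCELED cannot change behavior.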
[jira] [Updated] (HIVE-23797) Throwing exception when no metastore spec found in zookeeper
[ https://issues.apache.org/jira/browse/HIVE-23797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23797:
--
Labels: pull-request-available (was: )

> Throwing exception when no metastore spec found in zookeeper
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
> Issue Type: Bug
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When service discovery is enabled for the metastore, there is a chance that the client finds no metastore URIs available in ZooKeeper, for example during metastore startup or when the client has wrongly configured the discovery path. This results in redundant retries and finally a MetaException with an "Unknown exception" message.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23797) Throwing exception when no metastore spec found in zookeeper
[ https://issues.apache.org/jira/browse/HIVE-23797?focusedWorklogId=453738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453738 ] ASF GitHub Bot logged work on HIVE-23797:
-
Author: ASF GitHub Bot
Created on: 02/Jul/20 01:29
Start Date: 02/Jul/20 01:29
Worklog Time Spent: 10m
Work Description: dengzhhu653 opened a new pull request #1201: URL: https://github.com/apache/hive/pull/1201
## NOTICE
Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 453738)
Remaining Estimate: 0h
Time Spent: 10m

> Throwing exception when no metastore spec found in zookeeper
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
> Issue Type: Bug
> Reporter: Zhihua Deng
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When service discovery is enabled for the metastore, there is a chance that the client finds no metastore URIs available in ZooKeeper, for example during metastore startup or when the client has wrongly configured the discovery path. This results in redundant retries and finally a MetaException with an "Unknown exception" message.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
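The fix described for HIVE-23797 above amounts to failing fast with a descriptive message when discovery returns an empty URI list, rather than retrying into an opaque "Unknown exception". A hedged sketch with illustrative names, not Hive's actual client code:

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch: if ZooKeeper-based discovery yields no metastore URIs
// (metastores still starting, or a misconfigured discovery path), throw a
// descriptive error immediately instead of retrying blindly.
public class MetastoreUriSketch {

    static String pickUri(List<String> urisFromZk) {
        if (urisFromZk.isEmpty()) {
            throw new IllegalStateException(
                "No metastore URIs registered in ZooKeeper; check the discovery path"
                + " or wait for a metastore instance to start");
        }
        return urisFromZk.get(0);
    }

    public static void main(String[] args) {
        try {
            pickUri(Collections.emptyList());
        } catch (IllegalStateException e) {
            System.out.println("fail fast: " + e.getMessage());
        }
    }
}
```

The point of the design is the error message: the caller learns why no connection could be made instead of seeing generic retry failures.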
[jira] [Work logged] (HIVE-3236) allow column names to be prefixed by table alias in select all queries
[ https://issues.apache.org/jira/browse/HIVE-3236?focusedWorklogId=453731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453731 ] ASF GitHub Bot logged work on HIVE-3236: Author: ASF GitHub Bot Created on: 02/Jul/20 00:31 Start Date: 02/Jul/20 00:31 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #5: URL: https://github.com/apache/hive/pull/5 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453731) Time Spent: 20m (was: 10m) > allow column names to be prefixed by table alias in select all queries > -- > > Key: HIVE-3236 > URL: https://issues.apache.org/jira/browse/HIVE-3236 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 0.9.1, 0.10.0 >Reporter: Keegan Mosley >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-3236.1.patch.txt > > Time Spent: 20m > Remaining Estimate: 0h > > When using "CREATE TABLE x AS SELECT ..." where the select joins tables with > hundreds of columns it is not a simple task to resolve duplicate column name > exceptions (particularly with self-joins). The user must either manually > specify aliases for all duplicate columns (potentially hundreds) or write a > script to generate the data set in a separate select query, then create the > table and load the data in. > There should be some conf flag that would allow queries like > "create table joined as select one.\*, two.\* from mytable one join mytable > two on (one.duplicate_field = two.duplicate_field1);" > to create a table with columns one_duplicate_field and two_duplicate_field. -- This message was sent by Atlassian Jira (v8.3.4#803005)
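The naming scheme HIVE-3236 asks for can be sketched in a few lines: qualify each selected column with its table alias so a self-join's duplicate names become unique (one.duplicate_field becomes one_duplicate_field). Illustrative only; Hive's planner is not shown:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the requested behavior: when expanding "alias.*",
// prefix every column with its table alias so the generated column names of a
// self-join do not collide.
public class AliasPrefixSketch {

    static List<String> prefixAll(String alias, List<String> columns) {
        List<String> out = new ArrayList<>();
        for (String column : columns) {
            out.add(alias + "_" + column);
        }
        return out;
    }

    public static void main(String[] args) {
        // "one.*" and "two.*" from the self-join in the report.
        System.out.println(prefixAll("one", List.of("duplicate_field", "id")));
        System.out.println(prefixAll("two", List.of("duplicate_field", "id")));
    }
}
```

With this expansion, `create table joined as select one.*, two.* ...` would produce `one_duplicate_field` and `two_duplicate_field` instead of a duplicate-column exception.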
[jira] [Work logged] (HIVE-23347) MSCK REPAIR cannot discover partitions with upper case directory names.
[ https://issues.apache.org/jira/browse/HIVE-23347?focusedWorklogId=453730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453730 ] ASF GitHub Bot logged work on HIVE-23347: - Author: ASF GitHub Bot Created on: 02/Jul/20 00:31 Start Date: 02/Jul/20 00:31 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1003: URL: https://github.com/apache/hive/pull/1003#issuecomment-652711587 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453730) Time Spent: 20m (was: 10m) > MSCK REPAIR cannot discover partitions with upper case directory names. > --- > > Key: HIVE-23347 > URL: https://issues.apache.org/jira/browse/HIVE-23347 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 3.1.0 >Reporter: Sankar Hariappan >Assignee: Adesh Kumar Rao >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23347.01.patch, HIVE-23347.10.patch, > HIVE-23347.2.patch, HIVE-23347.3.patch, HIVE-23347.4.patch, > HIVE-23347.5.patch, HIVE-23347.6.patch, HIVE-23347.7.patch, > HIVE-23347.8.patch, HIVE-23347.9.patch > > Time Spent: 20m > Remaining Estimate: 0h > > For the following scenario, we expect MSCK REPAIR to discover partitions but > it couldn't. > 1. Have partitioned data path as follows. > hdfs://mycluster/datapath/t1/Year=2020/Month=03/Day=10 > hdfs://mycluster/datapath/t1/Year=2020/Month=03/Day=11 > 2. 
create external table t1 (key int, value string) partitioned by (Year int, Month int, Day int) stored as orc location 'hdfs://mycluster/datapath/t1';
> 3. msck repair table t1;
> 4. show partitions t1; --> Returns zero partitions
> 5. select * from t1; --> Returns empty data.
> When the partition directory names are changed to lower case, this works fine.
> hdfs://mycluster/datapath/t1/year=2020/month=03/day=10
> hdfs://mycluster/datapath/t1/year=2020/month=03/day=11

-- This message was sent by Atlassian Jira (v8.3.4#803005)
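One plausible shape for the HIVE-23347 fix is to lower-case the partition key taken from each directory name before matching it against the metastore's (lower-cased) partition column names, so that `Year=2020` matches column `year`. This sketch is an assumption about the approach, not the actual patch:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch (not the HIVE-23347 patch): parse a partition path such
// as "Year=2020/Month=03/Day=10" into a spec keyed by the lower-cased column
// name, so upper-case directory names can still be discovered.
public class PartitionPathSketch {

    static Map<String, String> parseSpec(String path) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String part : path.split("/")) {
            String[] kv = part.split("=", 2);
            if (kv.length == 2) {
                // Lower-case only the key; partition values keep their case.
                spec.put(kv[0].toLowerCase(Locale.ROOT), kv[1]);
            }
        }
        return spec;
    }

    public static void main(String[] args) {
        System.out.println(parseSpec("Year=2020/Month=03/Day=10"));
    }
}
```

Note the value is left untouched: only the column-name comparison should be case-insensitive, since partition values may be case-sensitive data.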
[jira] [Work logged] (HIVE-5596) hive-default.xml.template is invalid
[ https://issues.apache.org/jira/browse/HIVE-5596?focusedWorklogId=453729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453729 ] ASF GitHub Bot logged work on HIVE-5596:
Author: ASF GitHub Bot
Created on: 02/Jul/20 00:30
Start Date: 02/Jul/20 00:30
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #12: URL: https://github.com/apache/hive/pull/12
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 453729)
Remaining Estimate: 23h 40m (was: 23h 50m)
Time Spent: 20m (was: 10m)

> hive-default.xml.template is invalid
> -
>
> Key: HIVE-5596
> URL: https://issues.apache.org/jira/browse/HIVE-5596
> Project: Hive
> Issue Type: Bug
> Components: Configuration
> Affects Versions: 0.12.0
> Environment: OS: Oracle Linux 6
> JDK: 1.6
> Hadoop: 2.2.0
> Reporter: Kevin Huang
> Assignee: Kevin Huang
> Priority: Critical
> Labels: patch, pull-request-available
> Fix For: 0.13.0
>
> Attachments: HIVE-5596.patch
>
> Original Estimate: 24h
> Time Spent: 20m
> Remaining Estimate: 23h 40m
>
> Line 2000:16 in hive-default.xml.template is
> auth
> I think it is invalid and it will lead Hive to crash if you use this template. The error message is as follows:
> [Fatal Error] hive-site.xml:2000:16: The element type "value" must be terminated by the matching end-tag "</value>".

-- This message was sent by Atlassian Jira (v8.3.4#803005)
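The failure mode in HIVE-5596 is easy to reproduce with the JDK's own XML parser: a value element whose end tag is missing or mismatched makes the whole configuration file unparseable with exactly this class of fatal error. The snippet below is illustrative, not the real template content:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Reproduces the class of error from HIVE-5596: an unterminated <value>
// element aborts parsing of the entire config file.
public class XmlTemplateCheck {

    static String tryParse(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return "ok";
        } catch (Exception e) {
            // SAXParseException message, e.g. element type "value" must be
            // terminated by the matching end-tag.
            return "invalid: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("<property><value>auth</value></property>"));
        System.out.println(tryParse("<property><value>auth</property>"));
    }
}
```

Because the parser fails fatally on the first malformed element, a single bad line in hive-default.xml.template is enough to make Hive unable to load the whole file.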
[jira] [Commented] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149685#comment-17149685 ] Ashutosh Chauhan commented on HIVE-23363: - +1 LGTM. > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22934) Hive server interactive log counters to error stream
[ https://issues.apache.org/jira/browse/HIVE-22934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-22934:
--
Labels: pull-request-available (was: )

> Hive server interactive log counters to error stream
>
> Key: HIVE-22934
> URL: https://issues.apache.org/jira/browse/HIVE-22934
> Project: Hive
> Issue Type: Bug
> Reporter: Slim Bouguerra
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-22934.01.patch, HIVE-22934.02.patch, HIVE-22934.03.patch, HIVE-22934.04.patch, HIVE-22934.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive server is logging the console output to the system error stream.
> This needs to be fixed because:
> First, we do not roll the file.
> Second, writing to such a file is done sequentially and can lead to throttling/poor performance.
> {code}
> -rw-r--r-- 1 hive hadoop 9.5G Feb 26 17:22 hive-server2-interactive.err
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22934) Hive server interactive log counters to error stream
[ https://issues.apache.org/jira/browse/HIVE-22934?focusedWorklogId=453618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453618 ] ASF GitHub Bot logged work on HIVE-22934:
-
Author: ASF GitHub Bot
Created on: 01/Jul/20 19:27
Start Date: 01/Jul/20 19:27
Worklog Time Spent: 10m
Work Description: ramesh0201 opened a new pull request #1200: URL: https://github.com/apache/hive/pull/1200
## NOTICE
Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 453618)
Remaining Estimate: 0h
Time Spent: 10m

> Hive server interactive log counters to error stream
>
> Key: HIVE-22934
> URL: https://issues.apache.org/jira/browse/HIVE-22934
> Project: Hive
> Issue Type: Bug
> Reporter: Slim Bouguerra
> Assignee: Antal Sinkovits
> Priority: Major
> Attachments: HIVE-22934.01.patch, HIVE-22934.02.patch, HIVE-22934.03.patch, HIVE-22934.04.patch, HIVE-22934.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive server is logging the console output to the system error stream.
> This needs to be fixed because:
> First, we do not roll the file.
> Second, writing to such a file is done sequentially and can lead to throttling/poor performance.
> {code}
> -rw-r--r-- 1 hive hadoop 9.5G Feb 26 17:22 hive-server2-interactive.err
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453612 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 19:14 Start Date: 01/Jul/20 19:14 Worklog Time Spent: 10m Work Description: belugabehr edited a comment on pull request #1118: URL: https://github.com/apache/hive/pull/1118#issuecomment-652546609 @ashutoshc Let me see if I can address all of your questions with some background and context. It took me a long time to get these changes to pass the unit tests. So, these mappings, in some respect, don't really matter. When HMS is started, users use the `schema-tool` to create the HMS schema for real. Some of these mappings in the `jdo` file (like indexes) are only applied when unit testing because the unit tests build the schema via DN and `datanucleus.schema.autoCreateAll`. For unit testing, the database backend is Apache Derby. I changed the name of the index to match the Derby schema more closely. In trying to debug these various errors, I was very confused at first about it complaining about "COLUMNS_PK". https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L364 With that said, when I upgraded to DN 5.x, the unit tests would not pass. I narrowed the issue down to this one table definition. I tried several iterations to get success, but this is the one that worked. I derived this solution by closely examining the docs on this topic. It has an example that very closely aligns with this use case: http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#embedded_collection It is a bit of a wonder looking at the existing JDO definition how this ever worked. ``` ``` This is not correct, this should be a compound primary key of CD_ID *and* COLUMN_NAME. 
This exact scenario is covered in the second half of: http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#embedded_collection In the official schema (hive-schema-4.0.0.derby.sql), the primary key is enforced by the `SQL110922153006740` index. As things currently stand, the COLUMN_NAME definition in the `jdo` file says that the COLUMN_NAME is not defined to be non-null. This caused an error with Derby as it didn't allow creating a PRIMARY KEY on a field that could be null. So, putting it all together, I came to the current solution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453612) Time Spent: 1h 40m (was: 1.5h) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453611 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 19:11 Start Date: 01/Jul/20 19:11 Worklog Time Spent: 10m Work Description: belugabehr edited a comment on pull request #1118: URL: https://github.com/apache/hive/pull/1118#issuecomment-652546609 @ashutoshc Let me see if I can address all of your questions with some background and context. It took me a long time to get these changes to pass the unit tests. So, these mappings, in some respect, don't really matter. When HMS is started, users use the `schema-tool` to create the HMS schema for real. Some of these mappings in the `jdo` file (like indexes) are only applied when unit testing because the unit tests build the schema via DN and `datanucleus.schema.autoCreateAll`. For unit testing, the database backend is Apache Derby. I changed the name of the index to match the Derby schema more closely. In trying to debug these various errors, I was very confused at first about it complaining about "COLUMNS_PK". https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L364 With that said, when I upgraded to DN 5.x, the unit tests would not pass. I narrowed the issue down to this one table definition. I tried several iterations to get success, but this is the one that worked. I derived this solution by closely examining the docs on this topic. It has an example that very closely aligns with this use case: http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#embedded_collection It is a bit of a wonder looking at the existing JDO definition how this ever worked. ``` ``` This is not correct, this should be a compound primary key of CD_ID *and* COLUMN_NAME. 
This is enforced by `SQL110922153006740` in the full schema. As things currently stand, the COLUMN_NAME definition in the `jdo` file says that the COLUMN_NAME is not defined to be non-null. This caused an error with Derby as it didn't allow creating a PRIMARY KEY on a field that could be null. So, putting it all together, I came to the current solution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453611) Time Spent: 1.5h (was: 1h 20m) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453610 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 19:10 Start Date: 01/Jul/20 19:10 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1118: URL: https://github.com/apache/hive/pull/1118#discussion_r448563837 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - Review comment: Just following the directions here: http://www.datanucleus.org/products/accessplatform/jdo/mapping.html#embedded_collection This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453610) Time Spent: 1h 20m (was: 1h 10m) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453602 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 19:04 Start Date: 01/Jul/20 19:04 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1118: URL: https://github.com/apache/hive/pull/1118#discussion_r448560561 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - Review comment: Changed to: ``` ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453602) Time Spent: 1h 10m (was: 1h) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453599 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 18:59 Start Date: 01/Jul/20 18:59 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1118: URL: https://github.com/apache/hive/pull/1118#discussion_r448558398 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - + - Review comment: ``` If a foreign-key is specified (in MetaData) for the relation field then leave any deletion to the datastore to perform ``` I don't see any such cascading relationship defined in the schema for Derby or MySQL, so DN should be doing it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453599) Time Spent: 1h (was: 50m) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 1h > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as, based on its docs, 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?focusedWorklogId=453565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453565 ] ASF GitHub Bot logged work on HIVE-23791: - Author: ASF GitHub Bot Created on: 01/Jul/20 17:56 Start Date: 01/Jul/20 17:56 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1196: URL: https://github.com/apache/hive/pull/1196#discussion_r448527274 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2614,28 +2633,25 @@ public static Path getVersionFilePath(Path deltaOrBase) { + " from " + jc.get(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY)); return null; } -Directory acidInfo = AcidUtils.getAcidState(fs, dir, jc, idList, null, false); +if (fs == null) { + fs = dir.getFileSystem(jc); +} +// Collect all of the files/dirs +Map hdfsDirSnapshots = AcidUtils.getHdfsDirSnapshots(fs, dir); Review comment: Ohh.. I think I get it now. * You are right that this will do work which is not really needed in this case - namely creating objects which are not needed here (dirSnapshot.metaDataFile/dirSnapshot.acidFormatFile); we might also list and create objects which are not needed in this snapshot. On the other hand, the costly part on S3 (and on HDFS as well) is the number of remote calls, which is reduced to a single listing instead of doing the listing for every directory 1-by-1. * It is not possible that it fails to scan some location which is needed. If this happens then it is a bug in AcidUtils.getAcidState, as it has to return every directory which is readable. What I do not understand in your comment is "this method would return a something (it could still be a map) which could fill in stuff from hdfs if its not cached already" - the main thing we would like to avoid here is the need to read HDFS again and again. The only way to realize that something is missing is reading the directory again... 
or I miss something :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453565) Time Spent: 40m (was: 0.5h) > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
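The core point in the reply above — one bulk listing instead of a remote call per directory — can be sketched with plain `java.nio.file` as a stand-in for the HDFS API. The class and method names below are illustrative, not Hive's actual `AcidUtils` code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical stand-in for AcidUtils.getHdfsDirSnapshots(): one recursive
// walk builds a snapshot of every directory up front, so later lookups do
// not need to go back to the (remote) file system directory by directory.
public class DirSnapshotDemo {
    public static Map<Path, List<Path>> snapshot(Path root) throws IOException {
        Map<Path, List<Path>> snapshots = new HashMap<>();
        try (Stream<Path> walk = Files.walk(root)) {
            // Single traversal: group every regular file under its parent dir.
            walk.filter(Files::isRegularFile)
                .forEach(f -> snapshots
                    .computeIfAbsent(f.getParent(), k -> new ArrayList<>())
                    .add(f));
        }
        return snapshots;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("acid");
        Path delta = Files.createDirectories(root.resolve("delta_1_1"));
        Files.createFile(delta.resolve("bucket_00000"));
        Map<Path, List<Path>> snap = snapshot(root);
        // One listing served every directory; no per-directory remote call.
        System.out.println(snap.get(delta).size()); // prints 1
    }
}
```

On an S3-backed table the same shape applies, except each avoided per-directory listing is a network round trip rather than a local syscall.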
[jira] [Assigned] (HIVE-23794) HiveConnection.rollback always throws a "Method not supported" exception
[ https://issues.apache.org/jira/browse/HIVE-23794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amol Dixit reassigned HIVE-23794: - Assignee: Amol Dixit > HiveConnection.rollback always throws a "Method not supported" exception > > > Key: HIVE-23794 > URL: https://issues.apache.org/jira/browse/HIVE-23794 > Project: Hive > Issue Type: Bug >Reporter: Amol Dixit >Assignee: Amol Dixit >Priority: Major > > HiveConnection.rollback's automatically generated implementation always throws > a generic "Method not supported" exception and thus is not compliant with the > JDBC spec. For HiveConnection the autoCommit mode is always on, and the > connection does not allow setting autoCommit to false. If setAutoCommit > is called and the auto-commit mode is not changed, the call is a no-op. > Per the JDBC spec, an exception can be thrown only if the connection is closed, > a DB access error occurs, or the method is called during a transaction (which is > not the case for HiveConnection). > The JDBC spec does not say a word about the driver not supporting the method. > The most correct behavior could be to throw only if the request tries to > explicitly call rollback (as HiveConnection.getAutoCommit always returns true > and the setAutoCommit call is a no-op). > This issue is a blocker for JDBC connection pools (e.g. HikariCP) that expect > JDBC-compliant behavior from the driver. -- This message was sent by Atlassian Jira (v8.3.4#803005)
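The behavior the report argues for can be modeled in a few lines. This is a hedged sketch, not the actual `HiveConnection` code — the class name and message strings are invented; the point is that an always-auto-commit connection can satisfy the spec by throwing for a spec-named reason (rollback while in auto-commit mode) instead of a generic "Method not supported":

```java
import java.sql.SQLException;

// Illustrative model only (not HiveConnection itself): auto-commit is
// permanently enabled, setAutoCommit(true) is a no-op, and rollback()
// reports the spec-named condition instead of "Method not supported".
public class AutoCommitOnlyConnection {

    public boolean getAutoCommit() {
        return true; // the connection never leaves auto-commit mode
    }

    public void setAutoCommit(boolean autoCommit) throws SQLException {
        // Requesting the current mode changes nothing, so it is a no-op.
        if (!autoCommit) {
            throw new SQLException("auto-commit cannot be disabled");
        }
    }

    public void rollback() throws SQLException {
        // JDBC names this exact condition as a legal reason to throw.
        throw new SQLException("rollback called while in auto-commit mode");
    }
}
```

A pool such as HikariCP can then distinguish "rollback is illegal right now" from "the driver does not implement rollback at all", which is the distinction the report asks for.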
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453544 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 17:18 Start Date: 01/Jul/20 17:18 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1118: URL: https://github.com/apache/hive/pull/1118#issuecomment-652546609 @ashutoshc Let me see if I can address all of your questions with some background and context. It took me a long time to get these changes to pass the unit tests. So, these mappings, in some respect, don't really matter. When HMS is started, users use the `schema-tool` to create the HMS schema for real. Some of these mappings in the `jdo` file (like indexes) are only applied when unit testing because the unit tests build the schema via DN and `datanucleus.schema.autoCreateAll`. For unit testing, the database backend is Apache Derby. I changed the name of the index to match the Derby schema more closely. In trying to debug these various errors, I was very confused at first about it complaining about "COLUMNS_PK". https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L364 With that said, when I upgraded to DN 5.x, the unit tests would not pass. I narrowed the issue down to this one table definition. I tried several iterations to get success, but this is the one that worked. I derived this solution by closely examining the docs on this topic. It has an example that very closely aligns with this use case: http://www.datanucleus.org/products/accessplatform/jpa/mapping.html#embedded_collection It is a bit of a wonder looking at the existing JDO definition how this ever worked. ``` ``` This is not correct, this should be a compound primary key of CD_ID *and* COLUMN_NAME. This is enforced by `SQL110922153006740` in the full schema. 
As things currently stand, the `jdo` file does not declare COLUMN_NAME as non-null. This caused an error with Derby, as it doesn't allow creating a PRIMARY KEY on a field that could be null. So, putting it all together, I came to the current solution. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453544) Time Spent: 50m (was: 40m) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as, based on its docs, 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23388) CTAS queries should use target's location for staging.
[ https://issues.apache.org/jira/browse/HIVE-23388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149567#comment-17149567 ] Naveen Gangam commented on HIVE-23388: -- [~samuelan] [~jcamachorodriguez] Could you please review? Thank you > CTAS queries should use target's location for staging. > -- > > Key: HIVE-23388 > URL: https://issues.apache.org/jira/browse/HIVE-23388 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In cloud based storage systems, renaming files across different root level > buckets seem to be disallowed. The S3AFileSystem throws the following > exception. This appears to be bug in S3FS impl. > Failed with exception Wrong FS > s3a://hive-managed/clusters/env-x/warehouse--/warehouse/tablespace/managed/hive/tpch.db/customer/delta_001_001_ > -expected s3a://hive-external > 2020-04-27T19:34:27,573 INFO [Thread-6] jdbc.TestDriver: > java.lang.IllegalArgumentException: Wrong FS > s3a://hive-managed//clusters/env-/warehouse--/warehouse/tablespace/managed/hive/tpch.db/customer/delta_001_001_ > -expected s3a://hive-external > But we should fix our query plans to use the target table's directory for > staging as well. That should resolve this issue and it is the right thing to > do as well (in case there are different encryption zones/keys for these > buckets). > Fix in HIVE-22995 probably changed this behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23722) Show the operation's drilldown link to client
[ https://issues.apache.org/jira/browse/HIVE-23722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-23722. - Assignee: Zhihua Deng Resolution: Fixed pushed to master. Thank you [~dengzh]! > Show the operation's drilldown link to client > - > > Key: HIVE-23722 > URL: https://issues.apache.org/jira/browse/HIVE-23722 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > The HiveServer2 webui provides a drilldown link for many collected > metrics or messages about an operation, but it's not easy for an end user to > find the target url of his submitted query. Limited knowledge of the deployment, > an HA-based environment, and multiple running queries can make things more > difficult. This jira provides a way to show the link to the interested end > user when enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23722) Show the operation's drilldown link to client
[ https://issues.apache.org/jira/browse/HIVE-23722?focusedWorklogId=453518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453518 ] ASF GitHub Bot logged work on HIVE-23722: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:42 Start Date: 01/Jul/20 16:42 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1145: URL: https://github.com/apache/hive/pull/1145 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453518) Time Spent: 1h 40m (was: 1.5h) > Show the operation's drilldown link to client > - > > Key: HIVE-23722 > URL: https://issues.apache.org/jira/browse/HIVE-23722 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Now the HiveServer2 webui provides a drilldown link for many collected > metrics or messages about a operation, but it's not easy for a end user to > find the target url of his submitted query. Less knowledge on the deployment, > HA based environment, and the multiple running queries can make things more > difficult. The jira provides a way to show the link to the interested end > user when enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582
[ https://issues.apache.org/jira/browse/HIVE-23751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-23751: Resolution: Fixed Status: Resolved (was: Patch Available) pushed to master. Thank you [~srahman]! > QTest: Override #mkdirs() method in ProxyFileSystem To Align After > HADOOP-16582 > --- > > Key: HIVE-23751 > URL: https://issues.apache.org/jira/browse/HIVE-23751 > Project: Hive > Issue Type: Task >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.2.0 > > Attachments: HIVE-23751.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > > HADOOP-16582 has changed the way mkdirs() works: > *Before HADOOP-16582:* > All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which were then > re-routed to the mkdirs(p, permission) method. For ProxyFileSystem the call would > look like > {code:java} > FileUtils.mkdir(p) -> FileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p,permission) > {code} > An implementation of FileSystem only needed to implement mkdirs(p, > permission) > *After HADOOP-16582:* > Since FilterFileSystem overrides the mkdirs(p) method, the new call to > ProxyFileSystem would look like > {code:java} > FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) --> > {code} > This will make all the qtests fail with the below exception > {code:java} > Caused by: java.lang.IllegalArgumentException: Wrong FS: > pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1, > expected: file:/// > {code} > Note: We will hit this issue when we bump up the hadoop version in hive. > So as per the discussion in HADOOP-16963, ProxyFileSystem would need to > override the mkdirs(p) method in order to solve the above problem. 
So now the > new flow would look like > {code:java} > FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p, permission) ---> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
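The dispatch change in the description can be reproduced with a self-contained toy hierarchy. The class names mirror the Hadoop ones, but this is a sketch, not Hadoop code — strings stand in for real file-system operations:

```java
// Toy model of the mkdirs() dispatch described above.
class FileSystemModel {
    public String mkdirs(String p) {
        // Pre-HADOOP-16582 fast track: the 1-arg call is re-routed to the
        // 2-arg overload, which subclasses override.
        return mkdirs(p, "default");
    }

    public String mkdirs(String p, String perm) {
        return "raw:" + p;
    }
}

class FilterFileSystemModel extends FileSystemModel {
    private final FileSystemModel raw = new FileSystemModel();

    @Override
    public String mkdirs(String p) {
        // HADOOP-16582: the filter forwards the 1-arg call straight to the
        // wrapped file system, bypassing any subclass's 2-arg override.
        return raw.mkdirs(p);
    }
}

class ProxyFileSystemModel extends FilterFileSystemModel {
    @Override
    public String mkdirs(String p, String perm) {
        // The proxy's job: translate pfile:// paths before delegating.
        return super.mkdirs(swapScheme(p), perm);
    }

    // HIVE-23751's fix: override the 1-arg form as well, so the call is
    // routed back through the proxy's own 2-arg translation.
    @Override
    public String mkdirs(String p) {
        return mkdirs(p, "default");
    }

    private String swapScheme(String p) {
        return p.replace("pfile:", "file:");
    }
}
```

Calling `new FilterFileSystemModel().mkdirs("pfile:/x")` returns the untranslated path — the toy analogue of the Wrong FS exception above — while the proxy's extra override restores the old routing.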
[jira] [Work logged] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582
[ https://issues.apache.org/jira/browse/HIVE-23751?focusedWorklogId=453517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453517 ] ASF GitHub Bot logged work on HIVE-23751: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:40 Start Date: 01/Jul/20 16:40 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1167: URL: https://github.com/apache/hive/pull/1167 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453517) Time Spent: 40m (was: 0.5h) > QTest: Override #mkdirs() method in ProxyFileSystem To Align After > HADOOP-16582 > --- > > Key: HIVE-23751 > URL: https://issues.apache.org/jira/browse/HIVE-23751 > Project: Hive > Issue Type: Task >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.2.0 > > Attachments: HIVE-23751.01.patch > > Time Spent: 40m > Remaining Estimate: 0h > > HADOOP-16582 has changed the way mkdirs() works: > *Before HADOOP-16582:* > All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which were then > re-routed to the mkdirs(p, permission) method. 
For ProxyFileSystem the call would > look like > {code:java} > FileUtils.mkdir(p) -> FileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p,permission) > {code} > An implementation of FileSystem only needed to implement mkdirs(p, > permission) > *After HADOOP-16582:* > Since FilterFileSystem overrides the mkdirs(p) method, the new call to > ProxyFileSystem would look like > {code:java} > FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) --> > {code} > This will make all the qtests fail with the below exception > {code:java} > Caused by: java.lang.IllegalArgumentException: Wrong FS: > pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1, > expected: file:/// > {code} > Note: We will hit this issue when we bump up the hadoop version in hive. > So as per the discussion in HADOOP-16963, ProxyFileSystem would need to > override the mkdirs(p) method in order to solve the above problem. So now the > new flow would look like > {code:java} > FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p, permission) ---> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?focusedWorklogId=453516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453516 ] ASF GitHub Bot logged work on HIVE-23791: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:36 Start Date: 01/Jul/20 16:36 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1196: URL: https://github.com/apache/hive/pull/1196#discussion_r448482057 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -2614,28 +2633,25 @@ public static Path getVersionFilePath(Path deltaOrBase) { + " from " + jc.get(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY)); return null; } -Directory acidInfo = AcidUtils.getAcidState(fs, dir, jc, idList, null, false); +if (fs == null) { + fs = dir.getFileSystem(jc); +} +// Collect the all of the files/dirs +Map hdfsDirSnapshots = AcidUtils.getHdfsDirSnapshots(fs, dir); Review comment: this might be out-of-scope for this change: but this *static* method in `AcidUtils` is trying to do all the work upfront... which might lead to: * that it does work which is not even needed * it doesn't scan some location - and the map just returns null ; so it might be not noticable I think it would be better if this method would return a something (it could still be a map) which could fill in stuff from hdfs if its not cached already... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453516) Time Spent: 0.5h (was: 20m) > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
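The reviewer's alternative — a map-like view that fills itself in from HDFS on a miss instead of returning null — might look like the following hedged sketch. `LazySnapshots` and its loader are invented names; the loader stands in for the remote listing call:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the suggestion above: rather than building every directory
// snapshot up front, load a snapshot on first access and cache it, so a
// missing key triggers one remote listing instead of silently returning null.
class LazySnapshots<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // stand-in for the HDFS listing
    private int remoteCalls = 0;

    LazySnapshots(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K dir) {
        // computeIfAbsent: only a cache miss costs a remote call.
        return cache.computeIfAbsent(dir, k -> {
            remoteCalls++;
            return loader.apply(k);
        });
    }

    int remoteCallCount() {
        return remoteCalls;
    }
}
```

The counter makes the caching visible: repeated lookups of the same directory cost one simulated listing, which is the "fill in stuff from hdfs if it's not cached already" behavior, while still avoiding repeated reads.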
[jira] [Work logged] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582
[ https://issues.apache.org/jira/browse/HIVE-23751?focusedWorklogId=453514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453514 ] ASF GitHub Bot logged work on HIVE-23751: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:35 Start Date: 01/Jul/20 16:35 Worklog Time Spent: 10m Work Description: shameersss1 commented on pull request #1167: URL: https://github.com/apache/hive/pull/1167#issuecomment-652524782 @kgyrtkirk I am okay with "@users.noreply.github.com". Please continue the merge! Thank you for the review! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453514) Time Spent: 0.5h (was: 20m) > QTest: Override #mkdirs() method in ProxyFileSystem To Align After > HADOOP-16582 > --- > > Key: HIVE-23751 > URL: https://issues.apache.org/jira/browse/HIVE-23751 > Project: Hive > Issue Type: Task >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.2.0 > > Attachments: HIVE-23751.01.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > HADOOP-16582 has changed the way mkdirs() works: > *Before HADOOP-16582:* > All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which were then > re-routed to the mkdirs(p, permission) method. 
For ProxyFileSystem the call would > look like > {code:java} > FileUtils.mkdir(p) -> FileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p,permission) > {code} > An implementation of FileSystem only needed to implement mkdirs(p, > permission) > *After HADOOP-16582:* > Since FilterFileSystem overrides the mkdirs(p) method, the new call to > ProxyFileSystem would look like > {code:java} > FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) --> > {code} > This will make all the qtests fail with the below exception > {code:java} > Caused by: java.lang.IllegalArgumentException: Wrong FS: > pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1, > expected: file:/// > {code} > Note: We will hit this issue when we bump up the hadoop version in hive. > So as per the discussion in HADOOP-16963, ProxyFileSystem would need to > override the mkdirs(p) method in order to solve the above problem. So now the > new flow would look like > {code:java} > FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> > ProxyFileSystem.mkdirs(p, permission) ---> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23363) Upgrade DataNucleus dependency to 5.2
[ https://issues.apache.org/jira/browse/HIVE-23363?focusedWorklogId=453508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453508 ] ASF GitHub Bot logged work on HIVE-23363: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:31 Start Date: 01/Jul/20 16:31 Worklog Time Spent: 10m Work Description: ashutoshc commented on a change in pull request #1118: URL: https://github.com/apache/hive/pull/1118#discussion_r448480980 ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - Review comment: Can you describe the need for this change? ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - + - - Review comment: Any reason to change the name here? ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - + - Review comment: We do want cascade-delete here. Any reason to remove it? ## File path: standalone-metastore/metastore-server/src/main/resources/package.jdo ## @@ -345,20 +345,20 @@ - + - - + + Review comment: This probably is fine to do. Though, was it necessary? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453508) Time Spent: 40m (was: 0.5h) > Upgrade DataNucleus dependency to 5.2 > - > > Key: HIVE-23363 > URL: https://issues.apache.org/jira/browse/HIVE-23363 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Zoltan Chovan >Assignee: Zoltan Chovan >Priority: Critical > Labels: pull-request-available > Attachments: HIVE-23363.2.patch, HIVE-23363.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Upgrade Datanucleus from 4.2 to 5.2 as based on it's docs 4.2 has been > retired: > [http://www.datanucleus.org/documentation/products.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23795) Add Additional Debugging Help for Import SQL
[ https://issues.apache.org/jira/browse/HIVE-23795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23795: -- Labels: pull-request-available (was: ) > Add Additional Debugging Help for Import SQL > > > Key: HIVE-23795 > URL: https://issues.apache.org/jira/browse/HIVE-23795 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add some things that were helpful to me when I was recently debugging an > issue with importing SQL. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23795) Add Additional Debugging Help for Import SQL
[ https://issues.apache.org/jira/browse/HIVE-23795?focusedWorklogId=453504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453504 ] ASF GitHub Bot logged work on HIVE-23795: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:26 Start Date: 01/Jul/20 16:26 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1199: URL: https://github.com/apache/hive/pull/1199 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453504) Remaining Estimate: 0h Time Spent: 10m > Add Additional Debugging Help for Import SQL > > > Key: HIVE-23795 > URL: https://issues.apache.org/jira/browse/HIVE-23795 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Add some things that were helpful to me when I was recently debugging an > issue with importing SQL. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23795) Add Additional Debugging Help for Import SQL
[ https://issues.apache.org/jira/browse/HIVE-23795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-23795: -- Description: Add some things that were helpful to me when I was recently debugging an issue with importing SQL. > Add Additional Debugging Help for Import SQL > > > Key: HIVE-23795 > URL: https://issues.apache.org/jira/browse/HIVE-23795 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > Add some things that were helpful to me when I was recently debugging an > issue with importing SQL. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23795) Add Additional Debugging Help for Import SQL
[ https://issues.apache.org/jira/browse/HIVE-23795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-23795: - > Add Additional Debugging Help for Import SQL > > > Key: HIVE-23795 > URL: https://issues.apache.org/jira/browse/HIVE-23795 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?focusedWorklogId=453499&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453499 ] ASF GitHub Bot logged work on HIVE-23791: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:19 Start Date: 01/Jul/20 16:19 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1196: URL: https://github.com/apache/hive/pull/1196#discussion_r448474859 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -1305,7 +1322,9 @@ public static Directory getAcidState(FileSystem fileSystem, Path candidateDirect bestBase, ignoreEmptyFiles, abortedDirectories, fs, validTxnList); } } else { - dirSnapshots = getHdfsDirSnapshots(fs, candidateDirectory); + if (dirSnapshots == null) { Review comment: There is a slight problem here if we are on HDFS and file listing with id is supported. A few lines below there is a check for dirSnapshots == null that previously ran every time in this case, but now it won't run if you call getAcidState with a non-null dirSnapshots. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453499) Time Spent: 20m (was: 10m) > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
[ https://issues.apache.org/jira/browse/HIVE-23726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23726: -- Labels: pull-request-available (was: ) > Create table may throw > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > -- > > Key: HIVE-23726 > URL: https://issues.apache.org/jira/browse/HIVE-23726 > Project: Hive > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > - Given: > metastore.warehouse.tenant.colocation is set to true > a test database was created as {{create database test location '/data'}} > - When: > I try to create a table as {{create table t1 (a int) location '/data/t1'}} > - Then: > The create table fails with the following exception: > {code} > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: > java.lang.IllegalArgumentException: Can not create a Path from a null string > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63219) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) > at > 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1780) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1767) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3518) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:145) >
[jira] [Work logged] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
[ https://issues.apache.org/jira/browse/HIVE-23726?focusedWorklogId=453496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453496 ] ASF GitHub Bot logged work on HIVE-23726: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:16 Start Date: 01/Jul/20 16:16 Worklog Time Spent: 10m Work Description: nrg4878 opened a new pull request #1198: URL: https://github.com/apache/hive/pull/1198 …ll with colocation enabled (Naveen Gangam) ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453496) Remaining Estimate: 0h Time Spent: 10m > Create table may throw > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > -- > > Key: HIVE-23726 > URL: https://issues.apache.org/jira/browse/HIVE-23726 > Project: Hive > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Naveen Gangam >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > - Given: > metastore.warehouse.tenant.colocation is set to true > a test database was created as {{create database test location '/data'}} > - When: > I try to create a table as {{create table t1 (a int) location '/data/t1'}} > - Then: > The create table fails with the following exception: > {code} > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138) > at 
org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: > java.lang.IllegalArgumentException: Can not create a Path from a null string > at > org.apac
[jira] [Work logged] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582
[ https://issues.apache.org/jira/browse/HIVE-23751?focusedWorklogId=453497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453497 ]

ASF GitHub Bot logged work on HIVE-23751:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Jul/20 16:16
Start Date: 01/Jul/20 16:16
Worklog Time Spent: 10m

Work Description: kgyrtkirk commented on pull request #1167:
URL: https://github.com/apache/hive/pull/1167#issuecomment-652514691

@shameersss1: I think your email address is "sra?m...@qubole.com", but GitHub wants to add it only as a "Co-Authored" thing, and when it used to do this it usually changed the author's email address to "someth...@users.noreply.github.com". There are 2 things at https://github.com/settings/emails which could cause this:
* you don't have your email address associated with your GitHub account
* you have "keep my address private" checked

but...if you want me to merge it with "@users.noreply.github.com" just let me know :D

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 453497)
Time Spent: 20m (was: 10m)

> QTest: Override #mkdirs() method in ProxyFileSystem To Align After
> HADOOP-16582
> ------------------------------------------------------------------
>
> Key: HIVE-23751
> URL: https://issues.apache.org/jira/browse/HIVE-23751
> Project: Hive
> Issue Type: Task
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-23751.01.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> HADOOP-16582 has changed the way mkdirs() works:
>
> *Before HADOOP-16582:*
> All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which were then
> re-routed to the mkdirs(p, permission) method. For ProxyFileSystem the call would
> look like
> {code:java}
> FileUtils.mkdir(p) ---> FileSystem.mkdirs(p) ---> ProxyFileSystem.mkdirs(p, permission)
> {code}
> An implementation of FileSystem only needed to implement mkdirs(p, permission).
>
> *After HADOOP-16582:*
> Since FilterFileSystem overrides the mkdirs(p) method, the new call to
> ProxyFileSystem would look like
> {code:java}
> FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) -->
> {code}
> This will make all the qtests fail with the below exception:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Wrong FS:
> pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1,
> expected: file:///
> {code}
> Note: We will hit this issue when we bump up the hadoop version in hive.
> So, as per the discussion in HADOOP-16963, ProxyFileSystem would need to
> override the mkdirs(p) method in order to solve the above problem. The new
> flow would look like
> {code:java}
> FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> ProxyFileSystem.mkdirs(p, permission) --->
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
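The dispatch change described above can be reproduced with a stripped-down model (the classes below are stand-ins for the Hadoop/Hive file system hierarchy, not the real API): once the filter layer overrides the one-argument mkdirs and forwards it to the wrapped raw file system, the proxy must override the one-argument form too, or the call never reaches the proxy's two-argument method.

```java
// Minimal model of the HADOOP-16582 / HIVE-23751 dispatch problem.
// All class and method names are simplified stand-ins, not the real Hadoop API.
public class MkdirsDispatch {

  static class FileSystem {
    boolean mkdirs(String p) { return mkdirs(p, "default-perms"); }
    boolean mkdirs(String p, String permission) { return true; }
  }

  // After HADOOP-16582, FilterFileSystem overrides mkdirs(p) and forwards it
  // to the wrapped fs, bypassing any subclass's mkdirs(p, permission).
  static class FilterFileSystem extends FileSystem {
    final FileSystem raw = new FileSystem();
    @Override boolean mkdirs(String p) { return raw.mkdirs(p); }
  }

  // A proxy that only implements the two-argument form -- the pre-fix state.
  static class BrokenProxyFileSystem extends FilterFileSystem {
    boolean proxied = false;
    @Override boolean mkdirs(String p, String permission) {
      proxied = true; // the path-scheme fix-up would happen here
      return true;
    }
  }

  // The HIVE-23751 fix: also override mkdirs(p) and route it back through
  // the proxy's own two-argument method.
  static class FixedProxyFileSystem extends BrokenProxyFileSystem {
    @Override boolean mkdirs(String p) { return mkdirs(p, "default-perms"); }
  }

  public static void main(String[] args) {
    BrokenProxyFileSystem broken = new BrokenProxyFileSystem();
    broken.mkdirs("/tmp/x");
    System.out.println(broken.proxied); // false: proxy logic was skipped

    FixedProxyFileSystem fixed = new FixedProxyFileSystem();
    fixed.mkdirs("/tmp/x");
    System.out.println(fixed.proxied);  // true: call routed back through the proxy
  }
}
```

The model shows why a one-line override is enough: dynamic dispatch on the two-argument method does the rest.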
[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class
[ https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=453493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453493 ] ASF GitHub Bot logged work on HIVE-23793: - Author: ASF GitHub Bot Created on: 01/Jul/20 16:11 Start Date: 01/Jul/20 16:11 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1197: URL: https://github.com/apache/hive/pull/1197 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453493) Remaining Estimate: 0h Time Spent: 10m > Review of QueryInfo Class > - > > Key: HIVE-23793 > URL: https://issues.apache.org/jira/browse/HIVE-23793 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23793) Review of QueryInfo Class
[ https://issues.apache.org/jira/browse/HIVE-23793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23793: -- Labels: pull-request-available (was: ) > Review of QueryInfo Class > - > > Key: HIVE-23793 > URL: https://issues.apache.org/jira/browse/HIVE-23793 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23793) Review of QueryInfo Class
[ https://issues.apache.org/jira/browse/HIVE-23793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-23793: - > Review of QueryInfo Class > - > > Key: HIVE-23793 > URL: https://issues.apache.org/jira/browse/HIVE-23793 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?focusedWorklogId=453446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453446 ] ASF GitHub Bot logged work on HIVE-23789: - Author: ASF GitHub Bot Created on: 01/Jul/20 15:00 Start Date: 01/Jul/20 15:00 Worklog Time Spent: 10m Work Description: miklosgergely merged pull request #1194: URL: https://github.com/apache/hive/pull/1194 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453446) Time Spent: 50m (was: 40m) > Merge ValidTxnManager into DriverTxnHandler > --- > > Key: HIVE-23789 > URL: https://issues.apache.org/jira/browse/HIVE-23789 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-23727) Improve SQLOperation log handling when cleanup
[ https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140552#comment-17140552 ]

Zhihua Deng edited comment on HIVE-23727 at 7/1/20, 2:52 PM:
-------------------------------------------------------------
In a busy environment the operation may still be pending (asyncPrepare is enabled), so it's better to change the condition from if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT) to if (shouldRunAsync() && oldState == OperationState.PENDING).

was (Author: dengzh):
In a busy environment the operation may still be pending (asyncPrepare is enabled), so it's better to change the condition from _if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to _if (shouldRunAsync() && oldState == OperationState.PENDING)._

> Improve SQLOperation log handling when cleanup
> ----------------------------------------------
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
> Issue Type: Improvement
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state !=
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the
> background task. If true, the state should not be OperationState.CANCELED, so
> logging under the state == OperationState.CANCELED should never happen.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
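The proposed condition change can be sketched with a plain enum (the `OperationState` below only mimics Hive's, and the arguments simplify the real setState bookkeeping in `SQLOperation`): instead of cancelling the background task for everything except CANCELED/TIMEDOUT, cancel only operations that never left the PENDING state.

```java
// Sketch of the HIVE-23727 condition change. This models only the boolean
// logic; the real SQLOperation tracks state transitions, not a single value.
public class CancelCondition {

  enum OperationState { PENDING, RUNNING, FINISHED, CANCELED, TIMEDOUT, CLOSED }

  // Current check: cancel unless the operation was canceled or timed out.
  static boolean cancelOld(boolean async, OperationState state) {
    return async
        && state != OperationState.CANCELED
        && state != OperationState.TIMEDOUT;
  }

  // Proposed check: cancel only operations still pending when cleanup runs.
  static boolean cancelNew(boolean async, OperationState oldState) {
    return async && oldState == OperationState.PENDING;
  }

  public static void main(String[] args) {
    // A finished operation: the old check still tries to cancel it,
    // the proposed one does not.
    System.out.println(cancelOld(true, OperationState.FINISHED)); // true
    System.out.println(cancelNew(true, OperationState.FINISHED)); // false
    // A pending operation is cancelled under the proposed check.
    System.out.println(cancelNew(true, OperationState.PENDING));  // true
  }
}
```

The sketch makes the comment's point concrete: the old condition cancels finished, failed, and closed operations needlessly, while the proposed one targets exactly the pending case that matters on a busy server.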
[jira] [Comment Edited] (HIVE-23727) Improve SQLOperation log handling when cleanup
[ https://issues.apache.org/jira/browse/HIVE-23727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140479#comment-17140479 ]

Zhihua Deng edited comment on HIVE-23727 at 7/1/20, 2:51 PM:
-------------------------------------------------------------
I'm wondering if we can improve the whole branch if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT) here. The code here is somewhat confusing to me: state = OperationState.CLOSED is the only case in which cancelling the background task will take effect, and in that case the operation may be finished, closed, failed, running (ctrl+c or session timeout) or pending. There is no need to cancel finished, closed or failed operations, and running operations can be treated like timed-out operations, which are cleaned up by driver::close.

was (Author: dengzh):
I'm wondering if we can improve the whole branch _if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT)_ here. The code is somewhat confusing to me, as driver::close has handled the case of the operation being canceled or timed out, and there is no need to cancel the background of an operation being closed, finished or failed (error). The cases in which cancelling the background takes effect are operations being RUNNING and PENDING (state = OperationState.CLOSED is passed to cleanup, ctrl+c), but cancelling the background of a running operation can be treated like a timed-out operation (since they were running operations before the timeout).

> Improve SQLOperation log handling when cleanup
> ----------------------------------------------
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
> Issue Type: Improvement
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state !=
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the
> background task. If true, the state should not be OperationState.CANCELED, so
> logging under the state == OperationState.CANCELED should never happen.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-23764) Remove unnecessary getLastFlushLength when checking delete delta files
[ https://issues.apache.org/jira/browse/HIVE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149471#comment-17149471 ] Peter Vary commented on HIVE-23764: --- [~rajesh.balamohan]: I see that in HIVE-23597 we have issues with some tests. Also caching the OrcTail might be better placed in LLAP IO, and [~szita] is working on a possible solution. What do you think about pushing this change, and if we hit some road-block with the LLAP IO solution then we might pick up HIVE-23597 again? Thanks, Peter > Remove unnecessary getLastFlushLength when checking delete delta files > -- > > Key: HIVE-23764 > URL: https://issues.apache.org/jira/browse/HIVE-23764 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls > OrcAcidUtils.getLastFlushLength for every delete delta file. > Even the comment says: > {code} > // NOTE: Calling last flush length below is more for > future-proofing when we have > // streaming deletes. But currently we don't support streaming > deletes, and this can > // be removed if this becomes a performance issue. > {code} > If we have a table with 5 updates (1 base + 5 delta + 5 delete_delta), then > for every base + delta dir we will check all of the delete_delta directories, > and check the getLastFlushLength method which will result in 6*5=30 > unnecessary NN/S3 calls. > We should remove the check as already proposed in the comment. -- This message was sent by Atlassian Jira (v8.3.4#803005)
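The 30-call figure in the description follows directly from the directory layout: every base/delta reader probes every delete_delta directory, one NN/S3 round trip per probe. A back-of-the-envelope sketch (plain arithmetic, no Hive API involved) makes the multiplicative growth explicit:

```java
// Sketch of the getLastFlushLength call count from HIVE-23764.
// Pure arithmetic: each base/delta reader checks the side file of every
// delete_delta directory, costing one NN/S3 round trip per check.
public class FlushLengthCalls {

  static int sideFileLookups(int baseDirs, int deltaDirs, int deleteDeltaDirs) {
    return (baseDirs + deltaDirs) * deleteDeltaDirs;
  }

  public static void main(String[] args) {
    // The example from the description: 1 base + 5 delta + 5 delete_delta.
    System.out.println(sideFileLookups(1, 5, 5)); // 30
    // Ten more updates and the cost grows quadratically with table history.
    System.out.println(sideFileLookups(1, 15, 15)); // 240
  }
}
```

Because both factors grow with the number of updates, removing the check (as the code comment already proposes) eliminates a cost that scales quadratically with table history.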
[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support
[ https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=453406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453406 ]

ASF GitHub Bot logged work on HIVE-20447:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Jul/20 12:57
Start Date: 01/Jul/20 12:57
Worklog Time Spent: 10m

Work Description: belugabehr edited a comment on pull request #1169:
URL: https://github.com/apache/hive/pull/1169#issuecomment-652401570

Please also add a second JSON formatter called "jsonfile" that does not output a standard JSON array structure, but emits one JSON record per line, just like Hive accepts for reading JSON. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-StorageFormatsStorageFormatsRowFormat,StorageFormat,andSerDe

It most likely can just `extend` this `JSONOutputFormat` class and override the `printHeader`/`printFooter` methods to be no-ops.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 453406)
Time Spent: 1h 40m (was: 1.5h)

> Add JSON Outputformat support
> -----------------------------
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
> Issue Type: Task
> Components: Beeline
> Reporter: Max Efremov
> Assignee: Hunter Logan
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> This function is present in SQLLine. We need to add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
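The suggestion above — derive a line-per-record "jsonfile" variant from the array-producing formatter — can be sketched as follows. The classes here are simplified stand-ins, not beeline's actual `OutputFormat` hierarchy; they only show how no-op header/footer overrides plus a newline separator turn a JSON array into one JSON record per line.

```java
import java.util.List;

// Sketch of the "json" vs "jsonfile" formatter idea from HIVE-20447.
// Simplified stand-ins for beeline's output-format classes.
public class JsonFormats {

  // "json" style: wraps pre-serialized row objects in a JSON array.
  static class JSONOutputFormat {
    protected String printHeader() { return "["; }
    protected String printFooter() { return "]"; }
    protected String rowSeparator() { return ","; }

    String print(List<String> jsonRows) {
      StringBuilder sb = new StringBuilder(printHeader());
      for (int i = 0; i < jsonRows.size(); i++) {
        if (i > 0) sb.append(rowSeparator());
        sb.append(jsonRows.get(i));
      }
      return sb.append(printFooter()).toString();
    }
  }

  // "jsonfile" style: one JSON record per line, no surrounding array --
  // header/footer become no-ops and rows are newline-separated, matching
  // the line-delimited form Hive accepts when reading JSON.
  static class JSONFileOutputFormat extends JSONOutputFormat {
    @Override protected String printHeader() { return ""; }
    @Override protected String printFooter() { return ""; }
    @Override protected String rowSeparator() { return "\n"; }
  }

  public static void main(String[] args) {
    List<String> rows = List.of("{\"a\":1}", "{\"a\":2}");
    System.out.println(new JSONOutputFormat().print(rows));     // [{"a":1},{"a":2}]
    System.out.println(new JSONFileOutputFormat().print(rows)); // one record per line
  }
}
```

Subclassing with small hook overrides, as the comment suggests, keeps the row-serialization logic in one place and makes the two formats differ only in framing.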
[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support
[ https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=453405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453405 ]

ASF GitHub Bot logged work on HIVE-20447:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Jul/20 12:56
Start Date: 01/Jul/20 12:56
Worklog Time Spent: 10m

Work Description: belugabehr commented on pull request #1169:
URL: https://github.com/apache/hive/pull/1169#issuecomment-652401570

Please also add a second JSON formatter called "jsonfile" that does not output a standard JSON array structure, but emits one JSON record per line, just like Hive accepts for reading JSON. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-StorageFormatsStorageFormatsRowFormat,StorageFormat,andSerDe

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 453405)
Time Spent: 1.5h (was: 1h 20m)

> Add JSON Outputformat support
> -----------------------------
>
> Key: HIVE-20447
> URL: https://issues.apache.org/jira/browse/HIVE-20447
> Project: Hive
> Issue Type: Task
> Components: Beeline
> Reporter: Max Efremov
> Assignee: Hunter Logan
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-20447.01.patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> This function is present in SQLLine. We need to add it to beeline too.
> https://github.com/julianhyde/sqlline/pull/84

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support
[ https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=453404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453404 ] ASF GitHub Bot logged work on HIVE-20447: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:54 Start Date: 01/Jul/20 12:54 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1169: URL: https://github.com/apache/hive/pull/1169#discussion_r448341358 ## File path: pom.xml ## @@ -1486,6 +1486,7 @@ **/patchprocess/** **/metastore_db/** **/test/resources/**/*.ldif + .vscode/** Review comment: Since this has nothing to do with JSON formatting in beeline, save it for another JIRA. ## File path: beeline/src/test/org/apache/hive/beeline/TestJSONOutputFormat.java ## @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hive.beeline; + +import static org.mockito.ArgumentMatchers.anyInt; + +import org.junit.Before; +import org.junit.Test; +import org.mockito.invocation.InvocationOnMock; +import org.mockito.stubbing.Answer; + +import java.io.PrintStream; +import java.sql.ResultSet; +import java.sql.ResultSetMetaData; +import java.sql.SQLException; +import java.sql.Types; +import java.util.ArrayList; + +import static org.junit.Assert.assertArrayEquals; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +public class TestJSONOutputFormat { + + private final Object[][] mockRowData = { + {"aaa", true, null, Double.valueOf(3.14), "\\/\b\f\n\r\t"} + }; + private TestJSONOutputFormat.BeelineMock mockBeeline; + private ResultSet mockResultSet; + private MockRow mockRow; + + @Before + public void setupMockData() throws SQLException { +mockBeeline = new TestJSONOutputFormat.BeelineMock(); +mockResultSet = mock(ResultSet.class); + +ResultSetMetaData mockResultSetMetaData = mock(ResultSetMetaData.class); +when(mockResultSetMetaData.getColumnCount()).thenReturn(5); +when(mockResultSetMetaData.getColumnLabel(1)).thenReturn("string"); +when(mockResultSetMetaData.getColumnLabel(2)).thenReturn("boolean"); +when(mockResultSetMetaData.getColumnLabel(3)).thenReturn("null"); +when(mockResultSetMetaData.getColumnLabel(4)).thenReturn("double"); +when(mockResultSetMetaData.getColumnLabel(5)).thenReturn("special symbols"); +when(mockResultSetMetaData.getColumnType(1)).thenReturn(Types.VARCHAR); +when(mockResultSetMetaData.getColumnType(2)).thenReturn(Types.BOOLEAN); +when(mockResultSetMetaData.getColumnType(3)).thenReturn(Types.NULL); +when(mockResultSetMetaData.getColumnType(4)).thenReturn(Types.DOUBLE); +when(mockResultSetMetaData.getColumnType(5)).thenReturn(Types.VARCHAR); +when(mockResultSet.getMetaData()).thenReturn(mockResultSetMetaData); + +mockRow = new MockRow(); +// returns true as long as there is more data in mockResultData array 
+when(mockResultSet.next()).thenAnswer(new Answer() { + private int mockRowDataIndex = 0; + + @Override + public Boolean answer(final InvocationOnMock invocation) { +if (mockRowDataIndex < mockRowData.length) { + mockRow.setCurrentRowData(mockRowData[mockRowDataIndex]); + mockRowDataIndex++; + return true; +} else { + return false; +} + } +}); + +when(mockResultSet.getObject(anyInt())).thenAnswer(new Answer() { + @Override + public Object answer(final InvocationOnMock invocation) { +Object[] args = invocation.getArguments(); +int index = ((Integer) args[0]); +return mockRow.getColumn(index); + } +}); + } + + /** + * Test printing output data with JsonOutputFormat + */ + @Test + public final void testPrint() throws SQLException { +setupMockData(); +BufferedRows bfRows = new BufferedRows(mockBeeline, mockResultSet); +JSONOutputFormat instance = new JSONOutputFormat(mockBeeline); +instance.print(bfRows); +ArrayList actualOutput = mockBeeline.getLines(); +ArrayList expectedO
[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Syed Shameerur Rahman updated HIVE-23737:
-----------------------------------------
Attachment: HIVE-23737.02.patch

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's
> dagDelete
> -----------------------------------------------------------------------------
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-23737.01.patch, HIVE-23737.02.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we
> could re-use that feature in LLAP.
> There are some added advantages of using Tez's dagDelete feature rather than
> LLAP's current dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features
> such as vertex and failed task attempt shuffle data clean up. Refer to TEZ-3363
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from
> Hive's code path.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149414#comment-17149414 ]

Syed Shameerur Rahman commented on HIVE-23737:
----------------------------------------------
Added unit tests in HIVE-23737.02.patch

> LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's
> dagDelete
> -----------------------------------------------------------------------------
>
> Key: HIVE-23737
> URL: https://issues.apache.org/jira/browse/HIVE-23737
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Syed Shameerur Rahman
> Assignee: Syed Shameerur Rahman
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-23737.01.patch, HIVE-23737.02.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez
> has added support for dagDelete in its custom shuffle handler (TEZ-3362) we
> could re-use that feature in LLAP.
> There are some added advantages of using Tez's dagDelete feature rather than
> LLAP's current dagDelete feature:
> 1) We can easily extend this feature to accommodate upcoming features
> such as vertex and failed task attempt shuffle data clean up. Refer to TEZ-3363
> and TEZ-4129.
> 2) It will be easier to maintain this feature by separating it out from
> Hive's code path.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-23760) Upgrading to Kafka 2.5 Clients
[ https://issues.apache.org/jira/browse/HIVE-23760?focusedWorklogId=453398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453398 ] ASF GitHub Bot logged work on HIVE-23760: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:42 Start Date: 01/Jul/20 12:42 Worklog Time Spent: 10m Work Description: akatona84 opened a new pull request #1175: URL: https://github.com/apache/hive/pull/1175 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453398) Time Spent: 50m (was: 40m) > Upgrading to Kafka 2.5 Clients > -- > > Key: HIVE-23760 > URL: https://issues.apache.org/jira/browse/HIVE-23760 > Project: Hive > Issue Type: Improvement > Components: kafka integration >Reporter: Andras Katona >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23638) Fix FindBug issues in hive-common
[ https://issues.apache.org/jira/browse/HIVE-23638?focusedWorklogId=453395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453395 ] ASF GitHub Bot logged work on HIVE-23638: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:40 Start Date: 01/Jul/20 12:40 Worklog Time Spent: 10m Work Description: pgaref commented on pull request #1161: URL: https://github.com/apache/hive/pull/1161#issuecomment-652394311 @belugabehr thanks for the review! Addressed your comments in the latest commit -- could you please take another look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453395) Time Spent: 50m (was: 40m) > Fix FindBug issues in hive-common > - > > Key: HIVE-23638 > URL: https://issues.apache.org/jira/browse/HIVE-23638 > Project: Hive > Issue Type: Sub-task >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Attachments: spotbugsXml.xml > > Time Spent: 50m > Remaining Estimate: 0h > > mvn -Pspotbugs > -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO > -pl :hive-common test-compile > com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23760) Upgrading to Kafka 2.5 Clients
[ https://issues.apache.org/jira/browse/HIVE-23760?focusedWorklogId=453393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453393 ] ASF GitHub Bot logged work on HIVE-23760: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:39 Start Date: 01/Jul/20 12:39 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1175: URL: https://github.com/apache/hive/pull/1175#issuecomment-652393923 Will close/open to re-launch tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453393) Time Spent: 0.5h (was: 20m) > Upgrading to Kafka 2.5 Clients > -- > > Key: HIVE-23760 > URL: https://issues.apache.org/jira/browse/HIVE-23760 > Project: Hive > Issue Type: Improvement > Components: kafka integration >Reporter: Andras Katona >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23760) Upgrading to Kafka 2.5 Clients
[ https://issues.apache.org/jira/browse/HIVE-23760?focusedWorklogId=453394&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453394 ] ASF GitHub Bot logged work on HIVE-23760: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:39 Start Date: 01/Jul/20 12:39 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1175: URL: https://github.com/apache/hive/pull/1175 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453394) Time Spent: 40m (was: 0.5h) > Upgrading to Kafka 2.5 Clients > -- > > Key: HIVE-23760 > URL: https://issues.apache.org/jira/browse/HIVE-23760 > Project: Hive > Issue Type: Improvement > Components: kafka integration >Reporter: Andras Katona >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23598) Add option to rewrite NTILE and RANK to sketch functions
[ https://issues.apache.org/jira/browse/HIVE-23598?focusedWorklogId=453384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453384 ] ASF GitHub Bot logged work on HIVE-23598: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:24 Start Date: 01/Jul/20 12:24 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1126: URL: https://github.com/apache/hive/pull/1126 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453384) Time Spent: 1h 10m (was: 1h) > Add option to rewrite NTILE and RANK to sketch functions > > > Key: HIVE-23598 > URL: https://issues.apache.org/jira/browse/HIVE-23598 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23598) Add option to rewrite NTILE and RANK to sketch functions
[ https://issues.apache.org/jira/browse/HIVE-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-23598. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Jesus for reviewing the changes! > Add option to rewrite NTILE and RANK to sketch functions > > > Key: HIVE-23598 > URL: https://issues.apache.org/jira/browse/HIVE-23598 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23791: -- Labels: pull-request-available (was: ) > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?focusedWorklogId=453382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453382 ] ASF GitHub Bot logged work on HIVE-23791: - Author: ASF GitHub Bot Created on: 01/Jul/20 12:12 Start Date: 01/Jul/20 12:12 Worklog Time Spent: 10m Work Description: pvary opened a new pull request #1196: URL: https://github.com/apache/hive/pull/1196 Run AcidUtils.getHdfsDirSnapshots to collect the relevant files, and use that for stats generation This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453382) Remaining Estimate: 0h Time Spent: 10m > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
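[Editor's note] The optimization described in the PR — take one directory snapshot (via AcidUtils.getHdfsDirSnapshots) and derive both the acid state and the basic stats from it, instead of issuing two separate file listings — can be illustrated with a minimal sketch. The snapshot is modeled here as a plain path-to-length map; this is a simplified stand-in, not the real return type of getHdfsDirSnapshots:

```java
import java.util.*;

// Derive basic table stats (numFiles, totalSize) from a single directory
// snapshot, so the same snapshot that drives acid-state computation also
// drives stats generation and no second filesystem listing is needed.
public class SnapshotStats {
  public static Map<String, Long> basicStats(Map<String, Long> dirSnapshot) {
    long totalSize = dirSnapshot.values().stream().mapToLong(Long::longValue).sum();
    Map<String, Long> stats = new HashMap<>();
    stats.put("numFiles", (long) dirSnapshot.size());
    stats.put("totalSize", totalSize);
    return stats;
  }
}
```

The saving is in filesystem round trips: for a table with many delta directories, listing once and reusing the result halves the NameNode traffic of the stats path.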
[jira] [Work logged] (HIVE-23722) Show the operation's drilldown link to client
[ https://issues.apache.org/jira/browse/HIVE-23722?focusedWorklogId=453376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453376 ] ASF GitHub Bot logged work on HIVE-23722: - Author: ASF GitHub Bot Created on: 01/Jul/20 11:53 Start Date: 01/Jul/20 11:53 Worklog Time Spent: 10m Work Description: dengzhhu653 edited a comment on pull request #1145: URL: https://github.com/apache/hive/pull/1145#issuecomment-652294571 @kgyrtkirk could you please take another look at the changes? thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453376) Time Spent: 1.5h (was: 1h 20m) > Show the operation's drilldown link to client > - > > Key: HIVE-23722 > URL: https://issues.apache.org/jira/browse/HIVE-23722 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > The HiveServer2 webui provides a drilldown link for many collected > metrics and messages about an operation, but it's not easy for an end user to > find the target URL of his submitted query. Limited knowledge of the deployment, an > HA-based environment, and multiple running queries can make things more > difficult. This jira provides a way to show the link to the interested end > user when enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java
[ https://issues.apache.org/jira/browse/HIVE-23774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-23774. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the patch [~b.maidics]! > Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java > > > Key: HIVE-23774 > URL: https://issues.apache.org/jira/browse/HIVE-23774 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589] > This log is not needed at INFO log level. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java
[ https://issues.apache.org/jira/browse/HIVE-23774?focusedWorklogId=453372&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453372 ] ASF GitHub Bot logged work on HIVE-23774: - Author: ASF GitHub Bot Created on: 01/Jul/20 11:34 Start Date: 01/Jul/20 11:34 Worklog Time Spent: 10m Work Description: pvary merged pull request #1189: URL: https://github.com/apache/hive/pull/1189 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453372) Time Spent: 20m (was: 10m) > Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java > > > Key: HIVE-23774 > URL: https://issues.apache.org/jira/browse/HIVE-23774 > Project: Hive > Issue Type: Improvement >Reporter: Barnabas Maidics >Assignee: Barnabas Maidics >Priority: Trivial > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589] > This log is not needed at INFO log level. -- This message was sent by Atlassian Jira (v8.3.4#803005)
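[Editor's note] The effect of the merged change — demoting a per-call message in aggrColStatsForPartitions from INFO to debug level — is that at the typical production level the message is filtered out before it ever reaches a handler. A self-contained demonstration using java.util.logging (Hive itself logs through SLF4J; the logger name and messages here are illustrative):

```java
import java.util.*;
import java.util.logging.*;

// Demonstrates level filtering: with the logger at INFO, an INFO record is
// emitted but a FINE (debug-level) record is dropped, which is why moving a
// hot-path message from INFO to debug stops it from flooding the logs.
public class LogLevelDemo {
  public static List<String> emitted() {
    List<String> messages = new ArrayList<>();
    Logger log = Logger.getLogger("demo");
    log.setUseParentHandlers(false);
    log.setLevel(Level.INFO); // typical production level
    Handler capture = new Handler() {
      @Override public void publish(LogRecord r) { messages.add(r.getMessage()); }
      @Override public void flush() {}
      @Override public void close() {}
    };
    capture.setLevel(Level.ALL); // the logger's level, not the handler's, does the filtering here
    log.addHandler(capture);
    log.info("aggregated partition column stats");          // kept at INFO
    log.fine("columns: [c1, c2] partNames: [p1, p2, ...]"); // dropped at INFO
    return messages;
  }
}
```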
[jira] [Commented] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149309#comment-17149309 ] Syed Shameerur Rahman commented on HIVE-23737: -- HIVE-23737.01.patch is the first cut WIP patch. Need to add tests around the feature and do some clean up of old code. [~rajesh.balamohan] [~prasanth_j] [~gopalv] Could you guys please share your thoughts on this initial patch. > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23737.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez > has added support for dagDelete in its custom shuffle handler (TEZ-3362) we > could reuse that feature in LLAP. > There are some added advantages to using Tez's dagDelete feature rather than > the current LLAP dagDelete feature: > 1) We can easily extend this feature to accommodate upcoming features > such as vertex and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 > and TEZ-4129. > 2) It will be easier to maintain this feature by separating it out from > Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HIVE-23737: - Attachment: HIVE-23737.01.patch Status: Patch Available (was: Open) > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23737.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > LLAP have a dagDelete feature added as part of HIVE-9911, But now that Tez > have added support for dagDelete in custom shuffle handler (TEZ-3362) we > could re-use that feature in LLAP. > There are some added advantages of using Tez's dagDelete feature rather than > the current LLAP's dagDelete feature. > 1) We can easily extend this feature to accommodate the upcoming features > such as vertex and failed task attempt shuffle data clean up. Refer TEZ-3363 > and TEZ-4129 > 2) It will be more easier to maintain this feature by separating it out from > the Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23737: -- Labels: pull-request-available (was: ) > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > LLAP have a dagDelete feature added as part of HIVE-9911, But now that Tez > have added support for dagDelete in custom shuffle handler (TEZ-3362) we > could re-use that feature in LLAP. > There are some added advantages of using Tez's dagDelete feature rather than > the current LLAP's dagDelete feature. > 1) We can easily extend this feature to accommodate the upcoming features > such as vertex and failed task attempt shuffle data clean up. Refer TEZ-3363 > and TEZ-4129 > 2) It will be more easier to maintain this feature by separating it out from > the Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23737) LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's dagDelete
[ https://issues.apache.org/jira/browse/HIVE-23737?focusedWorklogId=453359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453359 ] ASF GitHub Bot logged work on HIVE-23737: - Author: ASF GitHub Bot Created on: 01/Jul/20 10:33 Start Date: 01/Jul/20 10:33 Worklog Time Spent: 10m Work Description: shameersss1 opened a new pull request #1195: URL: https://github.com/apache/hive/pull/1195 LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez has added support for dagDelete in its custom shuffle handler (TEZ-3362) we could reuse that feature in LLAP. There are some added advantages to using Tez's dagDelete feature rather than the current LLAP dagDelete feature. 1) We can easily extend this feature to accommodate upcoming features such as vertex and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 and TEZ-4129. 2) It will be easier to maintain this feature by separating it out from Hive's code path. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453359) Remaining Estimate: 0h Time Spent: 10m > LLAP: Reuse dagDelete Feature Of Tez Custom Shuffle Handler Instead Of LLAP's > dagDelete > --- > > Key: HIVE-23737 > URL: https://issues.apache.org/jira/browse/HIVE-23737 > Project: Hive > Issue Type: Improvement > Components: llap >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > LLAP has a dagDelete feature added as part of HIVE-9911, but now that Tez > has added support for dagDelete in its custom shuffle handler (TEZ-3362) we > could reuse that feature in LLAP. > There are some added advantages to using Tez's dagDelete feature rather than > the current LLAP dagDelete feature: 
> 1) We can easily extend this feature to accommodate upcoming features > such as vertex and failed-task-attempt shuffle data cleanup. Refer to TEZ-3363 > and TEZ-4129. > 2) It will be easier to maintain this feature by separating it out from > Hive's code path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23787) Write all the events present in a task_queue in a single file.
[ https://issues.apache.org/jira/browse/HIVE-23787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anishek Agarwal updated HIVE-23787: --- Description: Events are not written to file when the queue becomes full, and the post_exec_hook / pre_exec_hook event is silently dropped. The default capacity is 64 in the hive.hook.proto.queue.capacity config for HS2. Now, we will increase the queue capacity (let's say up to 256). Also, as an optimisation, we need to take all the events present in a task_queue and write them in a single file. was: DAS does not get the event when the queue becomes full, and it ignores the post_exec_hook / pre_exec_hook event. The default capacity is 64 in hive.hook.proto.queue.capacity config for hs2. Now, we will increase the queue-capacity (let's say up to 256). Also for the optimisation, need to run all the events present in a task_queue, and write in a single file. > Write all the events present in a task_queue in a single file. > -- > > Key: HIVE-23787 > URL: https://issues.apache.org/jira/browse/HIVE-23787 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Amlesh Kumar >Assignee: Amlesh Kumar >Priority: Major > > Events are not written to file when the queue becomes full, and the > post_exec_hook / pre_exec_hook event is silently dropped. The default capacity is 64 in > the hive.hook.proto.queue.capacity config for HS2. > Now, we will increase the queue capacity (let's say up to 256). > Also, as an optimisation, we need to take all the events present in a > task_queue and write them in a single file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
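[Editor's note] The two changes proposed in HIVE-23787 — a larger bounded event queue so hook events are not silently dropped, and draining every buffered event in one pass so they land in a single file write — can be sketched as follows. The class name is illustrative and the capacity of 256 mirrors the proposed bump from the default 64 (hive.hook.proto.queue.capacity); this is not Hive's actual hook code:

```java
import java.util.*;
import java.util.concurrent.*;

// Bounded hook-event queue: offer() drops events when the queue is full
// (the failure mode described in the ticket), and drainBatch() empties the
// whole queue at once so the caller can write one file per batch.
public class ProtoEventQueue {
  private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(256);

  // Returns false (event dropped) when the queue is full.
  public boolean offer(String event) {
    return queue.offer(event);
  }

  // Drain every buffered event in one pass, to be written as a single file.
  public List<String> drainBatch() {
    List<String> batch = new ArrayList<>();
    queue.drainTo(batch);
    return batch;
  }
}
```

With a capacity of 64 and a burst of queries, the excess post_exec_hook/pre_exec_hook events are exactly the `offer() == false` cases above; raising the capacity and batching the drain reduces both the drops and the number of small file writes.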
[jira] [Work logged] (HIVE-23722) Show the operation's drilldown link to client
[ https://issues.apache.org/jira/browse/HIVE-23722?focusedWorklogId=453313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453313 ] ASF GitHub Bot logged work on HIVE-23722: - Author: ASF GitHub Bot Created on: 01/Jul/20 09:07 Start Date: 01/Jul/20 09:07 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on pull request #1145: URL: https://github.com/apache/hive/pull/1145#issuecomment-652294571 @kgyrtkirk could you take a look at the changes? thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453313) Time Spent: 1h 20m (was: 1h 10m) > Show the operation's drilldown link to client > - > > Key: HIVE-23722 > URL: https://issues.apache.org/jira/browse/HIVE-23722 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Now the HiveServer2 webui provides a drilldown link for many collected > metrics or messages about a operation, but it's not easy for a end user to > find the target url of his submitted query. Less knowledge on the deployment, > HA based environment, and the multiple running queries can make things more > difficult. The jira provides a way to show the link to the interested end > user when enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23791) Optimize ACID stats generation
[ https://issues.apache.org/jira/browse/HIVE-23791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-23791: - > Optimize ACID stats generation > -- > > Key: HIVE-23791 > URL: https://issues.apache.org/jira/browse/HIVE-23791 > Project: Hive > Issue Type: Improvement > Components: Statistics, Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > > Currently basic stats generation uses file listing for getting statistics, > and also uses a file listing for getting the acid state. We should optimize > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23790) The error message length of 2000 is exceeded for scheduled query
[ https://issues.apache.org/jira/browse/HIVE-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi reassigned HIVE-23790: -- Assignee: Zoltan Haindrich (was: Aasha Medhi) > The error message length of 2000 is exceeded for scheduled query > > > Key: HIVE-23790 > URL: https://issues.apache.org/jira/browse/HIVE-23790 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Zoltan Haindrich >Priority: Major > > {code:java} > 2020-07-01 08:24:23,916 ERROR org.apache.thrift.server.TThreadPoolServer: > [pool-7-thread-189]: Error occurred during processing of message. > org.datanucleus.exceptions.NucleusUserException: Attempt to store value > "FAILED: Execution Error, return code 30045 from > org.apache.hadoop.hive.ql.exec.repl.DirCopyTask. Permission denied: > user=hive, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:336) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:626) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkRangerPermission(RangerHdfsAuthorizer.java:388) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermissionWithContext(RangerHdfsAuthorizer.java:229) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:239) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1908) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1892) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1851) > at > 
org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3226) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1130) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:729) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) > " in column ""ERROR_MESSAGE"" that has maximum length of 2000. Please correct > your data! > at > org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.fieldmanager.ParameterSetter.storeStringField(ParameterSetter.java:158) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.state.AbstractStateManager.providedStringField(AbstractStateManager.java:1448) > ~[datanucleus-core-4.1.17.jar:?] > at > org.datanucleus.state.StateManagerImpl.providedStringField(StateManagerImpl.java:120) > ~[datanucleus-core-4.1.17.jar:?] 
> at > org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideField(MScheduledExecution.java) > ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246] > at > org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideFields(MScheduledExecution.java) > ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246] > at > org.datanucleus.state.StateManagerImpl.provideFields(StateManagerImpl.java:1170) > ~[datanucleus-core-4.1.17.jar:?] > at > org.datanucleus.store.rdbms.request.UpdateRequest.execute(UpdateRequest.java:326) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateObjectInTable(RDBMSPersi
[jira] [Assigned] (HIVE-23790) The error message length of 2000 is exceeded for scheduled query
[ https://issues.apache.org/jira/browse/HIVE-23790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi reassigned HIVE-23790: -- > The error message length of 2000 is exceeded for scheduled query > > > Key: HIVE-23790 > URL: https://issues.apache.org/jira/browse/HIVE-23790 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > > {code:java} > 2020-07-01 08:24:23,916 ERROR org.apache.thrift.server.TThreadPoolServer: > [pool-7-thread-189]: Error occurred during processing of message. > org.datanucleus.exceptions.NucleusUserException: Attempt to store value > "FAILED: Execution Error, return code 30045 from > org.apache.hadoop.hive.ql.exec.repl.DirCopyTask. Permission denied: > user=hive, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:336) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkDefaultEnforcer(RangerHdfsAuthorizer.java:626) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkRangerPermission(RangerHdfsAuthorizer.java:388) > at > org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermissionWithContext(RangerHdfsAuthorizer.java:229) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:239) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1908) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1892) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1851) > at > org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3226) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1130) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:729) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) > " in column ""ERROR_MESSAGE"" that has maximum length of 2000. Please correct > your data! > at > org.datanucleus.store.rdbms.mapping.datastore.CharRDBMSMapping.setString(CharRDBMSMapping.java:254) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.mapping.java.SingleFieldMapping.setString(SingleFieldMapping.java:180) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.fieldmanager.ParameterSetter.storeStringField(ParameterSetter.java:158) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.state.AbstractStateManager.providedStringField(AbstractStateManager.java:1448) > ~[datanucleus-core-4.1.17.jar:?] > at > org.datanucleus.state.StateManagerImpl.providedStringField(StateManagerImpl.java:120) > ~[datanucleus-core-4.1.17.jar:?] 
> at > org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideField(MScheduledExecution.java) > ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246] > at > org.apache.hadoop.hive.metastore.model.MScheduledExecution.dnProvideFields(MScheduledExecution.java) > ~[hive-exec-3.1.3000.7.2.1.0-246.jar:3.1.3000.7.2.1.0-246] > at > org.datanucleus.state.StateManagerImpl.provideFields(StateManagerImpl.java:1170) > ~[datanucleus-core-4.1.17.jar:?] > at > org.datanucleus.store.rdbms.request.UpdateRequest.execute(UpdateRequest.java:326) > ~[datanucleus-rdbms-4.1.19.jar:?] > at > org.datanucleus.store.rdbms.RDBMSPersistenceHandler.updateObjectInTable(RDBMSPersistenceHandler.java:409) > ~[datanucleus-rdbms-4.1.19.ja
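The root cause in the stack trace above is a value longer than the 2000-character limit of the ERROR_MESSAGE column. A minimal sketch of one possible mitigation, clamping the message before handing it to the persistence layer, is below; the class name, constant, and truncation marker are hypothetical illustrations, not the actual Hive patch.

```java
// Hypothetical sketch: clamp an error message to the column limit before
// persisting, keeping the head of the message (usually the most informative
// part) and marking the cut with an ellipsis.
public class ErrorMessageTruncator {
    // Mirrors the ERROR_MESSAGE column limit reported in the stack trace.
    static final int MAX_ERROR_MESSAGE_LENGTH = 2000;

    static String truncate(String message) {
        if (message == null || message.length() <= MAX_ERROR_MESSAGE_LENGTH) {
            return message; // fits as-is, store unchanged
        }
        String marker = "...";
        // Reserve room for the marker so the stored value never exceeds the limit.
        return message.substring(0, MAX_ERROR_MESSAGE_LENGTH - marker.length()) + marker;
    }

    public static void main(String[] args) {
        // ~4300 characters, comfortably over the 2000-char column limit.
        String longMessage = "FAILED: Execution Error, Permission denied ".repeat(100);
        System.out.println(truncate(longMessage).length()); // prints 2000
    }
}
```

Truncating at write time trades a complete error text for a guaranteed successful update of the scheduled-query state, which is preferable to the NucleusUserException aborting the whole status write.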
[jira] [Work logged] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?focusedWorklogId=453291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453291 ] ASF GitHub Bot logged work on HIVE-23789: - Author: ASF GitHub Bot Created on: 01/Jul/20 08:09 Start Date: 01/Jul/20 08:09 Worklog Time Spent: 10m Work Description: miklosgergely commented on a change in pull request #1194: URL: https://github.com/apache/hive/pull/1194#discussion_r448191052 ## File path: ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java ## @@ -288,15 +313,231 @@ private void acquireLocksInternal() throws CommandProcessorException, LockExcept } } - public void addHiveLocksFromContext() { + /** + * Write the current set of valid write ids for the operated acid tables into the configuration so + * that it can be read by the input format. + */ + private ValidTxnWriteIdList recordValidWriteIds() throws LockException { +String txnString = driverContext.getConf().get(ValidTxnList.VALID_TXNS_KEY); +if (Strings.isNullOrEmpty(txnString)) { + throw new IllegalStateException("calling recordValidWritsIdss() without initializing ValidTxnList " + + JavaUtils.txnIdToString(driverContext.getTxnManager().getCurrentTxnId())); +} + +ValidTxnWriteIdList txnWriteIds = getTxnWriteIds(txnString); +setValidWriteIds(txnWriteIds); + +LOG.debug("Encoding valid txn write ids info {} txnid: {}", txnWriteIds.toString(), +driverContext.getTxnManager().getCurrentTxnId()); +return txnWriteIds; + } + + private ValidTxnWriteIdList getTxnWriteIds(String txnString) throws LockException { +List txnTables = getTransactionalTables(getTables(true, true)); +ValidTxnWriteIdList txnWriteIds = null; +if (driverContext.getCompactionWriteIds() != null) { + // This is kludgy: here we need to read with Compactor's snapshot/txn rather than the snapshot of the current + // {@code txnMgr}, in effect simulating a "flashback query" but can't actually share compactor's txn since it + // would run multiple statements. 
See more comments in {@link org.apache.hadoop.hive.ql.txn.compactor.Worker} + // where it start the compactor txn*/ + if (txnTables.size() != 1) { +throw new LockException("Unexpected tables in compaction: " + txnTables); + } + txnWriteIds = new ValidTxnWriteIdList(driverContext.getCompactorTxnId()); + txnWriteIds.addTableValidWriteIdList(driverContext.getCompactionWriteIds()); +} else { + txnWriteIds = driverContext.getTxnManager().getValidWriteIds(txnTables, txnString); +} +if (driverContext.getTxnType() == TxnType.READ_ONLY && !getTables(false, true).isEmpty()) { + throw new IllegalStateException(String.format( + "Inferred transaction type '%s' doesn't conform to the actual query string '%s'", + driverContext.getTxnType(), driverContext.getQueryState().getQueryString())); +} +return txnWriteIds; + } + + private void setValidWriteIds(ValidTxnWriteIdList txnWriteIds) { +driverContext.getConf().set(ValidTxnWriteIdList.VALID_TABLES_WRITEIDS_KEY, txnWriteIds.toString()); +if (driverContext.getPlan().getFetchTask() != null) { + // This is needed for {@link HiveConf.ConfVars.HIVEFETCHTASKCONVERSION} optimization which initializes JobConf + // in FetchOperator before recordValidTxns() but this has to be done after locks are acquired to avoid race + // conditions in ACID. This case is supported only for single source query. 
+ Operator source = driverContext.getPlan().getFetchTask().getWork().getSource(); + if (source instanceof TableScanOperator) { +TableScanOperator tsOp = (TableScanOperator)source; +String fullTableName = AcidUtils.getFullTableName(tsOp.getConf().getDatabaseName(), +tsOp.getConf().getTableName()); +ValidWriteIdList writeIdList = txnWriteIds.getTableValidWriteIdList(fullTableName); +if (tsOp.getConf().isTranscationalTable() && (writeIdList == null)) { + throw new IllegalStateException(String.format( + "ACID table: %s is missing from the ValidWriteIdList config: %s", fullTableName, txnWriteIds.toString())); +} +if (writeIdList != null) { + driverContext.getPlan().getFetchTask().setValidWriteIdList(writeIdList.toString()); +} + } +} + } + + /** + * Checks whether txn list has been invalidated while planning the query. + * This would happen if query requires exclusive/semi-shared lock, and there has been a committed transaction + * on the table over which the lock is required. + */ + boolean isValidTxnListState() throws LockException { +// 1) Get valid txn list. +String txnString = driverContext.getConf().get(ValidTxnList.VALID_TXNS_KEY); +if (txnString == null) { +
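The hunk above records the valid-txn snapshot at compile time and later, in isValidTxnListState(), re-checks it after locks are acquired. A self-contained sketch of that re-check idea follows; the class and method names are hypothetical, and Hive's real ValidTxnList carries more state (high-water mark, aborted txns) than the bare set of open txn ids modeled here.

```java
import java.util.Set;

// Simplified sketch of the snapshot re-validation idea: if a txn that was
// open (invalid) when the query was compiled is no longer open once locks
// are held, something committed in between and the cached plan may be
// stale, so the query should be recompiled with a fresh snapshot.
public class SnapshotCheck {
    static boolean snapshotStillValid(Set<Long> openAtCompile, Set<Long> openAfterLocks) {
        for (long txnId : openAtCompile) {
            if (!openAfterLocks.contains(txnId)) {
                return false; // txn resolved meanwhile: snapshot invalidated
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // New txns opening is fine; only previously-open txns resolving matters.
        System.out.println(snapshotStillValid(Set.of(7L, 9L), Set.of(7L, 9L, 11L))); // true
        System.out.println(snapshotStillValid(Set.of(7L, 9L), Set.of(9L)));          // false
    }
}
```

The check only needs to fire for queries taking exclusive or semi-shared locks, as the surrounding javadoc notes: purely shared readers are consistent under their original snapshot.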
[jira] [Work logged] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?focusedWorklogId=453289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453289 ] ASF GitHub Bot logged work on HIVE-23789: - Author: ASF GitHub Bot Created on: 01/Jul/20 08:07 Start Date: 01/Jul/20 08:07 Worklog Time Spent: 10m Work Description: miklosgergely commented on a change in pull request #1194: URL: https://github.com/apache/hive/pull/1194#discussion_r448190290 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Compiler.java ## @@ -188,7 +188,6 @@ private BaseSemanticAnalyzer analyze() throws Exception { // because at that point we need access to the objects. Hive.get().getMSC().flushCache(); -driverContext.setBackupContext(new Context(context)); Review comment: The usage of backupContext was removed by Peter Varga recently (https://github.com/apache/hive/commit/e2a02f1b43cba657d4d1c16ead091072be5fe834#diff-71a166c053d9c698f9cb64eaef832aff), I've asked him to confirm, and it was intentional. After this change we are only setting the backup context here, but it is never used. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453289) Time Spent: 0.5h (was: 20m) > Merge ValidTxnManager into DriverTxnHandler > --- > > Key: HIVE-23789 > URL: https://issues.apache.org/jira/browse/HIVE-23789 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?focusedWorklogId=453281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453281 ] ASF GitHub Bot logged work on HIVE-23789: - Author: ASF GitHub Bot Created on: 01/Jul/20 07:54 Start Date: 01/Jul/20 07:54 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1194: URL: https://github.com/apache/hive/pull/1194#discussion_r448183060 ## File path: ql/src/java/org/apache/hadoop/hive/ql/DriverTxnHandler.java ## @@ -288,15 +313,231 @@ (same hunk as quoted in worklog 453291 above)
[jira] [Updated] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23789: -- Labels: pull-request-available (was: ) > Merge ValidTxnManager into DriverTxnHandler > --- > > Key: HIVE-23789 > URL: https://issues.apache.org/jira/browse/HIVE-23789 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?focusedWorklogId=453278&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453278 ] ASF GitHub Bot logged work on HIVE-23789: - Author: ASF GitHub Bot Created on: 01/Jul/20 07:51 Start Date: 01/Jul/20 07:51 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1194: URL: https://github.com/apache/hive/pull/1194#discussion_r448181689 ## File path: ql/src/java/org/apache/hadoop/hive/ql/Compiler.java ## @@ -188,7 +188,6 @@ private BaseSemanticAnalyzer analyze() throws Exception { // because at that point we need access to the objects. Hive.get().getMSC().flushCache(); -driverContext.setBackupContext(new Context(context)); Review comment: Are we sure about this? We create a backup context so if we have to reexecute the query then we have a context at hand (for removing temporary files etc) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 453278) Remaining Estimate: 0h Time Spent: 10m > Merge ValidTxnManager into DriverTxnHandler > --- > > Key: HIVE-23789 > URL: https://issues.apache.org/jira/browse/HIVE-23789 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23703) Major QB compaction with multiple FileSinkOperators results in data loss and one original file
[ https://issues.apache.org/jira/browse/HIVE-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149194#comment-17149194 ] Peter Vary commented on HIVE-23703: --- [~pvargacl]: Could you please check with the flaky test jenkins: http://ci.hive.apache.org/job/hive-flaky-check/55/ If it is flaky, then we should disable it until it is fixed. Thanks, Peter > Major QB compaction with multiple FileSinkOperators results in data loss and > one original file > -- > > Key: HIVE-23703 > URL: https://issues.apache.org/jira/browse/HIVE-23703 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Critical > Labels: compaction, pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > h4. Problems > Example: > {code:java} > drop table if exists tbl2; > create transactional table tbl2 (a int, b int) clustered by (a) into 4 > buckets stored as ORC > TBLPROPERTIES('transactional'='true','transactional_properties'='default'); > insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4); > insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4); > insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);{code} > E.g. in the example above, bucketId=0 when a=2 and a=6. > 1. Data loss > In non-acid tables, an operator's temp files are named with their task id. > Because of this snippet, temp files in the FileSinkOperator for compaction > tables are identified by their bucket_id. > {code:java} > if (conf.isCompactionTable()) { > fsp.initializeBucketPaths(filesIdx, AcidUtils.BUCKET_PREFIX + > String.format(AcidUtils.BUCKET_DIGITS, bucketId), > isNativeTable(), isSkewedStoredAsSubDirectories); > } else { > fsp.initializeBucketPaths(filesIdx, taskId, isNativeTable(), > isSkewedStoredAsSubDirectories); > } > {code} > So 2 temp files containing data with a=2 and a=6 will be named bucket_0 and > not 00_0 and 00_1 as they would normally. 
> In FileSinkOperator.commit, when data with a=2, filename: bucket_0 is moved > from _task_tmp.-ext-10002 to _tmp.-ext-10002, it overwrites the files already > there with a=6 data, because it too is named bucket_0. You can see in the > logs: > {code:java} > WARN [LocalJobRunner Map Task Executor #0] exec.FileSinkOperator: Target > path > file:.../hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnNoBuckets-1591107230237/warehouse/testmajorcompaction/base_002_v013/.hive-staging_hive_2020-06-02_07-15-21_771_8551447285061957908-1/_tmp.-ext-10002/bucket_0 > with a size 610 exists. Trying to delete it. > {code} > 2. Results in one original file > OrcFileMergeOperator merges the results of the FSOp into 1 file named > 00_0. > h4. Fix > 1. FSOp will store data as: taskid/bucketId. e.g. 0_0/bucket_0 > 2. OrcMergeFileOp, instead of merging a bunch of files into 1 file named > 00_0, will merge all files named bucket_0 into one file named bucket_0, > and so on. > 3. MoveTask will get rid of the taskId directories if present and only move > the bucket files in them, in case OrcMergeFileOp is not run. -- This message was sent by Atlassian Jira (v8.3.4#803005)
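The collision described above boils down to a naming scheme: two tasks that each produce rows for bucket 0 both emit a temp file named bucket_0, and the second commit overwrites the first. A toy sketch of the collision and of the task-scoped naming outlined in the fix is below; the class, method names, and path layout are illustrative only, not Hive's actual FileSinkOperator code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a map stands in for the commit directory, where inserting under
// an already-used file name models one commit overwriting another.
public class BucketPathSketch {
    static String bucketOnlyPath(int taskId, int bucketId) {
        return "bucket_" + bucketId;             // old scheme: ignores task id, collides
    }

    static String taskScopedPath(int taskId, int bucketId) {
        return taskId + "_0/bucket_" + bucketId; // fixed scheme: unique per task
    }

    public static void main(String[] args) {
        Map<String, Integer> committed = new LinkedHashMap<>();
        // Two tasks (0 and 1) both emit rows hashing to bucket 0 (e.g. a=2 and a=6).
        committed.put(bucketOnlyPath(0, 0), 2);
        committed.put(bucketOnlyPath(1, 0), 6);  // overwrites the a=2 file
        System.out.println(committed.size());    // prints 1 -> data loss

        committed.clear();
        committed.put(taskScopedPath(0, 0), 2);
        committed.put(taskScopedPath(1, 0), 6);
        System.out.println(committed.size());    // prints 2 -> both files survive
    }
}
```

With task-scoped directories, the merge step then has to collect all files named bucket_N across task directories into a single bucket_N output, which is exactly step 2 of the fix described above.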
[jira] [Resolved] (HIVE-23718) Extract transaction handling from Driver
[ https://issues.apache.org/jira/browse/HIVE-23718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely resolved HIVE-23718. --- Resolution: Fixed > Extract transaction handling from Driver > > > Key: HIVE-23718 > URL: https://issues.apache.org/jira/browse/HIVE-23718 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23789) Merge ValidTxnManager into DriverTxnHandler
[ https://issues.apache.org/jira/browse/HIVE-23789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely reassigned HIVE-23789: - > Merge ValidTxnManager into DriverTxnHandler > --- > > Key: HIVE-23789 > URL: https://issues.apache.org/jira/browse/HIVE-23789 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23703) Major QB compaction with multiple FileSinkOperators results in data loss and one original file
[ https://issues.apache.org/jira/browse/HIVE-23703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149167#comment-17149167 ] Peter Varga commented on HIVE-23703: [~klcopp] looks like one of the new test is flaky: [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1143/6/tests] > Major QB compaction with multiple FileSinkOperators results in data loss and > one original file > -- > > Key: HIVE-23703 > URL: https://issues.apache.org/jira/browse/HIVE-23703 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Critical > Labels: compaction, pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)